-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've spent the past few days very carefully examining `bloat'. In particular, I've been poring over the generated code for similar functions written using raw COM interface pointers |AddRef|ed and |Release|d by hand, |nsCOMPtr|s, and the smart-pointers generated by the macro in "nsIPtr.h". I was doing this because it was assigned to me as a task: "People are afraid that |nsCOMPtr| bloats; they don't want to use it. Investigate."

Summary:

I have learned many things. I've learned that on Windows, there is no clear winner between raw pointers, |nsCOMPtr|s, and `nsIPtr's, with respect to generated code size. This is good news, though, because it means |nsCOMPtr| in general isn't a predictor for code bloat. On the Macintosh, |nsCOMPtr| was usually better (i.e., generated smaller code) than raw pointers, and never worse. So, in the end, I believe the tests validate that using |nsCOMPtr|s does not pose a threat in terms of code bloat. However, the results of the tests provide many interesting surprises and reveal some new techniques for optimization.

OK. Now you've read the important part. You can stop now, if you're already convinced or don't care about the details. Go forth and use |nsCOMPtr| whenever you need an owning pointer.

Details:

Function size results were determined by examining the disassembly of the generated code. All reasonable optimizations were enabled (including /O1 and /Ot on Windows) and exceptions were disabled, just as we build for release.

It turns out that you can't just look at the size of the generated object file. On Macintosh (and sometimes on Windows), both |nsCOMPtr| and `nsIPtr' generate weird out-of-line copies of certain member functions which are neither referenced nor used, and which end up being stripped by the linker. This anomaly causes object files using either smart-pointer implementation to look (inaccurately) bloated, particularly on Mac.

On Windows, `nsIPtr' generates an out-of-line destructor per class, per file, that _is_ used, at a cost of 196 bytes each.

On all platforms where the |NS_DEFINE_IID| macro was used, a cost of 16 bytes per use, per file in data space is levied. The |GetIID()| function charges only 16 bytes per class (i.e., not per file) for the entire application, and so is preferred.

Generated code on the PowerPC is typically larger than similar code on the Intel processors, by a factor of 2 to 3, because of its RISC architecture.

The Metrowerks compilers are very predictable. If a given code-pattern saves space in one function, it saves space everywhere. VC++, however, is not predictable. One cannot predict whether a given code-pattern will have the same effect on size in two different functions. Generated code size under VC++ correlated poorly with optimizations in the code, and with the results of the Metrowerks compilers.

For these tests, I wrote 6 files, each of which contained multiple implementations of, essentially, the same function. These files are checked in to mozilla/xpcom/tests/ as SizeTest01.cpp, SizeTest02.cpp, ..., SizeTest06.cpp. I encourage you to examine them to satisfy yourself the tests weren't `fixed'. Compile them yourself, and investigate the generated code for insight into the compiler's mind. The function is implemented at least once using each scheme (raw pointers, |nsCOMPtr|s, and `nsIPtr's) and perhaps multiple times where significant optimizations apply. All code in the functions that would be required in production but wasn't directly related to manipulating the pointers was commented out to magnify the differences in generated code size.

One interesting thing I learned is an effective technique for optimizing the space consumed by both |nsCOMPtr|s and `nsIPtr's. For both, construction is cheaper than assignment. This is because:

    // this code...                      // is equivalent to this
    nsCOMPtr<IFoo> fooP;                 IFoo* fooP = 0;
    // ...                               // ...
    GetAFoo( getter_AddRefs(fooP) );     if ( fooP )
                                           {
                                             fooP->Release();
                                             fooP = 0;
                                           }
                                         GetAFoo( &fooP );

It's difficult, if not impossible, to optimize all that away in the compiler. Here is an alternative code pattern that offers significant savings when applied to both |nsCOMPtr|s and `nsIPtr's:

    IFoo* temp;
    GetAFoo(&temp);
    nsCOMPtr<IFoo> fooP = dont_AddRef(temp);

Additionally, the `nested if' pattern that optimizes raw pointer use offers no (space) savings when applied to |nsCOMPtr|s. Stick to the easy linear scheme unless profiling shows speed is an issue in a given function.

Really Detailed Details:

In the following results summaries, I list only the best performer in each scheme for each test. These are _very_ brief summaries compared to the full results, which can be found in a comment at the top of each file. On Windows, |nsCOMPtr| tests with an `*' have in-line destructors; those without have an out-of-line destructor in the factored base class. I also provide an example, usually the best performer, for each test.

SizeTest01.cpp: Assign into, |AddRef|, call through, and |Release| a pointer.

    Windows:
      raw_optimized                        31 bytes
      nsCOMPtr*                            34
      nsIPtr_optimized                     34 + 196

    Macintosh:
      raw_optimized, nsCOMPtr_optimized   112 bytes
      nsIPtr_optimized                    124

    void
    Test01_raw_optimized( nsIDOMNode* aDOMNode, nsString* aResult )
      {
        // if ( !aDOMNode )
        //   return;

        nsIDOMNode* node = aDOMNode;
        NS_ADDREF(node);
        node->GetNodeName(*aResult);
        NS_RELEASE(node);
      }

SizeTest02.cpp: |QueryInterface| into a pointer, call through, and |Release|.

    Windows:
      Raw01                                52 bytes
      nsCOMPtr                             63
      nsIPtr                               66 + 196

    Macintosh:
      nsCOMPtr                            120 bytes
      Raw01                               128
      nsIPtr                              196

    void // nsresult
    Test02_nsCOMPtr( nsISupports* aDOMNode, nsString* aResult )
      {
        nsresult status;
        nsCOMPtr<nsIDOMNode> node=do_QueryInterface(aDOMNode, &status);

        if ( node )
          node->GetNodeName(*aResult);

        // return status;
      }

SizeTest03.cpp: Call a `getter' function to fill in a pointer, call through the pointer and |Release|.

    Windows:
      nsCOMPtr_optimized*                  45 bytes
      raw_optimized                        48
      nsIPtr_optimized                     45 + 196

    Macintosh:
      nsCOMPtr_optimized                  112 bytes
      raw_optimized                       124
      nsIPtr                              192

    void // nsresult
    Test03_nsCOMPtr_optimized( nsIDOMNode* aDOMNode, nsString* aResult )
      {
        // if ( !aDOMNode || !aResult )
        //   return NS_ERROR_NULL_POINTER;

        nsIDOMNode* temp;
        nsresult status = aDOMNode->GetParentNode(&temp);
        nsCOMPtr<nsIDOMNode> parent( dont_AddRef(temp) );
        if ( parent )
          parent->GetNodeName(*aResult);

        // return status;
      }

SizeTest04.cpp: A typical `setter'.

    Windows:
      nsCOMPtr                             13 bytes
      raw                                  36
      nsIPtr                               43

    Macintosh:
      nsCOMPtr                             36 bytes
      raw                                 120
      nsIPtr                              128

    void // nsresult
    Test04_nsCOMPtr::SetNode( nsIDOMNode* newNode )
      {
        mNode = newNode;   // mNode is an |nsCOMPtr<nsIDOMNode>|
        // return NS_OK;
      }

SizeTest05.cpp: A typical `getter'.

    Windows:
      raw, nsCOMPtr, nsIPtr                21 bytes

    Macintosh:
      Raw, nsCOMPtr, nsIPtr                64 bytes

    void // nsresult
    Test04_nsCOMPtr::GetNode( nsIDOMNode** aNode )
      {
        // if ( ! aNode )
        //   return NS_ERROR_NULL_POINTER;

        *aNode = mNode;
        NS_IF_ADDREF(*aNode);

        // return NS_OK;
      }

SizeTest06.cpp: A COM-heavy function, pulled from our code base, involving four COM interface pointers, two calls to |QueryInterface| and two `getter's.

    Windows:
      nsCOMPtr_optimized                  176 bytes
      raw_optimized                       191
      nsIPtr_optimized                    137 + 196

    Macintosh:
      nsCOMPtr_optimized                  300 bytes
      raw_optimized                       332
      nsIPtr_optimized                    400

    void // nsresult
    Test06_nsCOMPtr_optimized( nsIDOMWindow* aDOMWindow, nsCOMPtr<nsIWebShellWindow>* aWebShellWindow )
      {
        // if ( !aDOMWindow )
        //   return NS_ERROR_NULL_POINTER;

        nsresult status;
        nsCOMPtr<nsIScriptGlobalObject> scriptGlobalObject = do_QueryInterface(aDOMWindow, &status);

        nsIWebShell* temp0;
        if ( scriptGlobalObject )
          scriptGlobalObject->GetWebShell(&temp0);
        nsCOMPtr<nsIWebShell> webShell = dont_AddRef(temp0);

        if ( webShell )
          status = webShell->GetRootWebShellEvenIfChrome(temp0);
        nsCOMPtr<nsIWebShell> rootWebShell = dont_AddRef(temp0);

        nsIWebShellContainer* temp1;
        if ( rootWebShell )
          status = rootWebShell->GetContainer(temp1);
        nsCOMPtr<nsIWebShellContainer> webShellContainer = dont_AddRef(temp1);

        (*aWebShellWindow) = do_QueryInterface(webShellContainer, &status);

        // return status;
      }

I encourage you to examine the tests yourself. Compile them. Look at the generated code. Convince yourself. Write new tests. Again, the summaries above are very brief slices of the full results, which can be found in the test files.

Hope this helps,
______________________________________________________________________
Scott Collins <http://www.meer.net/ScottCollins?Netscape>

-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Privacy 6.0.2
Comment: get my key at <http://www.meer.net/ScottCollins/#key>

iQA/AwUBNzdJv/GmojMuVn+fEQKTuwCg/DwKV+pmpsf10H6vEA3ObFMau38An09o
tkwKrQtUbn5wBposUfiwLlAq
=I8Yt
-----END PGP SIGNATURE-----

There are some small errors in a couple of the test functions that need to be repaired, and I still haven't counted the bytes on Linux. So there is work yet left to do.
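
For readers without a Mozilla tree at hand, the `construction is cheaper than assignment' point from the message can be illustrated with a self-contained sketch. Nothing below is the real |nsCOMPtr|: |Foo|, |OwningPtr|, |Adopted|, |adopt|, |gReleaseChecks| and the two |ChecksUsing...| functions are hypothetical names invented for this example, with |adopt| standing in for |dont_AddRef| and |StartAssignment| standing in for what |getter_AddRefs| has to do. The sketch only counts how often the `release the old value, if any' code runs on each path; it says nothing about real generated code size.

```cpp
#include <cassert>

// Toy reference-counted object (hypothetical); liveCount lets us check for leaks.
struct Foo {
    static int liveCount;
    int refs;
    Foo() : refs(1) { ++liveCount; }   // created already holding one reference
    ~Foo() { --liveCount; }
    void AddRef()  { ++refs; }
    void Release() { if (--refs == 0) delete this; }
};
int Foo::liveCount = 0;

// A COM-style "getter": hands back a pointer the caller now owns.
void GetAFoo(Foo** aResult) { *aResult = new Foo; }

// Counts how many times the "release the old value, if any" code runs.
int gReleaseChecks = 0;

// Tag type plus helper, standing in for dont_AddRef (assumption, not the real API).
struct Adopted { Foo* raw; };
inline Adopted adopt(Foo* p) { return Adopted{p}; }

class OwningPtr {
public:
    OwningPtr() : mRaw(nullptr) {}
    OwningPtr(Adopted a) : mRaw(a.raw) {}   // take ownership, no AddRef, no check
    ~OwningPtr() { ReleaseIfSet(); }
    OwningPtr(const OwningPtr&) = delete;
    OwningPtr& operator=(const OwningPtr&) = delete;

    // Stand-in for getter_AddRefs: any old value must be dropped first.
    Foo** StartAssignment() { ReleaseIfSet(); return &mRaw; }

private:
    void ReleaseIfSet() {
        ++gReleaseChecks;
        if (mRaw) { mRaw->Release(); mRaw = nullptr; }
    }
    Foo* mRaw;
};

// Default-construct, then assign through the getter.
int ChecksUsingAssignment() {
    gReleaseChecks = 0;
    {
        OwningPtr fooP;
        GetAFoo(fooP.StartAssignment());
    }
    return gReleaseChecks;
}

// Fetch into a raw temporary, then construct the owning pointer from it.
int ChecksUsingConstruction() {
    gReleaseChecks = 0;
    {
        Foo* temp;
        GetAFoo(&temp);
        OwningPtr fooP(adopt(temp));
    }
    return gReleaseChecks;
}
```

In this toy, the assignment path runs the release check twice (once before the getter writes the pointer, once in the destructor), while the construction path runs it only in the destructor. Both paths leave |Foo::liveCount| at zero, so neither leaks.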