-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I've spent the past few days very carefully examining `bloat'. In
particular, I've been poring over the generated code for similar
functions written using raw COM interface pointers |AddRef|ed and
|Release|d by hand, |nsCOMPtr|s, and the smart-pointers generated by
the macro in "nsIPtr.h". I was doing this because it was assigned to
me as a task: "People are afraid that |nsCOMPtr| bloats; they don't
want to use it. Investigate."
Summary:
I have learned many things. I've learned that on Windows, there is no
clear winner between raw pointers, |nsCOMPtr|s, and `nsIPtr's, with
respect to generated code size. This is good news, though, because it
means |nsCOMPtr| in general isn't a predictor for code bloat. On the
Macintosh, |nsCOMPtr| was usually better (i.e., generated smaller
code) than raw pointers, and never worse. So, in the end, I believe
the tests validate that using |nsCOMPtr|s does not pose a threat in
terms of code bloat. However, the results of the tests provide many
interesting surprises and reveal some new techniques for optimization.
OK. Now you've read the important part. You can stop now, if you're
already convinced or don't care about the details. Go forth and use
|nsCOMPtr| whenever you need an owning pointer.
Details:
Function size results were determined by examining the disassembly of
the generated code. All reasonable optimizations were enabled
(including /O1 and /Ot on Windows) and exceptions were disabled, just
as we build for release.
It turns out that you can't just look at the size of the generated
object file. On Macintosh (and sometimes on Windows), both |nsCOMPtr| and
`nsIPtr' generate weird out-of-line copies of certain member functions
which are neither referenced nor used, and which end up being stripped
by the linker. This anomaly causes object files using either
smart-pointer implementation to look (inaccurately) bloated,
particularly on Mac. On Windows, `nsIPtr' generates an out-of-line
destructor per class, per file, that _is_ used, at a cost of 196 bytes
each. On all platforms, the |NS_DEFINE_IID| macro costs 16 bytes of
data space per use, per file. The |GetIID()| function charges only 16
bytes per class (i.e., not per file) for the entire application, and
so is preferred.
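The difference between the two schemes can be sketched roughly as
follows. This is a minimal illustration, not the real XPCOM headers:
`Iid', `kFooIID', `IFoo' and `GetIIDIsShared' are hypothetical
stand-ins, and it assumes that an |NS_DEFINE_IID|-style macro boils
down to a file-local constant while a |GetIID()|-style function hands
out a reference to one shared object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 16-byte interface ID (not the real nsIID).
struct Iid {
    std::uint32_t m0;
    std::uint16_t m1;
    std::uint16_t m2;
    std::uint8_t  m3[8];
};
static_assert(sizeof(Iid) == 16, "the sketch assumes a 16-byte ID");

// Scheme 1 (assumed shape of an NS_DEFINE_IID-style macro): a file-local
// constant.  Every source file that does this carries its own 16-byte copy.
static const Iid kFooIID =
    { 0x12345678, 0x1234, 0x5678, { 0, 1, 2, 3, 4, 5, 6, 7 } };

// Scheme 2 (GetIID-style): the class hands out a reference to a single
// object that is shared by every caller in the program.
struct IFoo {
    static const Iid& GetIID() {
        static const Iid iid =
            { 0x12345678, 0x1234, 0x5678, { 0, 1, 2, 3, 4, 5, 6, 7 } };
        return iid;
    }
};

// True when two calls yield the very same object, i.e. no extra copies.
inline bool GetIIDIsShared() {
    return &IFoo::GetIID() == &IFoo::GetIID();
}
```

With the first scheme the data cost grows with the number of files that
name the ID; with the second it is paid once per interface.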
Generated code on the PowerPC is typically larger than similar code on
the Intel processors, by a factor of 2 to 3, because of its RISC
architecture.
The Metrowerks compilers are very predictable. If a given
code-pattern saves space in one function, it saves space everywhere.
VC++, however, is not predictable. One cannot predict whether a given
code-pattern will have the same effect on size in two different
functions. Generated code size under VC++ correlated poorly with
optimizations in the code, and with the results of the Metrowerks
compilers.
For these tests, I wrote 6 files, each of which contained multiple
implementations of, essentially, the same function. These files are
checked in to mozilla/xpcom/tests/ as SizeTest01.cpp, SizeTest02.cpp,
..., SizeTest06.cpp. I encourage you to examine them to satisfy
yourself the tests weren't `fixed'. Compile them yourself, and
investigate the generated code for insight into the compiler's mind.
The function is implemented at least once using each scheme (raw
pointers, |nsCOMPtr|s, and `nsIPtr's) and perhaps multiple times where
significant optimizations apply. All code in the functions that would
be required in production but wasn't directly related to manipulating
the pointers was commented out to magnify the differences in generated
code size.
One interesting thing I learned is an effective technique for
optimizing the space consumed by both |nsCOMPtr|s and `nsIPtr's. For
both, construction is cheaper than assignment. This is because:
  // this code...                     // is equivalent to this
  nsCOMPtr<IFoo> fooP;                IFoo* fooP = 0;
  // ...                              // ...
  GetAFoo( getter_AddRefs(fooP) );    if ( fooP )
                                        {
                                          fooP->Release();
                                          fooP = 0;
                                        }
                                      GetAFoo( &fooP );
It's difficult, if not impossible, to optimize all that away in the
compiler. Here is an alternative code pattern that offers significant
savings when applied to both |nsCOMPtr|s and `nsIPtr's:
  IFoo* temp;
  GetAFoo(&temp);
  nsCOMPtr<IFoo> fooP = dont_AddRef(temp);
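To make the construct-versus-assign difference concrete, here is a
self-contained sketch. It does not use the real |nsCOMPtr|: `ToyPtr',
`StartAssignment', `AssignPattern', `ConstructPattern', and this
`IFoo' and `GetAFoo' are all hypothetical, written only to mimic the
shape of the two patterns. Both leave the reference count in the same
state; the assignment path simply has to carry the extra "release the
old value" test.

```cpp
#include <cassert>

// Hypothetical reference-counted interface standing in for a COM interface.
struct IFoo {
    int refs = 0;
    void AddRef()  { ++refs; }
    void Release() { --refs; }
};

// Hypothetical getter: hands back an already-AddRef'ed pointer.
inline void GetAFoo(IFoo& source, IFoo** result) {
    source.AddRef();
    *result = &source;
}

// Toy owning pointer; a sketch, not the real nsCOMPtr.
class ToyPtr {
public:
    ToyPtr() : mRaw(nullptr) {}
    // Adopts a pointer that already carries a reference (dont_AddRef-like).
    explicit ToyPtr(IFoo* alreadyAddRefed) : mRaw(alreadyAddRefed) {}
    ~ToyPtr() { if (mRaw) mRaw->Release(); }
    ToyPtr(const ToyPtr&) = delete;
    ToyPtr& operator=(const ToyPtr&) = delete;

    // getter_AddRefs-like: must drop any old value before exposing the slot.
    IFoo** StartAssignment() {
        if (mRaw) { mRaw->Release(); mRaw = nullptr; }
        return &mRaw;
    }
private:
    IFoo* mRaw;
};

// Pattern 1: default-construct, then assign through the getter.
inline int AssignPattern(IFoo& foo) {
    ToyPtr p;
    GetAFoo(foo, p.StartAssignment());  // pays for the "release old" test
    return foo.refs;                    // count while p is still alive
}

// Pattern 2: fetch into a raw temporary, then construct from it.
inline int ConstructPattern(IFoo& foo) {
    IFoo* temp;
    GetAFoo(foo, &temp);
    ToyPtr p(temp);                     // nothing old to test or release
    return foo.refs;                    // count while p is still alive
}
```

Each function reports the count held while the owning pointer is alive;
once it returns, the destructor has released that reference again.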
Additionally, the `nested if' pattern that optimizes raw pointer use
offers no (space) savings when applied to |nsCOMPtr|s. Stick to the
easy linear scheme unless profiling shows speed is an issue in a given
function.
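For readers unfamiliar with the two shapes, here is a minimal sketch
of what is meant, under stated assumptions: `Node', `GetParent',
`ToyPtr' and both `GrandparentValue_...' functions are hypothetical
and are not taken from the test files. The nested form releases each
raw pointer only inside the branch where it is known to be non-null;
the linear form runs straight down the page and lets the owning
pointers clean up.

```cpp
#include <cassert>

// Hypothetical reference-counted node with an AddRef'ing getter.
struct Node {
    int   refs   = 0;
    Node* parent = nullptr;
    int   value  = 0;
    void AddRef()  { ++refs; }
    void Release() { --refs; }
    void GetParent(Node** out) {
        if (parent) parent->AddRef();
        *out = parent;
    }
};

// Raw pointers, `nested if' shape: each Release sits inside the branch
// that already proved the pointer non-null.
inline int GrandparentValue_rawNested(Node* n) {
    int result = -1;
    Node* p;
    n->GetParent(&p);
    if (p) {
        Node* g;
        p->GetParent(&g);
        if (g) {
            result = g->value;
            g->Release();
        }
        p->Release();
    }
    return result;
}

// Toy owning pointer; a sketch, not the real nsCOMPtr.
class ToyPtr {
public:
    explicit ToyPtr(Node* adopted) : mRaw(adopted) {}
    ~ToyPtr() { if (mRaw) mRaw->Release(); }
    ToyPtr(const ToyPtr&) = delete;
    ToyPtr& operator=(const ToyPtr&) = delete;
    explicit operator bool() const { return mRaw != nullptr; }
    Node* operator->() const { return mRaw; }
private:
    Node* mRaw;
};

// Owning pointers, linear shape: no nesting, destructors do the releasing.
inline int GrandparentValue_linear(Node* n) {
    int result = -1;

    Node* temp = nullptr;
    n->GetParent(&temp);
    ToyPtr p(temp);

    temp = nullptr;
    if (p) p->GetParent(&temp);
    ToyPtr g(temp);

    if (g) result = g->value;
    return result;
}
```

Both functions return the grandparent's value, or -1 when there is no
grandparent, and both leave every reference count where it started.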
Really Detailed Details:
In the following results summaries, I list only the best performer in
each scheme for each test. These are _very_ brief summaries compared
to the full results, which can be found in a comment at the top of
each file. On Windows, |nsCOMPtr| tests with an `*' have in-line
destructors; those without have an out-of-line destructor in the
factored base class. I also provide an example, usually the best
performer, for each test.
SizeTest01.cpp:
Assign into, |AddRef|, call through, and |Release| a pointer.
Windows:
  raw_optimized                       31 bytes
  nsCOMPtr*                           34
  nsIPtr_optimized                    34 + 196
Macintosh:
  raw_optimized, nsCOMPtr_optimized   112 bytes
  nsIPtr_optimized                    124
void
Test01_raw_optimized( nsIDOMNode* aDOMNode, nsString* aResult )
{
  // if ( !aDOMNode )
  //   return;
  nsIDOMNode* node = aDOMNode;
  NS_ADDREF(node);
  node->GetNodeName(*aResult);
  NS_RELEASE(node);
}
SizeTest02.cpp:
|QueryInterface| into a pointer, call through, and |Release|.
Windows:
  Raw01       52 bytes
  nsCOMPtr    63
  nsIPtr      66 + 196
Macintosh:
  nsCOMPtr    120 bytes
  Raw01       128
  nsIPtr      196
void // nsresult
Test02_nsCOMPtr( nsISupports* aDOMNode, nsString* aResult )
{
  nsresult status;
  nsCOMPtr<nsIDOMNode> node = do_QueryInterface(aDOMNode, &status);
  if ( node )
    node->GetNodeName(*aResult);
  // return status;
}
SizeTest03.cpp:
Call a `getter' function to fill in a pointer, call through the
pointer and |Release|.
Windows:
  nsCOMPtr_optimized*   45 bytes
  raw_optimized         48
  nsIPtr_optimized      45 + 196
Macintosh:
  nsCOMPtr_optimized    112 bytes
  raw_optimized         124
  nsIPtr                192
void // nsresult
Test03_nsCOMPtr_optimized( nsIDOMNode* aDOMNode, nsString* aResult )
{
  // if ( !aDOMNode || !aResult )
  //   return NS_ERROR_NULL_POINTER;
  nsIDOMNode* temp;
  nsresult status = aDOMNode->GetParentNode(&temp);
  nsCOMPtr<nsIDOMNode> parent( dont_AddRef(temp) );
  if ( parent )
    parent->GetNodeName(*aResult);
  // return status;
}
SizeTest04.cpp:
A typical `setter'.
Windows:
  nsCOMPtr    13 bytes
  raw         36
  nsIPtr      43
Macintosh:
  nsCOMPtr    36 bytes
  raw         120
  nsIPtr      128
void // nsresult
Test04_nsCOMPtr::SetNode( nsIDOMNode* newNode )
{
  mNode = newNode; // mNode is an |nsCOMPtr<nsIDOMNode>|
  // return NS_OK;
}
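For comparison, a raw-pointer setter has to spell out by hand the work
that the single assignment above hides. What follows is a minimal
sketch, not the code from SizeTest04.cpp: `Node' and `RawHolder' are
hypothetical stand-ins. It takes the new reference before dropping the
old one, so that setting the same node twice never lets the count
touch zero in between.

```cpp
#include <cassert>

// Hypothetical reference-counted object standing in for nsIDOMNode.
struct Node {
    int refs = 0;
    void AddRef()  { ++refs; }
    void Release() { --refs; }
};

// Hypothetical owner that manages its member with raw AddRef/Release.
class RawHolder {
public:
    RawHolder() : mNode(nullptr) {}
    ~RawHolder() { if (mNode) mNode->Release(); }
    RawHolder(const RawHolder&) = delete;
    RawHolder& operator=(const RawHolder&) = delete;

    void SetNode(Node* newNode) {
        if (newNode) newNode->AddRef();  // take the new reference first
        Node* oldNode = mNode;
        mNode = newNode;
        if (oldNode) oldNode->Release(); // only then drop the old one
    }

    Node* Peek() const { return mNode; } // non-owning look at the member
private:
    Node* mNode;
};
```

An owning smart pointer packages these same steps inside its assignment
operator, which is why the setter body shrinks to one line.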
SizeTest05.cpp:
A typical `getter'.
Windows:
  raw, nsCOMPtr, nsIPtr    21 bytes
Macintosh:
  raw, nsCOMPtr, nsIPtr    64 bytes
void // nsresult
Test04_nsCOMPtr::GetNode( nsIDOMNode** aNode )
{
  // if ( ! aNode )
  //   return NS_ERROR_NULL_POINTER;
  *aNode = mNode;
  NS_IF_ADDREF(*aNode);
  // return NS_OK;
}
SizeTest06.cpp:
A COM heavy function, pulled from our code base, involving four COM
interface pointers, two calls to |QueryInterface| and two `getter's.
Windows:
  nsCOMPtr_optimized    176 bytes
  raw_optimized         191
  nsIPtr_optimized      137 + 196
Macintosh:
  nsCOMPtr_optimized    300 bytes
  raw_optimized         332
  nsIPtr_optimized      400
void // nsresult
Test06_nsCOMPtr_optimized( nsIDOMWindow* aDOMWindow,
                           nsCOMPtr<nsIWebShellWindow>* aWebShellWindow )
{
  // if ( !aDOMWindow )
  //   return NS_ERROR_NULL_POINTER;
  nsresult status;
  nsCOMPtr<nsIScriptGlobalObject> scriptGlobalObject =
    do_QueryInterface(aDOMWindow, &status);
  nsIWebShell* temp0;
  if ( scriptGlobalObject )
    scriptGlobalObject->GetWebShell(&temp0);
  nsCOMPtr<nsIWebShell> webShell = dont_AddRef(temp0);
  if ( webShell )
    status = webShell->GetRootWebShellEvenIfChrome(temp0);
  nsCOMPtr<nsIWebShell> rootWebShell = dont_AddRef(temp0);
  nsIWebShellContainer* temp1;
  if ( rootWebShell )
    status = rootWebShell->GetContainer(temp1);
  nsCOMPtr<nsIWebShellContainer> webShellContainer =
    dont_AddRef(temp1);
  (*aWebShellWindow) =
    do_QueryInterface(webShellContainer, &status);
  // return status;
}
I encourage you to examine the tests yourself. Compile them. Look at
the generated code. Convince yourself. Write new tests. Again, the
summaries above are very brief slices of the full results, which can
be found in the test files.
Hope this helps,
______________________________________________________________________
Scott Collins <http://www.meer.net/ScottCollins?Netscape>
-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Privacy 6.0.2
Comment: get my key at <http://www.meer.net/ScottCollins/#key>
iQA/AwUBNzdJv/GmojMuVn+fEQKTuwCg/DwKV+pmpsf10H6vEA3ObFMau38An09o
tkwKrQtUbn5wBposUfiwLlAq
=I8Yt
-----END PGP SIGNATURE-----
There are some small errors in a couple of the test functions that need to be repaired, and I still haven't counted the bytes on Linux. So there is work yet left to do.