Code Bloat [LONG, summary at top]

last modified 12 May 1999
Abstract: An email/newsgroup posting that summarizes the results of some code generation size tests comparing raw pointers, nsCOMPtrs, and `nsIPtr's.
The Message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've spent the past few days very carefully examining `bloat'.  In
particular, I've been poring over the generated code for similar
functions written using raw COM interface pointers |AddRef|ed and
|Release|d by hand, |nsCOMPtr|s, and the smart-pointers generated by
the macro in "nsIPtr.h".  I was doing this because it was assigned to
me as a task: "People are afraid that |nsCOMPtr| bloats; they don't
want to use it.  Investigate."


Summary:

I have learned many things.  I've learned that on Windows, there is no
clear winner between raw pointers, |nsCOMPtr|s, and `nsIPtr's, with
respect to generated code size.  This is good news, though, because it
means |nsCOMPtr| in general isn't a predictor for code bloat.  On the
Macintosh, |nsCOMPtr| was usually better (i.e., generated smaller
code) than raw pointers, and never worse.  So, in the end, I believe
the tests validate that using |nsCOMPtr|s does not pose a threat in
terms of code bloat.  However, the results of the tests provide many
interesting surprises and reveal some new techniques for optimization.


OK.  Now you've read the important part.  You can stop now, if you're
already convinced or don't care about the details.  Go forth and use
|nsCOMPtr| whenever you need an owning pointer.


Details:

Function size results were determined by examining the disassembly of
the generated code.  All reasonable optimizations were enabled
(including /O1 and /Ot on Windows) and exceptions were disabled, just
as we build for release.

It turns out that you can't just look at the size of the generated object
file.  On Macintosh (and sometimes on Windows), both |nsCOMPtr| and
`nsIPtr' generate weird out-of-line copies of certain member functions
which are neither referenced nor used, and which end up being stripped
by the linker.  This anomaly causes object files using either
smart-pointer implementation to look (inaccurately) bloated,
particularly on Mac.  On Windows, `nsIPtr' generates an out-of-line
destructor per class, per file, that _is_ used, at a cost of 196 bytes
each.  On all platforms where the |NS_DEFINE_IID| macro was used, a
cost of 16 bytes per use, per file in data space is levied.  The
|GetIID()| function charges only 16 bytes per class (i.e., not per
file) for the entire application, and so is preferred.
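
To illustrate the difference in data cost, here is a minimal sketch.
It is not the real macro or the real |GetIID()|; the names |Iid|,
|kIFooIID|, and |IFoo| are hypothetical, and the shapes of the two
patterns are my assumption about how such code is typically written.
A file-scope constant is duplicated in every file that declares it,
while a static inside a class's inline function exists once per class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16-byte interface ID, laid out like a GUID.
struct Iid {
  std::uint32_t m0;
  std::uint16_t m1;
  std::uint16_t m2;
  std::uint8_t  m3[8];
};
static_assert(sizeof(Iid) == 16, "an IID occupies 16 bytes of data space");

// Pattern A (assumed shape of a per-file IID definition): a file-scope
// constant with internal linkage.  Every source file that contains this
// line carries its own 16-byte copy.
static const Iid kIFooIID =
    { 0x12345678, 0x9abc, 0xdef0, { 0, 1, 2, 3, 4, 5, 6, 7 } };

// Pattern B (assumed shape of a per-class accessor): the constant lives
// inside an inline static member function, so the whole program shares
// a single copy no matter how many files call it.
struct IFoo {
  static const Iid& GetIID() {
    static const Iid iid =
        { 0x12345678, 0x9abc, 0xdef0, { 0, 1, 2, 3, 4, 5, 6, 7 } };
    return iid;
  }
};
```

Every caller of |IFoo::GetIID()| sees the same object, whereas
|kIFooIID| is separate storage that would be repeated per file.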

Generated code on the PowerPC is typically significantly larger than
similar code on the Intel processors, by a factor of 2 to 3, because
of its RISC architecture.

The Metrowerks compilers are very predictable.  If a given
code-pattern saves space in one function, it saves space everywhere.
VC++, however, is not predictable.  One cannot predict whether a given
code-pattern will have the same effect on size in two different
functions.  Generated code size under VC++ correlated poorly with
optimizations in the code, and with the results of the Metrowerks
compilers.

For these tests, I wrote 6 files each of which contained multiple
implementations of, essentially, the same function.  These files are
checked in to mozilla/xpcom/tests/ as SizeTest01.cpp, SizeTest02.cpp,
..., SizeTest06.cpp.  I encourage you to examine them to satisfy
yourself the tests weren't `fixed'.  Compile them yourself, and
investigate the generated code for insight into the compiler's mind.
The function is implemented at least once using each scheme (raw
pointers, |nsCOMPtr|s, and `nsIPtr's) and perhaps multiple times where
significant optimizations apply.  All code in the functions that would
be required in production but wasn't directly related to manipulating
the pointers was commented out to magnify the differences in generated
code size.

One interesting thing I learned is an effective technique for
optimizing the space consumed by both |nsCOMPtr|s and `nsIPtr's.  For
both, construction is cheaper than assignment.  This is because:

   // this code...                      // is equivalent to this
  nsCOMPtr<IFoo> fooP;                IFoo* fooP = 0;

   // ...                               // ...

  GetAFoo( getter_AddRefs(fooP) );    if ( fooP )
                                        {
                                          fooP->Release();
                                          fooP = 0;
                                        }
                                      GetAFoo( &fooP );

It's difficult, if not impossible, to optimize all that away in the
compiler.  Here is an alternative code pattern that offers significant
savings when applied to both |nsCOMPtr|s and `nsIPtr's:

  IFoo* temp;
  GetAFoo(&temp);
  nsCOMPtr<IFoo> fooP = dont_AddRef(temp);
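
To make the difference concrete, here is a self-contained sketch.  It
uses a toy owning pointer, not the real |nsCOMPtr|: the names |Foo|,
|MiniPtr|, |StartAssignment|, and |gNullChecks| are all hypothetical,
and this |GetAFoo| takes an extra source argument for the sake of the
demo.  The counter records how often the "release the old value first"
step runs, which is the work the construction pattern avoids.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted COM interface.
struct Foo {
  int refs = 0;
  void AddRef()  { ++refs; }
  void Release() { --refs; }
};

// Hypothetical getter: hands back an already-AddRef'ed pointer.
inline void GetAFoo(Foo* source, Foo** result) {
  source->AddRef();
  *result = source;
}

// Counts executions of the "release any old value" step.
static int gNullChecks = 0;

// Toy owning pointer; an illustration only, not the real nsCOMPtr.
class MiniPtr {
 public:
  MiniPtr() : mRaw(nullptr) {}
  // Adopt an already-AddRef'ed pointer (the idea behind dont_AddRef).
  explicit MiniPtr(Foo* adopted) : mRaw(adopted) {}
  ~MiniPtr() { if (mRaw) mRaw->Release(); }
  MiniPtr(const MiniPtr&) = delete;
  MiniPtr& operator=(const MiniPtr&) = delete;

  // The work a getter_AddRefs-style helper must do: drop any old
  // value, then expose the raw slot for the getter to fill in.
  Foo** StartAssignment() {
    ++gNullChecks;
    if (mRaw) { mRaw->Release(); mRaw = nullptr; }
    return &mRaw;
  }

 private:
  Foo* mRaw;
};

// Pattern 1: declare, then assign through the getter.
inline int ViaAssignment(Foo& foo) {
  MiniPtr p;
  GetAFoo(&foo, p.StartAssignment());
  return foo.refs;  // read while p still owns its reference
}

// Pattern 2: fetch into a raw temporary, then construct.
inline int ViaConstruction(Foo& foo) {
  Foo* temp;
  GetAFoo(&foo, &temp);
  MiniPtr p(temp);
  return foo.refs;  // read while p still owns its reference
}
```

Both functions hold exactly one reference while the pointer is alive
and drop it on exit; only |ViaAssignment| pays for the extra test and
branch.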

Additionally, the `nested if' pattern that optimizes raw pointer use
yields no (space) savings when applied to |nsCOMPtr|s.  Stick to the easy
linear scheme unless profiling shows speed is an issue in a given
function.


Really Detailed Details:

In the following results summaries, I list only the best performer in
each scheme for each test.  These are _very_ brief summaries compared
to the full results, which can be found in a comment at the top of
each file.  On Windows, |nsCOMPtr| tests with an `*' have in-line
destructors; those without have an out-of-line destructor in the
factored base class.  I also provide an example, usually the best
performer, for each test.

SizeTest01.cpp:

  Assign into, |AddRef|, call through, and |Release| a pointer.

  Windows:
    raw_optimized     31 bytes
    nsCOMPtr*         34
    nsIPtr_optimized  34 + 196

  Macintosh:
    raw_optimized, nsCOMPtr_optimized    112 bytes
    nsIPtr_optimized                     124

  void
  Test01_raw_optimized( nsIDOMNode* aDOMNode, nsString* aResult )
    {
  //  if ( !aDOMNode )
  //    return;

      nsIDOMNode* node = aDOMNode;
      NS_ADDREF(node);
      node->GetNodeName(*aResult);
      NS_RELEASE(node);
    }


SizeTest02.cpp:

  |QueryInterface| into a pointer, call through, and |Release|.

  Windows:
    Raw01      52 bytes
    nsCOMPtr   63
    nsIPtr     66 + 196

  Macintosh:
    nsCOMPtr  120 bytes
    Raw01     128
    nsIPtr    196

  void // nsresult
  Test02_nsCOMPtr( nsISupports* aDOMNode, nsString* aResult )
    {
      nsresult status;
      nsCOMPtr<nsIDOMNode> node=do_QueryInterface(aDOMNode, &status);
      if ( node )
        node->GetNodeName(*aResult);
  //  return status;
    }


SizeTest03.cpp:

  Call a `getter' function to fill in a pointer, call through the
pointer and |Release|.

  Windows:
    nsCOMPtr_optimized*  45 bytes
    raw_optimized        48
    nsIPtr_optimized     45 + 196

  Macintosh:
    nsCOMPtr_optimized  112 bytes
    raw_optimized       124
    nsIPtr              192


  void // nsresult
  Test03_nsCOMPtr_optimized( nsIDOMNode* aDOMNode, nsString* aResult )
    {
  //  if ( !aDOMNode || !aResult )
  //    return NS_ERROR_NULL_POINTER;
      nsIDOMNode* temp;
      nsresult status = aDOMNode->GetParentNode(&temp);
      nsCOMPtr<nsIDOMNode> parent( dont_AddRef(temp) );
      if ( parent )
        parent->GetNodeName(*aResult);
  //  return status;
    }


SizeTest04.cpp:

   A typical `setter'.

  Windows:
    nsCOMPtr    13 bytes
    raw         36
    nsIPtr      43

  Macintosh:
    nsCOMPtr    36 bytes
    raw        120
    nsIPtr     128

  void // nsresult
  Test04_nsCOMPtr::SetNode( nsIDOMNode* newNode )
    {
      mNode = newNode; // mNode is an |nsCOMPtr<nsIDOMNode>|
  //  return NS_OK;
    }
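
For comparison, here is a sketch of what a hand-written raw-pointer
setter has to spell out at every call site, next to a toy
smart-pointer version.  |Node|, |RawHolder|, |MiniPtr|, and
|SmartHolder| are hypothetical names, not the classes in the test
file.  Keeping the toy pointer's assignment logic in one out-of-line
routine is my assumption about why the smart-pointer setter can be so
small; check the disassembly before relying on it.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted COM interface.
struct Node {
  int refs = 0;
  void AddRef()  { ++refs; }
  void Release() { --refs; }
};

// Hand-written setter: the AddRef/Release dance is repeated inline
// in every setter written this way.
struct RawHolder {
  Node* mNode = nullptr;
  ~RawHolder() { if (mNode) mNode->Release(); }
  void SetNode(Node* newNode) {
    if (newNode) newNode->AddRef();  // AddRef first: safe if newNode == mNode
    if (mNode) mNode->Release();
    mNode = newNode;
  }
};

// Toy owning pointer; an illustration only, not the real nsCOMPtr.
class MiniPtr {
 public:
  MiniPtr() = default;
  ~MiniPtr() { if (mRaw) mRaw->Release(); }
  MiniPtr(const MiniPtr&) = delete;
  MiniPtr& operator=(const MiniPtr&) = delete;
  MiniPtr& operator=(Node* rhs) { Assign(rhs); return *this; }

 private:
  void Assign(Node* rhs);  // defined once, out of line, shared by all users
  Node* mRaw = nullptr;
};

void MiniPtr::Assign(Node* rhs) {
  if (rhs) rhs->AddRef();
  if (mRaw) mRaw->Release();
  mRaw = rhs;
}

// Smart-pointer setter: the body reduces to a single call.
struct SmartHolder {
  MiniPtr mNode;
  void SetNode(Node* newNode) { mNode = newNode; }
};
```

Both holders leave identical reference counts after each call,
including re-setting the same node and destroying the holder.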


SizeTest05.cpp:

  A typical `getter'.

  Windows:
    raw, nsCOMPtr, nsIPtr    21 bytes

  Macintosh:
    raw, nsCOMPtr, nsIPtr    64 bytes

  void // nsresult
  Test05_nsCOMPtr::GetNode( nsIDOMNode** aNode )
    {
  //  if ( ! aNode )
  //    return NS_ERROR_NULL_POINTER;

      *aNode = mNode;
      NS_IF_ADDREF(*aNode);

  //  return NS_OK;
    }


SizeTest06.cpp:

   A COM heavy function, pulled from our code base, involving four COM
interface pointers, two calls to |QueryInterface| and two `getter's.

  Windows:
    nsCOMPtr_optimized  176 bytes
    raw_optimized       191
    nsIPtr_optimized    137 + 196

  Macintosh:
    nsCOMPtr_optimized  300 bytes
    raw_optimized       332
    nsIPtr_optimized    400

  void // nsresult
  Test06_nsCOMPtr_optimized( nsIDOMWindow* aDOMWindow,
                        nsCOMPtr<nsIWebShellWindow>* aWebShellWindow )
    {
  //  if ( !aDOMWindow )
  //    return NS_ERROR_NULL_POINTER;

      nsresult status;
      nsCOMPtr<nsIScriptGlobalObject> scriptGlobalObject =
                               do_QueryInterface(aDOMWindow, &status);

      nsIWebShell* temp0;
      if ( scriptGlobalObject )
        scriptGlobalObject->GetWebShell(&temp0);
      nsCOMPtr<nsIWebShell> webShell = dont_AddRef(temp0);

      if ( webShell )
        status = webShell->GetRootWebShellEvenIfChrome(temp0);
      nsCOMPtr<nsIWebShell> rootWebShell = dont_AddRef(temp0);

      nsIWebShellContainer* temp1;
      if ( rootWebShell )
        status = rootWebShell->GetContainer(temp1);
      nsCOMPtr<nsIWebShellContainer> webShellContainer =
                                                   dont_AddRef(temp1);
      (*aWebShellWindow) =
                        do_QueryInterface(webShellContainer, &status);

  //  return status;
    }



I encourage you to examine the tests yourself.  Compile them.  Look at
the generated code.  Convince yourself.  Write new tests.  Again, the
summaries above are very brief slices of the full results, which can
be found in the test files.

Hope this helps,
______________________________________________________________________
Scott Collins              <http://www.meer.net/ScottCollins?Netscape>





-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Privacy 6.0.2
Comment: get my key at <http://www.meer.net/ScottCollins/#key>

iQA/AwUBNzdJv/GmojMuVn+fEQKTuwCg/DwKV+pmpsf10H6vEA3ObFMau38An09o
tkwKrQtUbn5wBposUfiwLlAq
=I8Yt
-----END PGP SIGNATURE-----
Epilogue

There are some small errors in a couple of the test functions that need to be repaired, and I still haven't counted the bytes on Linux. So there is work yet left to do.