You are currently viewing a snapshot of www.mozilla.org taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to www.mozilla.org, please file a bug.



refcount balancer

Contact: Chris Waterson (waterson@netscape.com)

overview

One of the things that sucks about XPCOM is that you have to deal with reference counting. It's hard and prone to errors, and Mozilla leaks like a sieve because of it. Unlike the good old fashioned malloc() and free() model where memory gets allocated in exactly one place and freed in exactly one other place, reference counting is distributed all over. There may be twenty different spots in the code where a single object is AddRef()-ed. And if just one of those AddRef()-ers forgets to Release(), well, you're screwed.

Traditional leak tracking tools like Purify don't help much either. They'll tell you that you leaked an object, but they won't help you track down the twenty different clients that AddRef()-ed it, let alone the joker that forgot to Release() it.

This crude set of tools attempts to address that problem. It's not a panacea, but it at least gives some insight into who is AddRef()-ing whom.

From 50,000 feet, here's what happens.

  • You discover that your FooImpl object is leaking. Maybe Bruce Mitchener tells you; maybe you notice on your own because your destructor is never called. You cringe and moan and later the bug for 3 or 4 milestones. But since you know about this tool, you eventually roll up your sleeves and start working on it.
  • You set a couple of environment variables in a debug build.
  • As you run, piles and piles of information start to spew out to the console. Specifically, each time your object is AddRef()-ed or Release()-ed, a stack trace is generated, along with the operation (AddRef or Release), this (i.e., the object that just got operated on), and the current reference count of your object. This mountain of information, although impressive, is useless in its current form.
  • You next run Perl script #1 over the resulting log file. This Perl script will pick out the instances of objects that leaked. You choose one of the objects that's particularly interesting to you.
  • You now run Perl script #2 over the log file. This script is the Fancy Magic. It takes each stack trace and strings it together into a call graph. Each node in the graph represents a call site, and has a "balance factor" which is the total number of AddRef() operations that it has been included in minus the total number of Release() operations that it has been included in. (I told you it was Fancy Magic.)

So what does all that mean? The cool part -- you were waiting for the cool part -- is that you can look at this graph and see what subtrees are "balanced"; i.e., total number of AddRef()s equals total number of Release()-es. You know you don't need to worry about those trees because no evil leakage happened there.

For trees that are out of balance, you need to dig a little bit deeper. Subtrees get out of balance when one code path AddRef()s the object, and a code path somewhere else does the corresponding Release().

Like I said, it's not a panacea, but you can start to play Mah Jongg with the out-of-balance trees, proving to yourself in each case that the AddRef() from one tree matches with the Release() in another. In short, it does a decent job of directing you to the places you need to verify in your code.

Details

Enabling Runtime Logging. You need to set a couple of runtime environment variables to produce output.

for Unix

setenv XPCOM_MEM_REFCNT_LOG log-file.dat
setenv XPCOM_MEM_LOG_CLASSES MyLeakyObjectImpl
setenv XPCOM_MEM_LOG_OBJECTS MyLeakyObjectSerialNumber (optional)

for Windows

set XPCOM_MEM_REFCNT_LOG=log-file.dat
set XPCOM_MEM_LOG_CLASSES=MyLeakyObjectImpl
set XPCOM_MEM_LOG_OBJECTS=MyLeakyObjectSerialNumber (optional)

for Mac

Create a file called environment in the directory where the MozillaDebug binary sits containing the lines

XPCOM_MEM_REFCNT_LOG=log-file.dat
XPCOM_MEM_LOG_CLASSES=MyLeakyObjectImpl
XPCOM_MEM_LOG_OBJECTS=MyLeakyObjectSerialNumber (optional)

(Note that case is important.) These variables are described in more detail in the Memory Tools documentation.

Now when you run, you should see lots of information dumped to your log-file.dat (output defaults to the console if no file is set). Specifically, each time an object is AddRef()-ed or Release()-ed, several lines get added to the file. So make sure you have plenty of disk space.

Postprocessing Step 1: Finding the Leakers. First you have to figure out which objects leaked. There's a Perl script that does this. It grovels through the log file and figures out which objects got allocated (it knows they were just allocated because they got AddRef()-ed and their refcount became 1). It adds them to a list. When it finds an object that got freed (it knows because its refcount goes to 0), it removes it from the list. Anything left over is leaked.

The script is called find-leakers.pl. So, depending on your platform, do something like:

% perl -w find-leakers.pl my-leaks.log

(Replace my-leaks.log with your logfile.) This will print out a list of pointers:

0x00253ab0 (1)
0x00253ae0 (2)
0x00253bd0 (4)

The number in parentheses is the order in which the object was allocated, if you care. Pick one for use with Step 2.
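The bookkeeping in Step 1 can be sketched in a few lines. This is a minimal Python illustration, not find-leakers.pl itself: it assumes a simplified, hypothetical record of (pointer, operation, refcount) tuples rather than the real log format, and the name find_leakers is made up for the example.

```python
def find_leakers(events):
    """Return leaked pointers mapped to their allocation order.

    `events` is a hypothetical, simplified stand-in for the refcount log:
    an iterable of (pointer, operation, refcount_after_operation) tuples.
    """
    live = {}        # pointer -> allocation order (1-based)
    allocated = 0    # how many allocations we have seen so far
    for pointer, operation, refcount in events:
        if operation == "AddRef" and refcount == 1:
            # Refcount became 1: treat the object as newly allocated.
            allocated += 1
            live[pointer] = allocated
        elif operation == "Release" and refcount == 0:
            # Refcount went to 0: the object was freed, so it did not leak.
            live.pop(pointer, None)
    return live


# Toy log: four objects are created, and only the third is ever freed.
events = [
    ("0x00253ab0", "AddRef", 1),
    ("0x00253ae0", "AddRef", 1),
    ("0x00253b40", "AddRef", 1),
    ("0x00253ab0", "AddRef", 2),
    ("0x00253b40", "Release", 0),
    ("0x00253bd0", "AddRef", 1),
    ("0x00253ab0", "Release", 1),
]

for pointer, order in find_leakers(events).items():
    print(f"{pointer} ({order})")
```

With this toy input the sketch prints the same three lines as the sample listing above, because the third object allocated is the only one whose refcount returns to 0.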

Postprocessing Step 2: Building the Balance Tree. Now that you've picked an object that leaked, you can build a "balance tree" (anyone who can think of a better name, feel free to let me know). This process takes all the AddRef() and Release() stack traces and munges them into a call graph. Each node in the graph represents a call site. Each call site has a "balance factor", which is positive if more AddRef()s than Release()-es have happened at the site, zero if the number of AddRef()s and Release()-es are equal, and negative if more Release()-es than AddRef()s have happened at the site.

To build the balance tree, run make-tree.pl; e.g.,

% perl -w make-tree.pl --object 0x00253ab0 < my-leaks.log

Note that you specify the object that you want make-tree.pl to examine. This will build an indented tree that looks something like this (except probably a lot larger and leafier):

.root: bal=1
  main: bal=1
    DoSomethingWithFooAndReturnItToo: bal=2
      NS_NewFoo: bal=1

Let's pretend in our toy example that NS_NewFoo() is a factory method that makes a new foo and returns it. DoSomethingWithFooAndReturnItToo() is a method that munges the foo before returning it to main(), the main program.

What this little tree is telling you is that you leak one refcount overall on object 0x00253ab0. But, more specifically, it shows you that:

  • NS_NewFoo() "leaks" a refcount. This is probably "okay" because it's a factory method that creates an AddRef()-ed object.
  • DoSomethingWithFooAndReturnItToo() leaks two refcounts. Hmm...this probably isn't okay, especially because...
  • main() is back down to leaking one refcount.

So from this, we can deduce that main() is correctly releasing the refcount that it got on the object returned from DoSomethingWithFooAndReturnItToo(), so the leak must be somewhere in that function.
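To make the balance factor concrete, here is a rough Python sketch of how stack traces could be folded into such a tree. It is not make-tree.pl: the input is a hypothetical, simplified list of (operation, stack) pairs with the outermost caller first, and Node, build_tree, and format_tree are names invented for the example.

```python
class Node:
    """One call site in the balance tree."""

    def __init__(self, name):
        self.name = name
        self.balance = 0      # AddRef()s minus Release()s seen through this site
        self.children = {}    # callee name -> Node


def build_tree(traces):
    """Fold (operation, stack) pairs into a tree of call sites.

    Each stack lists frames outermost caller first. This input shape is an
    assumption made for the sketch, not the real log format.
    """
    root = Node(".root")
    for operation, stack in traces:
        delta = 1 if operation == "AddRef" else -1
        node = root
        node.balance += delta
        for frame in stack:
            if frame not in node.children:
                node.children[frame] = Node(frame)
            node = node.children[frame]
            # Every call site on the stack took part in this operation.
            node.balance += delta
    return root


def format_tree(node, depth=0):
    """Return the tree as indented 'name: bal=N' lines."""
    lines = [f"{'  ' * depth}{node.name}: bal={node.balance}"]
    for child in node.children.values():
        lines.extend(format_tree(child, depth + 1))
    return lines


# Toy traces for the example: the factory AddRef()s once, the middle
# function AddRef()s a second time (the leak), and main() Release()s once.
traces = [
    ("AddRef", ["main", "DoSomethingWithFooAndReturnItToo", "NS_NewFoo"]),
    ("AddRef", ["main", "DoSomethingWithFooAndReturnItToo"]),
    ("Release", ["main"]),
]

print("\n".join(format_tree(build_tree(traces))))
```

Run on these three made-up traces, the sketch reproduces the toy tree above: the extra AddRef() attributed to DoSomethingWithFooAndReturnItToo() is what pushes its balance to 2, while the single Release() in main() brings main() and the root back down to 1.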

So now say we go fix the leak in DoSomethingWithFooAndReturnItToo(), re-run our trace, grovel through the log "by hand" to find the object that corresponds to 0x00253ab0 in the new run, and run make-tree.pl. What we'd hope to see is a tree that looks like:

.root: bal=0
  main: bal=0
    DoSomethingWithFooAndReturnItToo: bal=1
      NS_NewFoo: bal=1

That is, NS_NewFoo() "leaks" a single reference count; this leak is "inherited" by DoSomethingWithFooAndReturnItToo(); but is finally balanced by a Release() in main().

Hints

Clearly, this is an iterative and analytical process. Maybe somebody smarter than me can figure out ways to automate parts of it. To date, I've figured out some tricks.

Ignoring balanced trees. The make-tree.pl script accepts an option --ignore-balanced, which tells it not to bother printing out the children of a node whose balance factor is zero. This can help remove some of the clutter from an otherwise noisy tree.
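The pruning idea is easy to picture with a small sketch. The Python below is only an illustration of the behaviour described above, not the script's actual code; the tree is a hand-built, hypothetical structure of (name, balance, children) tuples, and every function name in it is made up.

```python
def format_tree(node, ignore_balanced=False, depth=0):
    """Return indented 'name: bal=N' lines for a (name, balance, children) tree.

    When ignore_balanced is true, a node whose balance is zero is still
    printed, but its children are skipped.
    """
    name, balance, children = node
    lines = [f"{'  ' * depth}{name}: bal={balance}"]
    if ignore_balanced and balance == 0:
        return lines
    for child in children:
        lines.extend(format_tree(child, ignore_balanced, depth + 1))
    return lines


# Hypothetical tree: UseFooBriefly is balanced overall, even though its two
# children are individually out of balance.
tree = (".root", 1, [
    ("main", 1, [
        ("UseFooBriefly", 0, [
            ("GrabFoo", 1, []),
            ("DropFoo", -1, []),
        ]),
        ("LeakyCaller", 1, []),
    ]),
])

print("\n".join(format_tree(tree, ignore_balanced=True)))
```

In this toy case the GrabFoo and DropFoo lines disappear from the output, leaving LeakyCaller as the obvious place to look.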

Playing Mah Jongg. An unbalanced tree is not necessarily an evil thing. More likely, it indicates that one AddRef() is cancelled by another Release() somewhere else in the code. So the game is to try to match them with one another.

Excluding Functions. To aid in this process, you can create an "excludes file" that lists the names of functions that you want to exclude from the tree building process (presumably because you've matched them). make-tree.pl accepts the option --exclude [file], where [file] is a newline-separated list of function names that will be excluded from consideration while building the tree. Specifically, any call stack that contains one of those call sites will not contribute to the computation of balance factors in the tree.
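As a rough picture of what excluding does, the Python sketch below drops every stack that mentions an excluded function before any balance is computed. It is an illustration under stated assumptions, not the script's implementation: the (operation, stack) input shape is a simplification, and parse_excludes, drop_excluded, overall_balance, and the function names in the toy data are all hypothetical.

```python
def parse_excludes(text):
    """Parse a newline-separated list of function names, skipping blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def drop_excluded(traces, excluded):
    """Keep only the traces whose stack contains no excluded function."""
    return [
        (operation, stack)
        for operation, stack in traces
        if not excluded.intersection(stack)
    ]


def overall_balance(traces):
    """Total AddRef()s minus total Release()s across the given traces."""
    return sum(1 if operation == "AddRef" else -1 for operation, _ in traces)


# Toy data: CacheFoo and EvictFoo have already been matched up by hand,
# so they go in the excludes list and stop cluttering the picture.
traces = [
    ("AddRef", ["main", "CacheFoo"]),
    ("Release", ["main", "EvictFoo"]),
    ("AddRef", ["main", "LeakyCaller"]),
]
excluded = parse_excludes("CacheFoo\nEvictFoo\n")
remaining = drop_excluded(traces, excluded)

print(remaining)
print(overall_balance(remaining))
```

Once the matched pair is excluded, only the LeakyCaller trace is left to contribute to the tree, which is the point of the game.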

pricing & availability

As of this writing, the stack tracing code is implemented for Win32 and i386 Linux (compiled with egcs and glibc 2.0 and 2.1). Donations gladly accepted; Bourbon preferred over other currencies.

The Perl scripts, available by checking out tools/rb, require only Larry Wall's finest (5.00504 seems to work for me).

credits

I stole the stack walking code from Kipp Hickman and Matt Pietrek (see this article). For Linux, Mike Shaver, Bruce Mitchener, and Ramiro Estrugo all helped me get things right. Mucho gusto. Waldemar Horwat and Jim Roskind helped to improve the post-processing scripts.
