refcount balancer
Contact: Chris Waterson (waterson@netscape.com)
overview
One of the things that sucks about XPCOM is that you have to deal with
reference counting. It's hard and prone to errors, and Mozilla leaks like a
sieve because of it. Unlike the good old-fashioned malloc() and free() model,
where memory gets allocated in exactly one place and freed in exactly one
other place, reference counting is distributed all over. There may be twenty
different spots in the code where a single object is AddRef()-ed. And if just
one of those AddRef()-ers forgets to Release(), well, you're screwed.
Traditional leak tracking tools like Purify don't help much either. They'll
tell you that you leaked an object, but they won't help you track down the
twenty different clients that AddRef()-ed it, let alone the joker that forgot
to Release() it.
This crude set of tools attempts to address that problem. It's not a
panacea, but it at least gives some insight into who is AddRef()-ing whom.
From 50,000 feet, here's what happens.
- You discover that your FooImpl object is leaking. Maybe Bruce Mitchener tells you; maybe you notice on your own because your destructor is never called. You cringe and moan and put off the bug for 3 or 4 milestones. But since you know about this tool, you eventually roll up your sleeves and start working on it.
- You set a couple of environment variables in a debug build.
- As you run, you notice that piles and piles of information start to spew out to the console. Specifically, as your object is AddRef()-ed and Release()-ed, a stack trace is generated, along with the operation (AddRef or Release), this (i.e., the object that just got operated on), and the current reference count of your object. This mountain of information, although impressive, is useless in its current form.
- You next run Perl script #1 over the resulting log file. This Perl script will pick out the instances of objects that leaked. You choose one of the objects that's particularly interesting to you.
- You now run Perl script #2 over the log file. This script is the Fancy Magic. It takes each stack trace and strings it together into a call graph. Each node in the graph represents a call site, and has a "balance factor", which is the total number of AddRef() operations that it has been included in minus the total number of Release() operations that it has been included in. (I told you it was Fancy Magic.)
So what does all that mean? The cool part -- you were waiting for the cool
part -- is that you can look at this graph and see what subtrees are
"balanced"; i.e., total number of AddRef()s equals total number of
Release()-es. You know you don't need to worry about those trees because no
evil leakage happened there.
For trees that are out of balance, you need to dig a little bit deeper.
Subtrees get out of balance when one code path AddRef()s the object, and a
code path somewhere else does the corresponding Release().
Like I said, it's not a panacea, but you can start to play Mah Jongg with
the out-of-balance trees, proving to yourself in each case that the AddRef()
from one tree matches with the Release() in another. In short, it does a
decent job of directing you to the places you need to verify in your code.
Details
Enabling Runtime Logging. You need to set a couple of runtime environment variables to produce output.
for Unix
setenv XPCOM_MEM_REFCNT_LOG log-file.dat
setenv XPCOM_MEM_LOG_CLASSES MyLeakyObjectImpl
setenv XPCOM_MEM_LOG_OBJECTS MyLeakyObjectSerialNumber (optional)
for Windows
set XPCOM_MEM_REFCNT_LOG=log-file.dat
set XPCOM_MEM_LOG_CLASSES=MyLeakyObjectImpl
set XPCOM_MEM_LOG_OBJECTS=MyLeakyObjectSerialNumber (optional)
for Mac
Create a file called environment in the directory where the MozillaDebug
binary sits containing the lines
XPCOM_MEM_REFCNT_LOG=log-file.dat
XPCOM_MEM_LOG_CLASSES=MyLeakyObjectImpl
XPCOM_MEM_LOG_OBJECTS=MyLeakyObjectSerialNumber (optional)
(Note that case is important.) These variables are described in more detail in the Memory Tools documentation.
Now when you run, you should see lots of information dumped to your
log-file.dat (which defaults to the console, if not set). Specifically, each
time an object is AddRef()-ed or Release()-ed, several lines will get added
to the file. So make sure you have plenty of disk space.
Postprocessing Step 1: Finding the Leakers. First you have to figure out
which objects leaked. There's a Perl script that does this. It grovels
through the log file and figures out which objects got allocated (it knows
they were just allocated because they got AddRef()-ed and their refcount
became 1). It adds them to a list. When it finds an object that got freed (it
knows because its refcount goes to 0), it removes it from the list. Anything
left over is leaked.
The script is called find-leakers.pl.
So, depending on your platform, do something like:
% perl -w find-leakers.pl my-leaks.log
(Replace my-leaks.log with your logfile.) This will print out a list of
pointers:
0x00253ab0 (1)
0x00253ae0 (2)
0x00253bd0 (4)
The number in parentheses is the order in which the object was allocated, if you care. Pick one for use with Step 2.
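The bookkeeping in Step 1 is simple enough to sketch. What follows is an illustration in Python, not the code of find-leakers.pl. It assumes the log has already been parsed into records; the RefEvent shape and the find_leakers name are made up for this sketch and say nothing about the real log format.

```python
from typing import Dict, Iterable, NamedTuple


class RefEvent(NamedTuple):
    """One parsed log record (hypothetical shape, not the real log format)."""
    address: str   # the object's `this` pointer, e.g. "0x00253ab0"
    op: str        # "AddRef" or "Release"
    refcount: int  # reference count after the operation


def find_leakers(events: Iterable[RefEvent]) -> Dict[str, int]:
    """Return {address: allocation order} for objects that never hit refcount 0."""
    live: Dict[str, int] = {}
    allocated = 0
    for ev in events:
        if ev.op == "AddRef" and ev.refcount == 1:
            # A refcount of 1 after AddRef means the object was just allocated.
            allocated += 1
            live[ev.address] = allocated
        elif ev.op == "Release" and ev.refcount == 0:
            # A refcount of 0 after Release means the object was freed.
            live.pop(ev.address, None)
    return live


events = [
    RefEvent("0x00253ab0", "AddRef", 1),
    RefEvent("0x00253ac0", "AddRef", 1),
    RefEvent("0x00253ab0", "AddRef", 2),
    RefEvent("0x00253ac0", "Release", 0),
    RefEvent("0x00253ab0", "Release", 1),
]
for address, order in find_leakers(events).items():
    print(f"{address} ({order})")
```

In this made-up run the second object is released back to zero and drops off the list, while the first is still holding one reference at the end, so it is the only one reported.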
Postprocessing Step 2: Building the Balance Tree. Now that you've picked an
object that leaked, you can build a "balance tree" (anyone who can think of a
better name feel free to let me know). This process takes all the AddRef()
and Release() stack traces and munges them into a call graph. Each node in
the graph represents a call site. Each call site has a "balance factor",
which is positive if more AddRef()s than Release()-es have happened at the
site, zero if the number of AddRef()s and Release()-es are equal, and
negative if more Release()-es than AddRef()s have happened at the site.
To build the balance tree, run make-tree.pl; e.g.,
% perl -w make-tree.pl --object 0x00253ab0 < my-leaks.log
Note that you specify the object that you want make-tree.pl to examine. This
will build an indented tree that looks something like this (except probably a
lot larger and leafier):
.root: bal=1
  main: bal=1
    DoSomethingWithFooAndReturnItToo: bal=2
      NS_NewFoo: bal=1
Let's pretend in our toy example that NS_NewFoo() is a factory method that
makes a new foo and returns it. DoSomethingWithFooAndReturnItToo() is a
method that munges the foo before returning it to main(), the main program.
What this little tree is telling you is that you leak one refcount overall
on object 0x00253ab0. But, more specifically, it shows you that:
- NS_NewFoo() "leaks" a refcount. This is probably "okay" because it's a factory method that creates an AddRef()-ed object.
- DoSomethingWithFooAndReturnItToo() leaks two refcounts. Hmm...this probably isn't okay, especially because...
- main() is back down to leaking one refcount.
So from this, we can deduce that main() is correctly releasing the refcount
that it got on the object returned from DoSomethingWithFooAndReturnItToo(),
so the leak must be somewhere in that function.
So now say we go fix the leak in DoSomethingWithFooAndReturnItToo(), re-run
our trace, grovel through the log "by hand" to find the object that
corresponds to 0x00253ab0 in the new run, and run make-tree.pl. What we'd
hope to see is a tree that looks like:
.root: bal=0
  main: bal=0
    DoSomethingWithFooAndReturnItToo: bal=1
      NS_NewFoo: bal=1
That is, NS_NewFoo() "leaks" a single reference count; this leak is
"inherited" by DoSomethingWithFooAndReturnItToo(); but is finally balanced by
a Release() in main().
Hints
Clearly, this is an iterative and analytical process. Maybe somebody smarter than me can figure out ways to automate parts of it. To date, I've figured out some tricks.
Ignoring balanced trees. The make-tree.pl script accepts an option
--ignore-balanced, which tells it not to bother printing out the children of
a node whose balance factor is zero. This can help remove some of the clutter
from an otherwise noisy tree.
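The pruning rule can be sketched in a few lines of Python. This only illustrates the behavior described above and is not taken from make-tree.pl; the nested-tuple tree shape, the render function, and the call site names are invented for the example.

```python
from typing import List, Tuple

# (name, balance factor, children): a hypothetical in-memory tree shape.
Tree = Tuple[str, int, list]


def render(tree: Tree, ignore_balanced: bool = False, depth: int = 0) -> List[str]:
    """Render indented "name: bal=N" lines, optionally pruning balanced nodes."""
    name, balance, children = tree
    lines = [f"{'  ' * depth}{name}: bal={balance}"]
    if ignore_balanced and balance == 0:
        # AddRefs and Releases below this node cancel out; skip its children.
        return lines
    for child in children:
        lines.extend(render(child, ignore_balanced, depth + 1))
    return lines


tree = (".root", 1, [
    ("main", 1, [
        ("BalancedCaller", 0, [("TakesRef", 1, []), ("DropsRef", -1, [])]),
        ("LeakyCaller", 1, []),
    ]),
])
print("\n".join(render(tree, ignore_balanced=True)))
```

With pruning on, BalancedCaller still shows up as a single line, but the two children that cancel each other out are left off.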
Playing Mah Jongg. An unbalanced tree is not necessarily an evil thing. More
likely, it indicates that one AddRef() is cancelled by another Release()
somewhere else in the code. So the game is to try to match them with one
another.
Excluding Functions. To aid in this process, you can create an "excludes
file" that lists the names of functions that you want to exclude from the
tree building process (presumably because you've matched them). make-tree.pl
accepts the option --exclude [file], where [file] is a newline-separated list
of function names that will be excluded from consideration while building the
tree. Specifically, any call stack that contains that call site will not
contribute to the computation of balance factors in the tree.
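The effect of an excludes file can be pictured as a filter that runs before any balance factors are computed. Here is a minimal Python sketch under that assumption; load_excludes, drop_excluded, the (operation, frames) trace shape, and the function names in the sample data are all hypothetical and do not come from make-tree.pl.

```python
from typing import Iterable, List, Set, Tuple

Trace = Tuple[str, List[str]]  # (operation, frames), a hypothetical parsed form


def load_excludes(text: str) -> Set[str]:
    """Parse the contents of a newline-separated list of function names."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def drop_excluded(traces: Iterable[Trace], excludes: Set[str]) -> List[Trace]:
    """Keep only the traces whose stack contains no excluded function."""
    return [
        (op, frames)
        for op, frames in traces
        if not excludes.intersection(frames)
    ]


excludes = load_excludes("CacheFoo\nUncacheFoo\n")
traces = [
    ("AddRef", ["main", "CacheFoo"]),
    ("Release", ["main", "UncacheFoo"]),
    ("AddRef", ["main", "NS_NewFoo"]),
]
for op, frames in drop_excluded(traces, excludes):
    print(op, " <- ".join(frames))
```

Once you have convinced yourself that the AddRef() in CacheFoo is matched by the Release() in UncacheFoo, excluding both removes the pair from the tree and leaves only the traces you still need to account for.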
pricing & availability
As of this writing, the stack tracing code is implemented for Win32 and
i386 Linux (compiled with egcs and glibc 2.0 and 2.1).
Donations gladly accepted; Bourbon preferred over other currencies.
The Perl scripts, available by checking out tools/rb, require only Larry Wall's finest (5.00504 seems to work for me).
credits
I stole the stack walking code from Kipp Hickman and Matt Pietrek (see this article). For Linux, Mike Shaver, Bruce Mitchener, and Ramiro Estrugo all helped me get things right. Mucho gusto. Waldemar Horwat and Jim Roskind helped to improve the post-processing scripts.
further reading
- Finding leaks in Mozilla
- How to find leaks of XPCOM objects, which also describes how to use the refcount balancer
- How to debug memory leaks / refcount leaks