

Using trace-malloc to Measure Live Bloat

Contact:
Chris Waterson (waterson@netscape.com)

Overview

Any process that uses the C runtime heap (i.e., malloc() and friends) will have a set of ``live'' objects in the heap; that is, memory that has been malloc()'d but not yet free()'d. In a large system, it's hard to know exactly what things are taking up space in the heap; we refer to this as the ``live bloat''. This document describes how to use the trace-malloc tool to analyze the live object bloat in a process.

Trace-malloc is a tool that Brendan Eich wrote to capture all the heap activity that happens in the system. The code was originally written on Linux, and uses ``weak symbols'' to subvert the libc allocator. (As of this writing, it has also been ported to Win32, but not Mac.)

One of the ways that trace-malloc tracks heap activity is to record all of the ``live objects'' that exist in a process. Specifically, trace-malloc maintains a table of malloc()'d addresses in the application heap, along with the requested size and the call stack at the time that malloc() was invoked. When an object is free()'d, its entry is removed from the table.
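The bookkeeping described above can be modeled in a few lines. The following is only a toy Python sketch of the idea, not trace-malloc's actual implementation (which is C code hooked into the allocator); the class and method names, and the sample stacks, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LiveObject:
    size: int      # requested size, in bytes
    stack: tuple   # call stack at the time of allocation, innermost frame first


class LiveObjectTable:
    """Toy model of the live object table: address -> (size, stack)."""

    def __init__(self):
        self.entries = {}

    def on_malloc(self, address, size, stack):
        # Record the allocation under the address that malloc() returned.
        self.entries[address] = LiveObject(size, tuple(stack))

    def on_free(self, address):
        # Drop the entry when the object is free()'d; ignore unknown addresses.
        self.entries.pop(address, None)

    def live_bytes(self):
        # Total requested size of everything still in the table.
        return sum(obj.size for obj in self.entries.values())


# Hypothetical allocations, for illustration only.
table = LiveObjectTable()
table.on_malloc(0x08404EA0, 64, ["__builtin_new", "CSSDeclarationImpl::AppendValue"])
table.on_malloc(0x08405148, 32, ["malloc", "hypothetical_caller"])
table.on_free(0x08405148)
print(table.live_bytes())
```

Whatever remains in the table at any moment is the ``live bloat'' for that moment.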

A snapshot of the live object table can be dumped to disk; the dump file can then be analyzed off-line to determine what objects took up space in the process's heap at the moment the snapshot was made.

Getting Started

Linux

The trace-malloc code is not normally compiled in to Mozilla. To enable it, you'll need to configure your build with the following options:

--enable-cpp-rtti
--enable-trace-malloc
--disable-debug
--enable-optimize="-O -g"

Forcing --disable-debug avoids compiling in any #ifdef DEBUG code, which may skew object sizes in a debug build. To get symbol information, we add the -g flag as an optimize option. Do not use the --enable-strip-libs option, as this will make it impossible for trace-malloc to collect useful stack information. As of this writing, you'll need to use egcs-1.1.2 as your compiler, due to bug 62996.

Windows

(Insert Win32 instructions here.)

Tools

You'll also need to check out the mozilla/tools/trace-malloc directory, which is not pulled as part of the normal SeaMonkey module:

/usr/src/seamonkey/mozilla$ cd tools
/usr/src/seamonkey/mozilla/tools$ cvs up -d trace-malloc
...

Collecting Live Bloat Snapshots

In an --enable-trace-malloc build, each DOM window object has a TraceMallocDumpAllocations() method that accepts a single argument: the name of the file into which the live bloat table should be dumped. For an example, see the live-bloat.html file in the mozilla/tools/trace-malloc directory. This simple HTML file will dump the live object table when you press a button.

To collect live object information, run your trace-malloc enabled build as follows:

/usr/src/seamonkey/mozilla/dist/bin$ LD_PRELOAD=`pwd`/libxpcom.so \
> LD_LIBRARY_PATH=`pwd` \
> MOZILLA_FIVE_HOME=`pwd` \
> ./mozilla-bin --trace-malloc /dev/null

Now, get the application into the state you'd like to snapshot, and call TraceMallocDumpAllocations() (e.g., by loading live-bloat.html in another window and pressing the ``dump'' button). Beware that the application will run slowly, and that the dump files are typically hundreds of megabytes in size.

Snapshot Format

For each live object, an entry like the following will be created in the dump file:

address    object type
|          |
v          v
0x08404EA0 <nsCSSMargin> (64) <- requested size, in bytes
	0x41AE3CF8 <- object fields
	0x00000000
	0x08405148
	0x084050D0
	0x08405120
	0x084050F8
	0x00000000
	0x00000000
	0x00000000
	0x00000000
	0x00000000
	0x00000000
	0x00000000
	0x00000000   stack trace at time of allocation
	0x00000000   |
	0x00000000   v
__builtin_new[/usr/lib/libstdc++-libc6.1-1.so.2 +0x311E6]
CSSDeclarationImpl::AppendValue(nsCSSProperty, nsCSSValue const &)[/usr/src/seamonkey/mozilla/dist/bin/components/libgkcontent.so +0x355EA3]
CSSParserImpl::AppendValue(nsICSSDeclaration *, nsCSSProperty, nsCSSValue const &, int &)[/usr/src/seamonkey/mozilla/dist/bin/components/libgkcontent.so +0x36FD4C]
CSSParserImpl::ParseBorderSide(int &, nsICSSDeclaration *, nsCSSProperty const *, int &)[/usr/src/seamonkey/mozilla/dist/bin/components/libgkcontent.so +0x3728A5]
CSSParserImpl::ParseProperty(int &, nsICSSDeclaration *, nsCSSProperty, int &)[/usr/src/seamonkey/mozilla/dist/bin/components/libgkcontent.so +0x3703D2]
(more stack frames follow...)

The object type is inferred using C++ RTTI; however, this only works for C++ objects with vtables. For memory allocated with malloc(), or C++ objects that don't have a vtable, the type is reported as void*.
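To make the layout concrete, here is a minimal Python sketch of a reader for entries shaped like the one above. It assumes that the arrow annotations in the example are explanatory only and do not appear in a real dump, that field words are tab-indented, and that each stack frame has the form name[library +0xOFFSET]. The function name parse_snapshot is hypothetical and is not one of the tools in the tree.

```python
import re

# Header line: address, <type>, (requested size).
HEADER = re.compile(r"^(0x[0-9A-Fa-f]+) <(.+)> \((\d+)\)$")
# Stack frame line: function[library +0xOFFSET].
FRAME = re.compile(r"^(.*)\[(.+) \+(0x[0-9A-Fa-f]+)\]$")


def parse_snapshot(lines):
    """Yield one dict per live object found in an iterable of dump lines."""
    entry = None
    for raw in lines:
        line = raw.rstrip("\n")
        header = HEADER.match(line)
        if header:
            if entry is not None:
                yield entry
            entry = {
                "address": int(header.group(1), 16),
                "type": header.group(2),
                "size": int(header.group(3)),
                "fields": [],
                "stack": [],
            }
        elif entry is None or not line.strip():
            continue  # skip blank lines and anything before the first header
        elif line.startswith("\t"):
            entry["fields"].append(int(line.strip(), 16))
        else:
            frame = FRAME.match(line)
            # Keep only the function name; fall back to the raw line.
            entry["stack"].append(frame.group(1) if frame else line)
    if entry is not None:
        yield entry


# Abbreviated copy of the entry shown above (only two field words).
sample = """\
0x08404EA0 <nsCSSMargin> (64)
\t0x41AE3CF8
\t0x00000000
__builtin_new[/usr/lib/libstdc++-libc6.1-1.so.2 +0x311E6]
CSSDeclarationImpl::AppendValue(nsCSSProperty, nsCSSValue const &)[/usr/src/seamonkey/mozilla/dist/bin/components/libgkcontent.so +0x355EA3]
"""
entries = list(parse_snapshot(sample.splitlines()))
print(entries[0]["type"], entries[0]["size"], entries[0]["stack"][0])
```

Because the reader is a generator, it can be fed an open file object directly, which matters for dumps that run to hundreds of megabytes.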

Analyzing a Snapshot

The raw snapshots are not particularly useful on their own. The mozilla/tools/trace-malloc directory contains a collection of tools for analyzing a snapshot.

Type Histogram

The histogram.pl script produces a type histogram of the objects in the snapshot.

/usr/src/seamonkey/mozilla/dist/bin$ ../../tools/trace-malloc/histogram.pl \
> allocations.log > /tmp/mozilla.histogram

The type histogram lists each type that appeared in the snapshot, along with the number of objects of that type, and the total number of bytes that the objects occupied. It uses a type database to infer each void* object's type from the stack trace.
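The counting step itself is simple. The sketch below, in Python rather than the Perl of histogram.pl, tallies object count and total bytes per type from (type, size) pairs. It leaves out the type-database inference for void* objects, and the sample data is hypothetical.

```python
from collections import defaultdict


def type_histogram(objects):
    """Map type name -> [object count, total bytes] for (type, size) pairs."""
    histogram = defaultdict(lambda: [0, 0])
    for type_name, size in objects:
        histogram[type_name][0] += 1
        histogram[type_name][1] += size
    return dict(histogram)


# Hypothetical (type, size) pairs, as might be extracted from a snapshot.
objects = [("nsCSSMargin", 64), ("void*", 128), ("nsCSSMargin", 64), ("void*", 16)]
histogram = type_histogram(objects)
total = sum(nbytes for _, nbytes in histogram.values())

# Print type, count, bytes, and percentage of total, largest first.
for name, (count, nbytes) in sorted(histogram.items(), key=lambda kv: -kv[1][1]):
    print(f"{name:<16}{count:>6}{nbytes:>8}{100.0 * nbytes / total:>7.2f}")
```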

The output from this tool is very raw. The histogram-pretty.sh shell script produces a ``top twenty'' list from the results of histogram.pl:

/usr/src/seamonkey/mozilla/dist/bin$ ../../tools/trace-malloc/histogram-pretty.sh \
> /tmp/mozilla.histogram > /tmp/mozilla.top

Its output is a table like this:

Type                    Count    Bytes %Total
TOTAL                  136422  5802417 100.00
void*                   40587  1944281  33.51
JSScopeProperty         13497   485892   8.37
js_MatchToken           15124   429248   7.40
JS-function             10092   264744   4.56
JS-Array                 3857   125320   2.16
xpti-unclassified         115   120968   2.08
JS-unclassified          2795   117916   2.03
nsVoidArray              3194   105100   1.81
JS-GC-arena                11   101629   1.75
unclassified-string      3238    95144   1.64
JS-script                 634    94983   1.64
JS-atom                  3328    93092   1.60
gtk-unclassified          939    92555   1.60
AtomImpl                 3359    90980   1.57
nsXULElement             1204    86688   1.49
nsXULAttribute              6    86016   1.48
JSScope                  1627    84604   1.46
nsComponentManagerImpl   2701    82800   1.43
nsCStringKey             3600    82500   1.42
OTHER                   26514  1217957  20.99

It is possible to compare two histogram files created by histogram.pl using the histogram-diff.sh shell script:

/usr/src/seamonkey/mozilla/dist/bin$ ../../tools/trace-malloc/histogram-diff.sh \
> /tmp/before-open-mail.histogram \
> /tmp/after-open-mail.histogram > /tmp/open-mail-diff.top
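
The comparison amounts to subtracting per-type totals. As a rough Python illustration (the on-disk format written by histogram.pl is not described here, so this sketch works on in-memory dictionaries mapping type name to bytes, and all numbers are hypothetical):

```python
def histogram_diff(before, after):
    """Return (type, byte delta) pairs, largest growth first."""
    deltas = {
        name: after.get(name, 0) - before.get(name, 0)
        for name in set(before) | set(after)
    }
    # Drop unchanged types; sort by delta descending, then by name for stability.
    changed = [(name, delta) for name, delta in deltas.items() if delta != 0]
    return sorted(changed, key=lambda item: (-item[1], item[0]))


# Hypothetical byte totals per type before and after opening mail.
before = {"void*": 1000, "JS-atom": 500, "nsVoidArray": 200}
after = {"void*": 1800, "JS-atom": 500, "morkMap": 300, "nsVoidArray": 150}
diff = histogram_diff(before, after)
for name, delta in diff:
    print(f"{name:<16}{delta:>+8}")
```

Types that appear in only one of the two histograms are treated as having zero bytes in the other.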

Improving the Type Database

The type database in mozilla/tools/trace-malloc/types.dat is only as good as the stack-based inference rules that are put into it. The uncategorized.pl script lets you see how effectively the stack-based inference rules performed on a particular live object dump.

/usr/src/seamonkey/mozilla/dist/bin$ ../../tools/trace-malloc/uncategorized.pl \
> allocations.log > /tmp/uncat

The script groups the objects whose type it couldn't infer by stack trace:

(217297) __builtin_new
  (83369) orkinHeap::Alloc(nsIMdbEnv *, unsigned int, void **)
    (65544) morkZone::zone_new_hunk(morkEnv *, unsigned int)
      (65544) morkZone::zone_grow_at(morkEnv *, unsigned int)
        (65544) morkZone::zone_new_chip(morkEnv *, unsigned int)
          (65544) morkZone::ZoneNewChip(morkEnv *, unsigned int)
            (65544) morkPool::NewFarBookAtomCopy(morkEnv *, morkFarBookAtom const &, morkZone *)
              (65544) morkAtomSpace::MakeBookAtomCopyWithAid(morkEnv *, morkFarBookAtom const &, unsigned int)
                (65544) morkStore::AddAlias(morkEnv *, morkMid const &, unsigned int)
    (11096) morkMap::clear_alloc(morkEnv *, unsigned int)
      (3756) morkMap::new_keys(morkEnv *, unsigned int)
        (3756) morkMap::new_arrays(morkEnv *, morkHashArrays *, unsigned int)
(more data follows...)

This means that 217,297 bytes were allocated through __builtin_new that couldn't be classified. Of those, 83,369 bytes came from orkinHeap::Alloc(); of those in turn, 65,544 bytes came from morkZone::zone_new_hunk(), and 11,096 bytes came from morkMap::clear_alloc(). (The rest of the output was truncated for clarity's sake.)
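
A tree like this can be built by adding each object's size to every frame along its stack, starting at the allocator. The following Python sketch shows one way to do that; it is not the uncategorized.pl source, the function names are hypothetical, and the sample stacks are shortened and partly invented.

```python
def add_stack(tree, stack, size):
    """Add `size` bytes to every node along `stack` (allocator frame first)."""
    node = tree
    for frame in stack:
        child = node.setdefault(frame, {"bytes": 0, "callers": {}})
        child["bytes"] += size
        node = child["callers"]


def format_tree(tree, depth=0):
    """Return lines like '(bytes) frame', indented two spaces per level."""
    lines = []
    for frame, child in sorted(tree.items(), key=lambda kv: -kv[1]["bytes"]):
        lines.append("  " * depth + f"({child['bytes']}) {frame}")
        lines.extend(format_tree(child["callers"], depth + 1))
    return lines


# Hypothetical unclassified allocations: (stack, requested size).
allocations = [
    (["__builtin_new", "orkinHeap::Alloc", "morkZone::zone_new_hunk"], 65544),
    (["__builtin_new", "orkinHeap::Alloc", "morkMap::clear_alloc"], 11096),
    (["__builtin_new", "Hypothetical::Caller"], 400),
]
tree = {}
for stack, size in allocations:
    add_stack(tree, stack, size)
report = format_tree(tree)
print("\n".join(report))
```

Each node's byte count is the sum over all stacks that pass through it, which is why a parent can be larger than the sum of the children that are displayed.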

This information might lead us to add the following new rules to types.dat:

<morkFarBookAtom>
__builtin_new
orkinHeap::Alloc(nsIMdbEnv *, unsigned int, void **)
morkZone::zone_new_hunk(morkEnv *, unsigned int)
morkZone::zone_grow_at(morkEnv *, unsigned int)
morkZone::zone_new_chip(morkEnv *, unsigned int)
morkZone::ZoneNewChip(morkEnv *, unsigned int)
morkPool::NewFarBookAtomCopy(morkEnv *, morkFarBookAtom const &, morkZone *)
morkAtomSpace::MakeBookAtomCopyWithAid(morkEnv *, morkFarBookAtom const &, unsigned int)
morkStore::AddAlias(morkEnv *, morkMid const &, unsigned int)

<morkMap>
__builtin_new
orkinHeap::Alloc(nsIMdbEnv *, unsigned int, void **)
morkMap::clear_alloc(morkEnv *, unsigned int)
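
To illustrate how rules of this shape could be applied, here is a small Python sketch. It assumes that a rule matches when its frames are a prefix of the object's stack (allocator frame first) and that the longest matching rule wins. Both are assumptions made for illustration and may not reflect exactly how histogram.pl interprets types.dat; the function names are hypothetical.

```python
def parse_rules(text):
    """Parse '<type>' headers followed by frame lines into (type, frames) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            rules.append((line[1:-1], []))
        elif rules:
            rules[-1][1].append(line)
    return rules


def infer_type(stack, rules):
    """Return the type of the longest rule that is a prefix of `stack`."""
    best_type, best_len = "void*", 0
    for type_name, frames in rules:
        if len(frames) > best_len and stack[:len(frames)] == frames:
            best_type, best_len = type_name, len(frames)
    return best_type


rules = parse_rules("""
<morkMap>
__builtin_new
orkinHeap::Alloc(nsIMdbEnv *, unsigned int, void **)
morkMap::clear_alloc(morkEnv *, unsigned int)
""")

stack = [
    "__builtin_new",
    "orkinHeap::Alloc(nsIMdbEnv *, unsigned int, void **)",
    "morkMap::clear_alloc(morkEnv *, unsigned int)",
    "morkMap::new_keys(morkEnv *, unsigned int)",
]
print(infer_type(stack, rules))
print(infer_type(["malloc", "main"], rules))
```

Objects that match no rule keep the void* type and will continue to show up in the output of uncategorized.pl.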