Codesighs Introduction
What is Codesighs?
Codesighs is a set of tools to help you determine the code and data size of shared libraries and executables. Once you can measure the code and data size, then you can measure drifts in size as code changes occur.
Why use Codesighs when we already have file size on disk?
Codesighs does not look at the size on disk. Instead, it relies upon symbol
data available either in the file itself or in a linker map file. Using this
data, Codesighs can tell the difference between executable code and static
data, and can also determine the size of the symbols involved (e.g. functions,
static variables).
File size on disk is an important metric for installers and for eyeballing
size differences on large changes. Codesighs offers you the opportunity to
measure even minute changes that may show no difference in file size.
A Short HOWTO
If you are starting from an existing mozilla source tree:
- Set MOZ_MAPINFO=1 in your build environment.
- cvs checkout mozilla/tools/codesighs
- Rerun mozilla/configure with your normal options, and in addition specify --enable-codesighs.
- If you are using a linux build:
  - Run make in the mozilla/tools/codesighs directory.
  - From the parent directory of the mozilla source tree, execute the following command: ./mozilla/tools/codesighs/autosummary.unix.bash results001.tsv results000.tsv summary001.txt
- If you are using a windows build:
  - All .exe and .dll files need to be relinked so that mapfile information is generated at link time. One way to do this is to rebuild your entire tree; this will also build your mozilla/tools/codesighs directory.
  - From the parent directory of the mozilla source tree, execute the following command from your bash shell: ./mozilla/tools/codesighs/autosummary.win.bash results001.tsv results000.tsv summary001.txt
- See below for a description of these files and the output of the script.
Otherwise, if you are starting with no source tree:
- Set MOZ_MAPINFO=1 in your build environment.
- Perform the normal source tree checkout steps.
- Enable codesighs either by adding "ac_add_options --enable-codesighs" to your .mozconfig file or by passing --enable-codesighs when running configure manually.
- Proceed with the normal build steps.
- If you are using a linux build:
  - From the parent directory of the mozilla source tree, execute the following command: ./mozilla/tools/codesighs/autosummary.unix.bash results001.tsv results000.tsv summary001.txt
- If you are using a windows build:
  - From the parent directory of the mozilla source tree, execute the following command from your bash shell: ./mozilla/tools/codesighs/autosummary.win.bash results001.tsv results000.tsv summary001.txt
- See below for a description of these files and the output of the script.
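If you run this step often, it can be wrapped in a small helper. The sketch below is a hypothetical convenience, not part of Codesighs: it picks autosummary.win.bash or autosummary.unix.bash from Python's sys.platform value (an assumption; choose the script by hand if your build platform differs from the host) and passes the same three file names used above.

```python
import sys


def autosummary_command(host=sys.platform):
    """Build the autosummary command line from the steps above.

    Assumes the caller's working directory is the parent directory of the
    mozilla source tree and that bash is on the PATH.
    """
    # Hypothetical platform check: Windows-like hosts use the .win.bash script.
    flavor = "win" if host.startswith(("win", "cygwin", "msys")) else "unix"
    script = f"./mozilla/tools/codesighs/autosummary.{flavor}.bash"
    return ["bash", script, "results001.tsv", "results000.tsv", "summary001.txt"]
```

Running subprocess.run(autosummary_command(), check=True) from that directory would then execute the script and raise an error if it fails.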
The script itself will first output a single number which represents the
total size of all code and data in the considered executable build files.
If the file results000.tsv is present, the script then outputs a second
number, which represents the composite size difference from the
results000.tsv file to the results001.tsv file.
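The two numbers amount to a sum and a subtraction. The sketch below assumes, hypothetically, that each .tsv row begins with the symbol's size as its first tab-separated field, written in hexadecimal; check the files your build produces and pass base=10 if they use decimal sizes.

```python
def total_size(lines, base=16):
    """Sum the size column of codesighs-style .tsv lines.

    Assumption: the size is the first tab-separated field of each row.
    """
    total = 0
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            total += int(line.split("\t", 1)[0], base)
    return total


def composite_difference(old_lines, new_lines, base=16):
    # Positive means the build grew between the old run and the new one.
    return total_size(new_lines, base) - total_size(old_lines, base)
```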
The file results000.tsv does not need to exist beforehand, but go ahead and
specify it anyway. If it does exist, it should contain the results of a
prior run of the script (i.e. a previous results001.tsv). By comparing
against prior results, you can see the size differences caused by any source
code changes applied to your tree since then.
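Keeping a baseline between runs just means saving the newest results001.tsv as results000.tsv before the next run. A minimal sketch of that bookkeeping, using only the file names from the HOWTO (the helper name rotate_results is hypothetical):

```python
import os
import shutil


def rotate_results(current="results001.tsv", previous="results000.tsv"):
    """Save the latest run as the baseline for the next comparison.

    Returns False when there is no current run to save yet.
    """
    if not os.path.exists(current):
        return False
    # copy2 also preserves the timestamp, which helps tell runs apart.
    shutil.copy2(current, previous)
    return True
```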
The file results001.tsv will be overwritten with all symbol data gathered
from the build. Use this file in the future as the results000.tsv file to see
the differences caused by any source code changes you apply to the tree. The
file is also worth a look on its own: it contains all symbols found, sorted
by their respective sizes. This information could be a good starting point
if you are interested in reducing the code or data footprint of the build.
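To pull out the biggest symbols without relying on the sort direction of the file, a sketch like the following keeps only the largest entries. As above, it assumes a hypothetical layout where the size is the first tab-separated field, in hexadecimal by default.

```python
import heapq


def largest_symbols(lines, count=10, base=16):
    """Return the `count` largest (size, line) pairs from .tsv lines."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            rows.append((int(line.split("\t", 1)[0], base), line))
    # nlargest works whether the input is sorted ascending, descending, or not at all.
    return heapq.nlargest(count, rows)
```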
The file summary001.txt will contain any code or data size differences
between results000.tsv and results001.tsv. In addition, this file
will give a brief summary of the code and data sizes of the modules in the
build.
A Longer Introduction
Once you have performed the steps in the short HOWTO and are interested in the niceties, this section is for you. By explaining each Codesighs tool separately, I hope to empower you to wield or modify them as you will. Also, simply reading the autosummary.*.bash scripts will cover almost everything I state below.
- msmap2tsv
This command takes a MS linker .map file and converts it into
a format which codesighs understands.
As a warning, the symbol sizes this tool reports are not guaranteed to be
exact. The .map files produced by the MS linker do not specify the sizes of
symbols; instead they give the offsets of the symbols in particular sections.
msmap2tsv uses these offsets and sections as clues to each symbol's size. All
code and data is accounted for, but the guesswork may misreport some
individual symbol sizes. For example, the reported size of a public symbol
may include static functions that sit next to it in the same source file.
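The idea of using offsets as size clues can be illustrated as follows. This is a simplified sketch of the technique, not msmap2tsv's actual code: each symbol is assumed to extend up to the next listed symbol's offset, or to the end of its section. Every byte gets attributed to some symbol, which is also why an unlisted neighbor ends up inside a listed symbol's size.

```python
def guess_sizes(symbols, section_length):
    """Estimate symbol sizes from their offsets within one section.

    `symbols` is a list of (offset, name) pairs for a single section.
    """
    ordered = sorted(symbols)
    sizes = {}
    for i, (offset, name) in enumerate(ordered):
        # A symbol runs until the next symbol starts, or to the section's end.
        end = ordered[i + 1][0] if i + 1 < len(ordered) else section_length
        sizes[name] = end - offset
    return sizes


# If a helper not listed in the map occupies 0x20-0x30, _foo absorbs it.
example = guess_sizes([(0x30, "_bar"), (0x0, "_foo")], 0x50)
```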
Here is a list of sections a .map file might contain. Knowing the
various sections can come in handy when trying to determine what the tool
output represents. In short, a symbol's section determines whether its size
is attributed to code or data:
- bss: uninitialized data.
- crt: runtime library initialization/shutdown pointers.
- data: initialized data.
- debug: COFF debug information data.
- edata: exported functions data.
- idata: imported functions data.
- rdata: read only data.
- reloc: base relocations data.
- rsrc: resource data.
- text: machine code.
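The list above suggests a simple rule: text holds code and the rest hold data. The sketch below is a hypothetical classification built only from this list, not msmap2tsv's own logic (which may also use other information in the map file). It reduces grouped names such as idata$6, as seen in the sample output further down, to their base section.

```python
# Section names from the list above; only "text" holds machine code.
CODE_SECTIONS = {"text"}
DATA_SECTIONS = {"bss", "crt", "data", "debug", "edata", "idata",
                 "rdata", "reloc", "rsrc"}


def classify_section(name):
    """Return "CODE", "DATA", or None for a .map section name.

    A leading "." (as in ".text") is ignored, and grouped names such as
    "idata$6" are reduced to the part before the "$".
    """
    base = name.lstrip(".").split("$", 1)[0].lower()
    if base in CODE_SECTIONS:
        return "CODE"
    if base in DATA_SECTIONS:
        return "DATA"
    return None
```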
Further, the sections reported in the .map file may not be present in the
resultant executable. This is a good thing: the linker has merged them,
which reduces overhead, since each section uses at least 4k of system memory
even if only 1 byte of the section is utilized. For example, the edata and
idata sections are normally merged into the rdata section, bss is normally
merged into the data section, and so on. MSDN has some articles regarding
merging of sections. If "dumpbin /summary <filename.exe>" shows many
sections, merging some of them may be one way to reduce physical memory
strain.
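The benefit of merging follows from rounding every section up to whole 4k pages. The arithmetic below uses made-up section sizes purely for illustration and ignores any other alignment rules: three separate sections of 200, 300, and 5000 bytes take 4096 + 4096 + 8192 = 16384 bytes, while one merged 5500-byte section takes 8192.

```python
PAGE_SIZE = 4096  # the 4k per-section minimum mentioned above


def pages_needed(size, page=PAGE_SIZE):
    # Round up: even a 1-byte section occupies a whole page.
    return -(-size // page)


def memory_for_sections(sizes, page=PAGE_SIZE):
    """Bytes of memory used if each section is mapped separately."""
    return sum(pages_needed(s, page) * page for s in sizes)


separate = memory_for_sections([200, 300, 5000])  # hypothetical edata, idata, rdata
merged = memory_for_sections([200 + 300 + 5000])
```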
Another thing to consider is that, at the time of this writing, msmap2tsv
does not demangle the symbol names reported in the mapfile. This can
make it slightly more difficult to recognize C++ symbols. On the other
hand, nm2tsv output does contain demangled names if you are using a linux
build.
- nm2tsv
This command takes the output of the GNU nm tool and converts
it into a format which codesighs understands.
Specifically, the options to the nm tool should be: --format=bsd
--size-sort --print-file-name --demangle
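A sketch of assembling that nm invocation from Python, using exactly the four switches listed above. The helper names and the example library name are hypothetical; how the resulting text is then handed to nm2tsv is not shown here.

```python
import subprocess

# The nm switches listed above, in the same order.
NM_OPTIONS = ["--format=bsd", "--size-sort", "--print-file-name", "--demangle"]


def nm_command(binary, nm="nm"):
    """Build the GNU nm command line whose output nm2tsv expects."""
    return [nm, *NM_OPTIONS, binary]


def run_nm(binary):
    # text=True returns str; check=True raises if nm exits with an error.
    result = subprocess.run(nm_command(binary), capture_output=True,
                            text=True, check=True)
    return result.stdout
```

For example, run_nm("libxpcom.so") would return nm's report for that (hypothetical) library as one string.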
The requirement for nm to be from GNU comes from some hard coded interpretations
of the symbol type. The symbol types are used to help determine whether a
symbol is code or data, so knowing them can help in understanding the output
of this tool. Some of the types are:
- B: uninitialized data
- D: initialized data
- R: read only data
- T: machine code
- V: weak object
- W: weak symbol
Because the nm tool reports the size of each symbol when the --size-sort switch is used, no guesswork is performed by this tool with regards to the symbol size.
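A single line of that nm output can be split and classified as sketched below. With --size-sort and without -S, GNU nm prints the symbol's size (in hexadecimal) where it would normally print the value, and --print-file-name adds a "file:" prefix. The code/data mapping is a hypothetical reading of the list above, not nm2tsv's actual rules; in particular, treating W as code is an assumption. The parser also assumes the path contains no ":", which does not hold for archive members.

```python
# Hypothetical code/data mapping for the symbol types listed above.
CODE_TYPES = {"T", "W"}  # assumption: weak symbols are usually functions
DATA_TYPES = {"B", "D", "R", "V"}


def parse_nm_line(line):
    """Split one line of nm --format=bsd --size-sort --print-file-name output.

    Returns (file, size, kind, name) where kind is "CODE", "DATA", or None.
    """
    filename, _, rest = line.partition(":")
    size_hex, symbol_type, name = rest.split(None, 2)
    # Lowercase (local) types are treated like their uppercase forms.
    upper = symbol_type.upper()
    if upper in CODE_TYPES:
        kind = "CODE"
    elif upper in DATA_TYPES:
        kind = "DATA"
    else:
        kind = None
    return filename, int(size_hex, 16), kind, name.rstrip()
```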
- codesighs
This tool takes the output from msmap2tsv or nm2tsv and outputs
total sums of code and data size by module.
When researching various aspects of the tsv data, this tool is the easiest
research tool available.
This tool has a lot of command line options. Short of importing the
tsv output into a database to perform queries against, this tool is your best
option: you can specify a fairly detailed query on the command line.
For instance, if you were interested in the import and export symbol sizes
of a win32 build, you would run the following command:
codesighs.exe --match-section idata --match-section edata --input somefile.tsv
The results of this command would show you the overhead incurred from
importing and exporting functions on win32.
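The query above boils down to filtering symbols by section and summing their sizes. The sketch below uses a hypothetical parsed row (the real .tsv columns may differ) and assumes that a section matches when its name contains one of the requested strings, so "idata" also catches grouped sections such as idata$4; whether codesighs matches substrings or whole names is an assumption here.

```python
from typing import Iterable, NamedTuple


class Symbol(NamedTuple):
    """A hypothetical parsed .tsv row; the real column set may differ."""
    size: int
    section: str
    module: str
    name: str


def sum_matching_sections(symbols: Iterable[Symbol], *sections: str) -> int:
    """Total size of symbols whose section name contains any given string."""
    return sum(s.size for s in symbols
               if any(wanted in s.section for wanted in sections))
```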
Here's a hypothetical sample of the output of this tool using the switches
"--modules --match-module mozilla --match-module xpcom --match-module nspr"
on a tsv file generated from every mapfile found in the source tree:
Overall Size
    Total: 4886384
    Code: 1304442
    Data: 3581942
xpcom
    Total: 4364095
    Code: 948124
    Data: 3415971
nspr4
    Total: 216929
    Code: 175202
    Data: 41727
mozilla
    Total: 211255
    Code: 102283
    Data: 108972
nsprefm
    Total: 94105
    Code: 78833
    Data: 15272
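A report like this is a per-module grouping of code and data sizes. The sketch below works on simplified (module, kind, size) rows, a hypothetical stand-in for the real .tsv columns, and keeps only modules whose names contain one of the requested strings, which is assumed to mirror --match-module (the sample above lists nsprefm under a "nspr" match, which fits that reading).

```python
from collections import defaultdict


def module_totals(rows, match=()):
    """Aggregate (module, kind, size) rows into per-module totals.

    kind is "CODE" or "DATA". Returns (module, total, code, data) tuples,
    largest module first, as in the sample output.
    """
    totals = defaultdict(lambda: {"Code": 0, "Data": 0})
    for module, kind, size in rows:
        # Assumed --match-module behavior: keep modules containing a match string.
        if match and not any(m in module for m in match):
            continue
        totals[module]["Code" if kind == "CODE" else "Data"] += size
    ordered = sorted(totals.items(),
                     key=lambda item: item[1]["Code"] + item[1]["Data"],
                     reverse=True)
    return [(module, s["Code"] + s["Data"], s["Code"], s["Data"])
            for module, s in ordered]
```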
- maptsvdifftool
This tool outputs a human readable summary of code and data size drifts.
It is used mainly in the autosummary.*.bash scripts to show drifts after
changes to a source tree occur, but it is possible to get the same results
by hand.
The following steps are only an outline, but you should be able to follow
them and you will have a custom drift report in no time:
- Run nm2tsv on a binary or msmap2tsv on a mapfile to produce some tsv output. Repeat this for as many mapfiles or binaries as you need to produce the right tsv output.
- Sort the tsv output and save it away somewhere.
- Make whatever build changes you have in mind, and then reproduce the tsv output and sort it.
- Diff the first tsv output with the second tsv output.
- Run the diff results through maptsvdifftool to see the deltas in a human readable form.
- If the report shows no change, consider using the --zero-drift command line argument, which shows all changes even when they add up to a net zero change. This option will show you every minute change made to the symbols.
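The sort, diff, and summarize steps can be approximated in a few lines. This sketch is a stand-in, not maptsvdifftool's algorithm: it compares two runs held as dictionaries keyed by a hypothetical (module, symbol) pair, records per-symbol deltas, and hides modules whose changes cancel out unless zero_drift is set, mimicking the --zero-drift behavior described above.

```python
from collections import defaultdict


def drift_report(old, new, zero_drift=False):
    """Summarize size drift between two runs.

    `old` and `new` map (module, symbol) keys to sizes. Returns the overall
    net change and, per module, (net change, {symbol: delta}).
    """
    modules = defaultdict(dict)
    for key in old.keys() | new.keys():
        # Symbols missing from one run count as size 0 in that run.
        delta = new.get(key, 0) - old.get(key, 0)
        if delta:
            module, symbol = key
            modules[module][symbol] = delta
    report = {}
    for module, deltas in modules.items():
        net = sum(deltas.values())
        if net or zero_drift:  # net-zero modules only with zero_drift
            report[module] = (net, deltas)
    overall = sum(net for net, _ in report.values())
    return overall, report
```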
Here's a hypothetical sample of the output of this tool:
Overall Change in Size
    Total: +6628
    Code: +4133
    Data: +2495
codesighs
    Total: +6628
    Code: +4133
    Data: +2495
    +4133 text (CODE)
        +4112 codesighs.obj
            +2192 _initOptions
            +992 _cleanOptions
            +928 _codesighs
        +21 MSVCRTD:MSVCRTD.dll
            +6 _strstr
            +6 _strtoul
            +6 __errno
            +5 __strdup
            -2 _printf
    +2430 data (DATA)
        +2432 UNDEF:codesighs:data
            +2432 UNDEF:codesighs:data
        -2 MSVCRTD:merr.obj
            -2 ___defaultmatherr
    +33 idata$6 (DATA)
        +33 UNDEF:codesighs:idata$6
            +33 UNDEF:codesighs:idata$6
    +16 idata$4 (DATA)
        +16 UNDEF:codesighs:idata$4
            +16 UNDEF:codesighs:idata$4
    +16 idata$5 (DATA)
        +16 MSVCRTD:MSVCRTD.dll
            +4 __imp__strstr
            +4 __imp__strtoul
            +4 \177MSVCRTD_NULL_THUNK_DATA
            +4 __imp___errno