
Demystifying Footprint

Suresh Duddi

Created:  30 Jan 2002; Last Modified: 2 March 2002


It is highly recommended that these terms be looked up and understood.

Virtual Memory

Working Set - The working set of a program or system is the memory, or set of addresses, which it will use in the near future.

max-working-set - The union of all addresses accessed during a given time period [t_begin, t_end]

Resident Set - In a virtual memory system, a process's resident set is that part of its address space which is currently in main memory. If this does not include all of the process's working set, the system may thrash. Usually this is the Working Set, plus other pages that the app used earlier that have not been swapped to disk. Operating systems decide when to swap unused pages of an app to disk. Mostly this decision is demand based - when other applications need more physical memory and there is nothing free. Windows will swap out all unused pages when an application is iconified.


Footprint - can mean different things in different contexts. For example, on a machine without virtual memory, "footprint" would probably mean "the maximum amount of physical memory that the application requires".

Most modern operating systems have a Virtual Memory (VM) system that gives each process its own Virtual Address Space. Only a part of the virtual memory required by a process is paged into physical memory - the part that is required now. This is called the Resident Set.

Process address space can be broken down into:

static data

Code (lib)

Code (exe)


    Virtual Memory > Resident Set > Working Set

When the physical memory available to the app becomes less than the app's Working Set, the app will thrash, causing poor performance.

Understanding the heap

The "heap" is mostly data allocated using malloc() and released using free(). When applications request memory, the allocator (the implementation of malloc and friends) gets VmData in bulk from the OS and manages it.

Application
    Requests memory via calls to malloc()
Allocator - libc
    Implements malloc()
    Requests VmData from the OS using sbrk() or mmap() or VirtualAlloc() in bulk and manages the memory returned.
Operating System - kernel
    Manages physical memory and the mapping of individual processes' virtual memory into physical memory
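
The layering above can be illustrated with a toy model. This is only a sketch and not how libc is implemented: the class name ToyAllocator, the 1 MB growth step, and the bump-pointer strategy are all assumptions made for illustration, and free() is not modeled. It only shows why VmData grows in steps and can exceed what the application has requested.

```python
class ToyAllocator:
    """Toy model of an allocator that gets memory from the OS in bulk.

    Hypothetical: real allocators (libc malloc) are far more complex.
    """

    def __init__(self, growth_step=1024 * 1024):
        self.growth_step = growth_step  # assumed bulk request size (bytes)
        self.vm_data = 0                # bytes obtained from the "OS"
        self.requested = 0              # bytes handed out to the application

    def malloc(self, size):
        # Grow in whole steps until the request fits, like a bulk sbrk().
        while self.requested + size > self.vm_data:
            self.vm_data += self.growth_step
        self.requested += size
        return self.requested - size    # offset of the new block


alloc = ToyAllocator()
for _ in range(3):
    alloc.malloc(500_000)
print(alloc.requested, alloc.vm_data)
```

In this run the application asks for 1,500,000 bytes in total while the model has taken 2 MB from the "OS", which is the gap between heap requested and VmData that the tools discussed later report from different sides.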

User Statement of the problem 

  1. "When I run Netscape 6 for days on win98, I get alerts warning me of low virtual memory"

    This could be because the swap file is small on win98. Swap size + physical memory size caps the total amount of Vm used by all applications, so as processes use more Vm, we could hit this limit. It is really hard to hit this limit on WinNT, 2000, XP or linux.

  2. "On my 32MB windows/linux machine, Netscape 6 is slow and sluggish"

    This means that Netscape 6's working set for the user scenario is high enough that it cannot all be held in physical memory, and the system thrashes as it cycles through pages to satisfy Netscape 6's working set.

  3. "When Iconifying and deiconifying Netscape 6, it takes a long time to become active"

    When iconified on Windows NT/2000/XP, the OS actively starts swapping out unused pages. When deiconified, all pages that are needed are swapped back in as always. If the working set for deiconifying and displaying Netscape 6 is large, a large amount of memory needs to be swapped back in, and to make room for it, potentially that much memory needs to be swapped out to disk from other applications that are using it. This could account for the delay in getting Netscape 6 to become active again after deiconification. It also accounts for other apps being slower when Netscape 6 is running, as it causes them to thrash more.

  4. "Netscape 6 is too fat, consumes too much memory"

    This is a perception issue and has no real meaning to it.

Ideal Metrics

From the above statement of user problems and given the ideal tools that can measure anything, these would be the numbers to measure and improve:
  1. Max-working-set : performance threshold
    For a given scenario, the union of the working sets required by Netscape 6 at every instant during the scenario. This number says that on a machine with backing store (swap), Netscape 6 needs this much physical memory available, after the OS and other apps have been loaded, to give the user non-sluggish performance. This is a function of the process and not of available physical memory, i.e. for a given application and scenario, this is a constant number irrespective of what other apps are running or how much physical memory is available.
  2. Peak-vm-usage : pagesize threshold
    For a given scenario, the max vm requirement of Netscape 6, assuming no allocator buffering of virtual memory. Usually allocators don't return virtual memory obtained from the operating system (via sbrk()) unless the unused VM is greater than a threshold and is at the end of the process's VM space.
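
As a rough illustration of the Max-working-set definition, the sketch below takes a made-up trace of (time, address) accesses and computes the union of pages touched in a time window. The trace, the 4096-byte page size, and the function name max_working_set are assumptions for illustration; this is not an existing measurement tool.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def max_working_set(trace, t_begin, t_end, page_size=PAGE_SIZE):
    """Return the set of page numbers touched in [t_begin, t_end]."""
    return {addr // page_size for t, addr in trace if t_begin <= t <= t_end}


# Hypothetical access trace: (timestamp, virtual address)
trace = [(0, 0x1000), (1, 0x1008), (2, 0x5000), (3, 0x9000), (4, 0x1004)]

pages = max_working_set(trace, 1, 3)
print(len(pages) * PAGE_SIZE)
```

Because the result is a union over the trace alone, it depends only on what the process touches during the scenario, not on how much physical memory happens to be free.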

Measurable Metrics

Max-working-set: We currently don't have the means to measure this. We are working on it.

Peak-vm-usage: This or a function of this can be measured reliably but is Operating System and allocator dependent.

User Scenarios for measurement:
  1. Run pageload test :
  2. - Startup w/default profile,
    - Read 10 messages of 200 from email using Imap
    - Compose & send one email


First let us see what each of the numbers reported by the various tools mean:
  1. TaskManager : [Operating System] Task Manager reports the Resident Set. Not very useful.
  2. Taskinfo 2000 : [Operating System] Reports total Vm usage on 2000 and a VmCode/VmData breakdown on win98. VmCode includes static data. Gives VmCode for each dll and the executable.
  3. SpaceTrace : [Application] Reports peak heap requested by the application. VmData is a direct function of heap requested by the app - the allocator stands between the app and the system and requests Vm in bulk. It also decides when to give unused memory back to the OS.
  4. HeapInfo : [Allocator] Reports Vm requested by the allocator from the operating system on behalf of the application. Also reports a breakdown of usage of this memory. (win32 only)


On linux, say 22049 was the process id for netscape:

chetu> grep Vm /proc/22049/status
VmSize: 41844 kB
VmLck: 0 kB
VmRSS: 26260 kB
VmData: 20316 kB
VmStk: 80 kB
VmExe: 568 kB
VmLib: 17752 kB

VmSize - Virtual memory usage of entire process, roughly VmLib + VmExe + VmData + VmStk (other mappings make up the rest)
VmRSS - Resident Set currently in physical memory, including Code, Data, Stack
VmData - Virtual memory usage of Heap
VmStk - Virtual memory usage of Stack. Doesn't change much.
VmExe - Virtual memory usage by executable and statically linked libraries ('man top' says this is broken?)
VmLib - Virtual memory usage by dlls loaded
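
For repeated measurements it can help to read these fields programmatically. The sketch below parses the Vm lines shown above; the function name parse_vm_fields is hypothetical, and the sample text is simply the output quoted in this section.

```python
def parse_vm_fields(status_text):
    """Extract the Vm* lines of /proc/<pid>/status as {name: kB}."""
    fields = {}
    for line in status_text.splitlines():
        if line.startswith("Vm"):
            name, value = line.split(":", 1)
            fields[name] = int(value.split()[0])  # value looks like "41844 kB"
    return fields


sample = """VmSize: 41844 kB
VmLck: 0 kB
VmRSS: 26260 kB
VmData: 20316 kB
VmStk: 80 kB
VmExe: 568 kB
VmLib: 17752 kB"""

vm = parse_vm_fields(sample)
parts = vm["VmLib"] + vm["VmExe"] + vm["VmData"] + vm["VmStk"]
print(vm["VmSize"], parts)
```

On a live Linux system the same function could be fed the contents of /proc/<pid>/status for the process of interest. For the sample here, the four components add up to 38716 kB against a VmSize of 41844 kB.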


  1. Reduce Peak-vm-usage : Peak-VmData and Peak-VmCode
  2. Reduce Peak-working-set
for the user scenarios in the metric.


Release data no longer needed
    Reduces peak vm usage. Frees more, so the allocator need not get more from the OS. Reduces max-vm-usage.
Reduce <64 byte allocations
    Reduces overhead.
    Post startup: 15% of USED memory is consumed by overhead - data from HeapInfo
    Post startup: 78% (about 71,000) of allocations are for < 64 bytes - data from SpaceTrace
Reducing memory churn
    This is performance, not footprint. Since fragmentation isn't high, this won't help footprint much.
Reduce code
    Caution: Focus on code needed for the scenario rather than any code.
    This is going to be really hard to achieve for Mach-V. Makes more sense for embedding.
Delay load dlls
    Reduces max-working-set and max-vm-usage.
    Caution: Delaying is useful only if the dll load can be postponed past the scenario.

Notes on allocator

Windows 2000

  • Uses a best-fit allocator
  • Fragmentation is less than 4% of free space - not a problem
  • Rarely releases free space back to operating system : HeapCompact() doesn't do much
  • Doesn't use mmap() much
  • Allocated blocks are aligned on 8 byte boundaries
  • Malloc overhead is usually 8 to 16 bytes per block


Linux

  • Uses ptmalloc - a derivative of Doug Lea's boundary tag allocator
  • Allocated blocks are aligned on 8 byte boundaries
  • Malloc overhead is 4 bytes per allocation.
  • Minimum allocated size is 16 bytes
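
The ptmalloc figures above (8-byte alignment, 4 bytes of overhead, 16-byte minimum) imply how much memory a small request really consumes. The helper below is a sketch derived only from those three numbers; block_size is a hypothetical name, and real ptmalloc behavior depends on version and platform.

```python
def block_size(request, overhead=4, align=8, minimum=16):
    """Bytes consumed by one allocation under the figures listed above."""
    size = request + overhead
    size = (size + align - 1) // align * align  # round up to the alignment
    return max(size, minimum)


for n in (1, 12, 13, 60):
    print(n, block_size(n))
```

Under these assumptions a 1-byte request and a 12-byte request both cost 16 bytes, while a 13-byte request costs 24, which is why a large population of allocations under 64 bytes carries proportionally high overhead.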

Recommended Reading