Demystifying Footprint
Suresh Duddi <dp@netscape.com>
Created: 30 Jan 2002; Last Modified: 2 March 2002
Glossary
It is highly recommended that these terms are looked up and understood.
Virtual Memory
Working Set - The working set of a program or system is the memory, or set of addresses, which it will use in the near future.
max-working-set - The union of all addresses accessed during a given time period {tbegin, tend}
Resident Set - In a virtual memory system, a process's resident set is that part of the process's address space which is currently in main memory. If this does not include all of the process's working set, the system may thrash. Usually this is the Working Set, plus other pages that the app used earlier that have not been swapped to disk. Operating systems decide when to swap unused pages of an app to disk. Mostly this decision is demand based: pages are swapped out when other applications need more physical memory and there is nothing free. Windows will swap out all unused pages when an application is iconified.
Thrash
Footprint - can mean different things in different contexts. For example, on a machine without virtual memory, `footprint' would probably mean `the maximum amount of physical memory that the application requires'.
Most modern operating systems have a Virtual Memory (VM) system that
gives each process its own Virtual Address Space. Only a part of the
virtual memory required by a process is paged into physical memory -
the part that is required now. This is called the Resident Set.
Process address space can be broken down into:
Stack (grows downward) |
Heap (grows upward) |
static data |
Code (lib) |
Code (exe) |
Usually,
Virtual Memory > Resident Set > Working Set
When the physical memory available to the app becomes less than the app's Working Set, the app will thrash, causing poor performance.
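The relationship between the working set and thrashing can be illustrated with a toy model. This is only a sketch: the trace format of (timestamp, address) pairs, the 4 KB page size, and the function names are assumptions made for illustration, not the output of any real tool.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def max_working_set(trace, t_begin, t_end, page_size=PAGE_SIZE):
    """Return the set of page numbers touched during [t_begin, t_end].

    trace is an iterable of (timestamp, address) pairs (hypothetical format).
    """
    return {addr // page_size for t, addr in trace if t_begin <= t <= t_end}


def will_thrash(working_set_pages, available_physical_pages):
    """True when the working set cannot fit in the physical pages available."""
    return len(working_set_pages) > available_physical_pages


# Five accesses that fall on three distinct pages.
trace = [(0, 0x1000), (1, 0x1008), (2, 0x5000), (3, 0x9000), (4, 0x1010)]
pages = max_working_set(trace, 0, 4)
print(sorted(pages))          # [1, 5, 9]
print(will_thrash(pages, 2))  # True: three pages needed, two available
```

Note that the result depends only on the trace, not on how much physical memory is present, which is why the working set is a property of the application and scenario.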
Understanding the heap
The "heap" is mostly data allocated using malloc() and released using free(). When an application requests memory, the allocator (the implementation of malloc and friends) gets VmData in bulk from the OS and manages it.
Application | Requests memory via calls to malloc() |
Allocator - libc | Implements malloc(). Requests VmData from the OS in bulk using sbrk(), mmap() or VirtualAlloc(), and manages the memory returned. |
Operating System - kernel | Manages physical memory and the mapping of individual processes' virtual memory into physical memory. |
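The layering above can be sketched as a toy model of an allocator that obtains memory from the OS in bulk and keeps freed memory for reuse. The class name, the 64 KB chunk size, and the policy of never returning memory are all assumptions chosen for illustration; real allocators are far more involved.

```python
class ToyAllocator:
    """Toy model of an allocator that gets memory from the OS in bulk."""

    def __init__(self, chunk_size=64 * 1024):
        self.chunk_size = chunk_size  # assumed size of each bulk request
        self.from_os = 0              # bytes obtained from the "OS" (VmData)
        self.in_use = 0               # bytes currently handed to the application

    def malloc(self, nbytes):
        # Ask the OS for whole chunks only when what we already hold is not enough.
        while self.in_use + nbytes > self.from_os:
            self.from_os += self.chunk_size
        self.in_use += nbytes

    def free(self, nbytes):
        # Freed memory is kept for reuse; this toy never returns it to the OS.
        self.in_use -= nbytes


alloc = ToyAllocator()
alloc.malloc(100)      # one 64 KB chunk is obtained for a 100 byte request
alloc.malloc(70000)    # does not fit, so a second chunk is obtained
alloc.free(70000)      # the application releases it, but VmData stays put
print(alloc.in_use, alloc.from_os)  # 100 131072
```

The gap between `in_use` and `from_os` is the reason VmData as seen by the OS can stay high after the application has freed most of its heap.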
User Statement of the problem
- "When I run Netscape 6 for days on win98, I get alerts warning me of low virtual memory"
This could be because the swap file is small on win98. Swap size + physical memory size caps the total amount of Vm used by all applications, so as processes use more Vm, we could hit this limit. It is really hard to hit this limit on WinNT, 2000, XP or Linux.
- "On my 32MB windows/linux machine, Netscape 6 is slow and sluggish"
This means that Netscape 6's working set for the user scenario is high enough that it cannot all be held in physical memory, and the system thrashes as it cycles through pages to satisfy Netscape 6's working set.
- "When Iconifying and deiconifying Netscape 6, it takes a long time to become active"
When an application is iconified on Windows NT/2000/XP, the OS actively starts swapping out unused pages. When it is deiconified, all pages that are needed are swapped back in as always. If the working set for deiconifying and displaying Netscape 6 is large, a large amount of memory needs to be swapped back in, and to make room for it, potentially that much memory belonging to other running applications needs to be swapped out to disk. This could account for the delay in getting Netscape 6 to become active again after deiconification. It also accounts for other apps being slower when Netscape 6 is running, as it causes them to thrash more.
- "Netscape 6 is too fat, consumes too much memory"
This is a perception issue and has no real meaning to it.
Ideal Metrics
From the above statement of user problems, and given ideal tools that can measure anything, these would be the numbers to measure and improve:
- Max-working-set : performance threshold
For a given scenario, the union of the working sets required by Netscape 6 at every instant during the scenario. This number says that, on a machine with backing store (swap), Netscape 6 needs this much physical memory available after the OS and other apps have been loaded to give the user non-sluggish performance. This is a function of the process and not of available physical memory (i.e.) for a given application and scenario, this is a constant number irrespective of what other apps are running or how much physical memory is available.
- Peak-vm-usage : pagesize threshold
For a given scenario, the maximum vm requirement of Netscape 6, assuming no allocator buffering of virtual memory. Usually allocators don't return virtual memory obtained from the operating system (via sbrk()) unless the unused VM is greater than a threshold and is at the end of the process's VM space.
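For the heap part of Peak-vm-usage, "assuming no allocator buffering" amounts to tracking the high-water mark of the bytes the application has requested and not yet freed. A minimal sketch follows; the event tuple format and the function name are hypothetical, and code, static data and stack are ignored.

```python
def peak_requested(events):
    """Return the peak number of live heap bytes over a scenario.

    events is an iterable of ("malloc", block_id, size) or ("free", block_id)
    tuples (a hypothetical trace format, not that of any real tool).
    """
    live = {}            # block_id -> size of blocks not yet freed
    current = peak = 0
    for event in events:
        if event[0] == "malloc":
            _, block_id, size = event
            live[block_id] = size
            current += size
            peak = max(peak, current)
        elif event[0] == "free":
            current -= live.pop(event[1])
    return peak


events = [
    ("malloc", "a", 100),
    ("malloc", "b", 200),
    ("free", "a"),
    ("malloc", "c", 50),
]
print(peak_requested(events))  # 300, reached while a and b are both live
```

The actual VM obtained from the OS will be at least this large, since the allocator adds per-block overhead and requests memory in bulk.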
Measurable Metrics
Max-working-set: We currently don't have the means to measure this. We are working on it.
Peak-vm-usage: This, or a function of this, can be measured reliably but is operating system and allocator dependent.
User Scenarios for measurement:
- Run pageload test : http://cowtools.mcom.com
- Startup w/default profile, home.netscape.com
- Read 10 messages of 200 from netscape.net email using Imap
- Compose & send one email
Windows
First let us see what each of the numbers reported by the various tools means:
- TaskManager : [Operating System] Task Manager reports the Resident Set. Not very useful.
- Taskinfo 2000 : [Operating System] Reports total Vm usage on 2000 and a VmCode/VmData breakdown on win98. VmCode includes static data. Gives VmCode for each dll and the executable.
- SpaceTrace : [Application] Reports peak heap requested by the application. VmData is a direct function of the heap requested by the app: the allocator stands between the app and the system and requests Vm in bulk. It also decides when to give unused memory back to the OS.
- HeapInfo : [Allocator] Reports Vm requested by the allocator from the operating system on behalf of the application. Also reports a breakdown of usage of this memory. win32 only.
Linux
On Linux, say 22049 was the process id for netscape:
chetu> grep Vm /proc/22049/status
VmSize: 41844 kB
VmRSS: 26260 kB
VmData: 20316 kB
VmLib: 17752 kB
VmSize | Virtual memory usage of entire process = VmLib + VmExe + VmData + VmStk |
VmRSS | Resident Set currently in physical memory, including Code, Data, Stack |
VmData | Virtual memory usage of Heap |
VmStk | Virtual memory usage of Stack. Doesn't change much. |
VmExe | Virtual memory usage by executable and statically linked libraries. 'man top' says this is broken? |
VmLib | Virtual memory usage by dlls loaded |
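The same numbers can be collected from a script. The sketch below parses the Vm lines of /proc/<pid>/status; the function names are made up for illustration, and the sample text is the output shown above.

```python
def parse_vm_fields(status_text):
    """Return {field: kB} for every 'Vm*' line of /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        if line.startswith("Vm"):
            name, value = line.split(":", 1)
            fields[name] = int(value.split()[0])  # value looks like "41844 kB"
    return fields


def read_vm_fields(pid):
    """Read the Vm fields of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())


sample = """VmSize: 41844 kB
VmRSS: 26260 kB
VmData: 20316 kB
VmLib: 17752 kB"""

vm = parse_vm_fields(sample)
# By the VmSize formula in the table, what is left after VmLib and VmData
# is VmExe + VmStk.
print(vm["VmSize"] - vm["VmLib"] - vm["VmData"])  # 3776
```

Sampling `read_vm_fields(pid)` periodically during a scenario and keeping the maximum of each field is one way to approximate the peak values.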
Goal
- Reduce Peak-vm-usage : Peak-VmData and Peak-VmCode
- Reduce Max-working-set
Plan
Approach | Impact | Benefit |
---|---|---|
Release data no longer needed | data | Reduces peak-vm-usage. Frees more memory, so the allocator need not get more from the OS. |
Reduce <64 byte allocations | data | Reduces overhead. Post startup: 15% of USED memory is consumed by overhead (data from HeapInfo). Post startup: 78% (about 71,000) of allocations are for < 64 bytes (data from SpaceTrace). |
Reducing memory churn | data | This is performance, not footprint. Since fragmentation isn't high, this won't help footprint much. |
Reduce code | code | Caution: Focus on code needed for the scenario rather than any code. This is going to be really hard to achieve for Mach-V. Makes more sense for embedding. |
Delay load dlls | code | Reduces max-working-set and peak-vm-usage. Caution: Delaying is useful only if the dll load can be postponed past the scenario. |
Notes on allocator
Windows 2000
- Uses a best-fit allocator
- Fragmentation is less than 4% of free space - not a problem
- Rarely releases free space back to operating system : HeapCompact() doesn't do much
- Doesn't use mmap() much
- Allocated blocks are aligned on 8 byte boundaries
- Malloc overhead is usually 8 to 16 bytes per block
Linux
- Uses ptmalloc - a derivative of Doug Lea's boundary tag allocator
- Allocated blocks are aligned on 8 byte boundaries
- Malloc overhead is 4 bytes per allocation.
- Minimum allocated size is 16 bytes
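The Linux figures above explain why small allocations are relatively expensive. As a rough sketch, assume the allocator adds the 4 byte overhead, rounds up to the 8 byte alignment, and enforces the 16 byte minimum; this is a simplification of what ptmalloc does, and the function names are hypothetical.

```python
def linux_chunk_size(request, overhead=4, align=8, minimum=16):
    """Estimate the bytes consumed by one malloc(request) under the figures above."""
    size = request + overhead
    size = (size + align - 1) // align * align  # round up to the alignment
    return max(size, minimum)


def overhead_fraction(request):
    """Fraction of the chunk that is not usable by the request."""
    chunk = linux_chunk_size(request)
    return (chunk - request) / chunk


print(linux_chunk_size(1))     # 16: the minimum size dominates
print(linux_chunk_size(13))    # 24: 17 bytes rounded up to a multiple of 8
print(overhead_fraction(12))   # 0.25
print(overhead_fraction(60))   # 0.0625
```

Under these assumptions the relative overhead shrinks as the request grows, which is the motivation for reducing the number of allocations under 64 bytes.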
Recommended Reading
- Doug Lea's boundary tag allocator (Linux malloc) : http://gee.cs.oswego.edu/dl/html/malloc.html
- Dynamic Storage Allocation: A Survey and Critical Review : ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps
- http://developer.apple.com/techpubs/macosx/Essentials/Performance/VirtualMemory/Virtual_Mem_on_Mac_OS_X.html
- http://developer.apple.com/techpubs/macosx/Essentials/Performance/VirtualMemory/Allocating__eing_Memory.html