performance: HTTP Compression
Last Updated: 15-September-1998
Getting the Apache module source.
This project aims to improve real and perceived web browsing performance by having the server send compressed HTML files to the browser, and having the browser uncompress them before displaying. Assuming most machines these days have fast enough processors, the user should see the document sooner this way than if the HTML were sent uncompressed. Also, since a majority of network traffic these days is HTTP traffic, compressing all HTML sent via HTTP should recover a significant amount of wasted network bandwidth.
Stage 1 - Content-Encoding: gzip
- Status: Complete
The current Mozilla source already sends Accept-Encoding: gzip and can do a streaming decompression of HTML data received with Content-Encoding: gzip. All that is needed is a server set up to serve this data to Mozilla while maintaining backwards compatibility with browsers that can't handle the compressed data.
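For illustration, the Stage 1 exchange looks roughly like this (the path is made up and uninteresting headers are omitted); a browser that does not advertise gzip support simply gets the uncompressed file:

    GET /docs/index.html HTTP/1.0
    Accept-Encoding: gzip

    HTTP/1.0 200 OK
    Content-Type: text/html
    Content-Encoding: gzip

    ...gzip-compressed HTML body...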
To this end a new Apache 1.3 server module has been written. It is activated on a per-directory basis with a directive in the access.conf file of the form:

    CompressContent Yes

This neatly solves the backwards compatibility problem on the browser end, but creates a maintenance problem on the server end: one would need to run some sort of automated script to regularly regenerate up-to-date compressed versions of the files in the directories that need them. For a solution to this maintenance problem, see Stage 2 below.
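For concreteness, the Stage 1 setup in access.conf might look like this (the directory path is made up):

    <Directory /usr/local/apache/htdocs/docs>
    CompressContent Yes
    </Directory>

With this in place the module can answer a gzip-capable browser with a precompressed copy of the requested file (presumably located with the extra stat() call noted in the results below) and serve everyone else the plain file.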
- Results:
Here is an optimal case where all images are already in the browser cache:

    Link                 No GZIP      GZIP         Change
    Local                56.9 sec     61.0 sec     7% Slower
    ISDN 64 kbits/sec    105.1 sec    83.2 sec     21% Faster
    28.8 kbits/sec       327.9 sec    121.8 sec    63% Faster

Notes:
- For the Local run both the client and the server are running on the same machine, so we are seeing both the overhead of the client unzip and the slight extra overhead for the server to locate and send the gzipped content (an extra call to stat() a file).
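- The percentage figures here and in the second table below are relative to the No GZIP time on the same link; for example, at 28.8 kbits/sec the saving is (327.9 - 121.8) / 327.9, or roughly 63%.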
A more realistic workload was then generated, simulating a user starting with an empty cache and visiting the CNN site to read, in order: the Main Page, World, U.S., U.S. Local, Weather, Sci-Tech, Entertainment, Travel, Health, Style, and In-Depth.
    Link                 No GZIP      GZIP         Change
    Local                53.0 sec     53.2 sec     0.4% Slower
    ISDN 128 kbits/sec   82.1 sec     77.6 sec     5.5% Faster
    28.8 kbits/sec       264.7 sec    184.4 sec    30% Faster
    14.4 kbits/sec       474.1 sec    307.7 sec    35% Faster

Notes:
- A much more realistic set of data, with a mix of image cache hits and misses after the first CNN page.
- Note that the gzip cost on the local system is basically lost in the noise.
- Also, all the image loads make the apparent gain at 28.8 much lower.
- It is curious that the 14.4 load doesn't show a greater speedup.
These results seem promising enough to warrant moving on to implementation of Stage 2.
Stage 2 - Transfer-Encoding: gzip
- Status: Begun
Here we hope to use the new HTTP/1.1 TE: gzip header to request compressed versions of HTML files. The server would then need to do streaming compression to generate the results. To minimize the overhead on the server, it should keep a cache of the compressed files so it can quickly fill future requests for the same compressed data.
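As a rough sketch of what the streaming side involves (this is not the Enterprise Server or Apache implementation), zlib's deflate interface can emit gzip-wrapped output a chunk at a time while the response body is still being read. Note that selecting the gzip wrapper through windowBits is a zlib 1.2+ convenience; the zlib available in 1998 would need the gzip header and trailer written by hand:

    /* gzip_stream.c: sketch of streaming gzip compression with zlib.
     * Reads data from stdin and writes gzip-compressed output to stdout
     * a chunk at a time, the way a server could compress a response body
     * as it is generated.  Build with: cc gzip_stream.c -lz
     */
    #include <stdio.h>
    #include <zlib.h>

    #define CHUNK 16384

    int main(void)
    {
        unsigned char in[CHUNK], out[CHUNK];
        z_stream strm;
        int flush;

        strm.zalloc = Z_NULL;
        strm.zfree  = Z_NULL;
        strm.opaque = Z_NULL;
        /* windowBits = 15 + 16 asks zlib (1.2 or later) for gzip framing
         * instead of a raw zlib stream. */
        if (deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                         15 + 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
            return 1;

        do {
            /* Read the next chunk of the uncompressed document. */
            strm.avail_in = fread(in, 1, CHUNK, stdin);
            strm.next_in  = in;
            flush = feof(stdin) ? Z_FINISH : Z_NO_FLUSH;

            /* Compress it, writing however much output deflate produces. */
            do {
                strm.avail_out = CHUNK;
                strm.next_out  = out;
                deflate(&strm, flush);
                fwrite(out, 1, CHUNK - strm.avail_out, stdout);
            } while (strm.avail_out == 0);
        } while (flush != Z_FINISH);

        deflateEnd(&strm);
        return 0;
    }

A server module built around this loop would also append each chunk it sends to a cache file keyed by the request, so later requests for the same document can skip the compression step entirely.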
The current Mozilla source can already accept and decode Transfer-Encoding: gzip data, but does not yet send the TE: header. Work has begun on implementing the streaming compression in the latest Netscape Enterprise Server. (This is a general call for volunteers to implement the same thing as a module for Apache 1.3.)
Stage 3 - Other compression types
The previous two stages dealt only with gzip as the form of compression. While gzip is a great general-purpose compression scheme, we probably want to negotiate the compression type based on the type of data requested. For example, if a request carrying a TE: gzip header turns out to be for a JPEG image, the server should know not to transfer-encode it with gzip, since the data is already compressed.
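The decision could be as simple as a lookup on the response's MIME type. The sketch below is illustrative only; the particular list of types is an assumption, not something specified here:

    #include <string.h>

    /* Decide whether a response of the given MIME type is worth gzip
     * transfer-encoding.  Already-compressed formats such as JPEG or GIF
     * would only grow and waste CPU on both ends. */
    static int should_gzip(const char *content_type)
    {
        static const char *skip[] = {
            "image/jpeg", "image/gif", "application/zip",
            "application/x-gzip", NULL
        };
        int i;

        if (content_type == NULL)
            return 0;
        for (i = 0; skip[i] != NULL; i++) {
            if (strncmp(content_type, skip[i], strlen(skip[i])) == 0)
                return 0;
        }
        /* Compress text types (text/html, text/plain, ...) by default. */
        return strncmp(content_type, "text/", 5) == 0;
    }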
Comments etc.
Any comments/questions or any volunteers to do the TE-aware Apache module, or other work, contact: Eric Bina.