user-agent strings
by Jamie Zawinski
28-Mar-98
- Introduction
- Background
- 10.15 User-Agent
The User-Agent request-header field contains information about the
user agent originating the request. This is for statistical purposes,
the tracing of protocol violations, and automated recognition of user
agents for the sake of tailoring responses to avoid particular user
agent limitations. Although it is not required, user agents should
include this field with requests. The field can contain multiple
product tokens (Section 3.7) and comments identifying the agent and
any subproducts which form a significant part of the user agent. By
convention, the product tokens are listed in order of their
significance for identifying the application.
User-Agent = "User-Agent" ":" 1*( product | comment )
product = token ["/" product-version ]
product-version = token
comment = "(" *( ctext | comment ) ")"
ctext = <any TEXT excluding "(" and ")">
token = 1*<any CHAR except CTLs or tspecials>
tspecials = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{" | "}" | SP | HT

User-Agent: CERN-LineMode/2.15 libwww/2.17b3
Note: Some current proxy applications append their product information to the list in the User-Agent field. This is not recommended, since it makes machine interpretation of these fields ambiguous.
Note: Some existing clients fail to restrict themselves to the product token syntax within the User-Agent field.
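As a rough illustration of the grammar quoted above, here is a minimal Python sketch of a tokenizer that splits a User-Agent field value into products and comments. The function names (`is_token_char`, `read_token`, `parse_user_agent`) are hypothetical and not part of either RFC, and the sketch assumes the input is the field value only, without the leading "User-Agent:" name.

```python
# tspecials from the grammar above, including SP and HT.
TSPECIALS = set('()<>@,;:\\"/[]?={} \t')


def is_token_char(ch):
    """True if ch is a US-ASCII CHAR that is neither a CTL nor a tspecial."""
    return 32 <= ord(ch) < 127 and ch not in TSPECIALS


def read_token(value, i):
    """Read the longest run of token characters starting at offset i."""
    start = i
    while i < len(value) and is_token_char(value[i]):
        i += 1
    return value[start:i], i


def parse_user_agent(value):
    """Split a field value into ("product", name, version) and
    ("comment", text) items, in order of appearance."""
    items = []
    i, n = 0, len(value)
    while i < n:
        ch = value[i]
        if ch in " \t":
            i += 1
        elif ch == "(":
            # Comments may nest, so track the parenthesis depth.
            depth, start = 1, i + 1
            i += 1
            while i < n and depth:
                if value[i] == "(":
                    depth += 1
                elif value[i] == ")":
                    depth -= 1
                i += 1
            if depth:
                raise ValueError("unterminated comment")
            items.append(("comment", value[start:i - 1]))
        elif is_token_char(ch):
            name, i = read_token(value, i)
            version = None
            if i < n and value[i] == "/":
                version, i = read_token(value, i + 1)
                if not version:
                    raise ValueError("missing product-version after '/'")
            items.append(("product", name, version))
        else:
            raise ValueError("unexpected character %r at offset %d" % (ch, i))
    if not items:
        raise ValueError("at least one product or comment is required")
    return items
```

Applied to the RFC's own example value, `CERN-LineMode/2.15 libwww/2.17b3`, this yields two product items and no comments. Any character outside the token and comment syntax raises `ValueError`.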
- System Info:
Within the system info, the text followed this syntax:
( platform ; security-level )
( platform ; security-level ; OS-or-CPU-description )
Platform was the general name of the user environment; values used in the past were Windows, Win16, Win95, WinNT, Macintosh, and X11.
Security-level was either U (for "U.S."), meaning high-quality encryption that generally cannot be exported from the United States; or I (for "international"), meaning weak encryption for which the U.S. government will grant an export license.
When appropriate, a third field would contain a description of the operating system or CPU type; in Unix terminology, this is the uname info.
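The two historical forms can be pulled apart with a few lines of Python. This is only a sketch: `parse_system_info` is a hypothetical helper, and it assumes the fields are separated by semicolons exactly as in the syntax above, with the surrounding parentheses optional.

```python
def parse_system_info(comment):
    """Split '( platform ; security-level [; OS-or-CPU-description] )'
    into a dict. The third field is None when it is absent."""
    text = comment.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    fields = [field.strip() for field in text.split(";")]
    if len(fields) not in (2, 3):
        raise ValueError("expected 2 or 3 fields, got %d" % len(fields))
    platform, security = fields[0], fields[1]
    if security not in ("U", "I"):
        raise ValueError("security level must be U or I, got %r" % security)
    os_cpu = fields[2] if len(fields) == 3 else None
    return {"platform": platform, "security": security, "os_cpu": os_cpu}
```

For instance, `parse_system_info("(X11; I; SunOS 5.4 sun4m)")` reports the platform as `X11`, the security level as `I`, and the uname-style description as `SunOS 5.4 sun4m`.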
- Localization Info:
Sometimes, a language specification would appear between the codename/version and the parenthesized part, like:
Mozilla/4.04 [es] (Win16; I)
This was the language for which the client had been localized: the language used for the menus and buttons in the user interface (as opposed to the user's language choice in preferences.)
The discerning among you will note that [ and ] are not allowed in User-Agent strings, since they do not match the token syntax, cited above. So that means that the [xx] syntax that Netscape Navigator had been using for localization information is invalid. It seems likely that there does not exist software anywhere in the world today which will be detrimentally affected by this, but still, standards are standards, so we should change that in future versions of Mozilla.
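The token rule is easy to check mechanically. The following Python sketch (with a hypothetical `is_token` helper) tests a string against the token definition from the RFC grammar, under the assumption that CHAR means US-ASCII and that CTLs are codes 0 through 31 plus 127.

```python
# tspecials from the RFC 1945 grammar, including SP and HT.
TSPECIALS = set('()<>@,;:\\"/[]?={} \t')


def is_token(text):
    """True if text is one or more CHARs, none of them a CTL or a tspecial."""
    return bool(text) and all(
        32 <= ord(ch) < 127 and ch not in TSPECIALS for ch in text
    )
```

Here `is_token("es")` holds but `is_token("[es]")` does not, which is the whole problem with the bracketed localization marker.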
- Cloakers:
Rather than using other methods of content-negotiation, some ill-advised webmasters have chosen to look at the User-Agent to decide whether the browser being used was capable of using certain features (frames, for example), and would serve up different content for browsers that identified themselves as ``Mozilla''.
Consequently, Microsoft made their browser lie, and claim to be Mozilla, because that was the only way to let their users view many web pages in their full glory:
Mozilla/2.0 (compatible; MSIE 3.02; Update a; AOL 3.0; Windows 95)
Alas.
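Log-analysis code that wants to see through this kind of cloaking might look something like the Python sketch below. It assumes that cloaked strings follow the shape of the single example above, with "compatible" as the first comment field and the real product as the second; the name `real_agent` is hypothetical.

```python
import re

# "Mozilla/N.M (compatible; <real product>; ..." as in the MSIE example.
COMPATIBLE_RE = re.compile(r"^Mozilla/[\d.]+ \(compatible; ([^;)]+)")


def real_agent(user_agent):
    """Return the product named after 'compatible;', or None if the
    string does not use that convention."""
    match = COMPATIBLE_RE.match(user_agent)
    return match.group(1).strip() if match else None
```

For the string shown above this returns `MSIE 3.02`; for a string without the "compatible" field, such as `Mozilla/4.04 (Win95; I)`, it returns `None`.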
- Goals:
- Obey the standard described in RFC 1945 and RFC 2068;
- Don't break existing web servers;
- Don't break existing log-file analysis software;
- Keep the User-Agent string reasonably short;
- Use a consistent, obvious, and easy-to-parse format.
- Proposal:
Order of the intra-comment tokens:
- Platform or user environment (mandatory);
- Supported security level (mandatory);
- OS or CPU description (mandatory);
- Localization information (optional);
- ...any other tokens (optional).
Values for the platform token:
- Windows for all Microsoft Windows environments;
- Macintosh for MacOS environments;
- X11 for X Window System environments.
Values for the security level token:
- U for strong security;
- I for weak security.
Values for the OS/CPU token on Windows systems:
- Win3.11 for Windows 3.11;
- NT3.51 for Windows NT 3.51;
- NT4.0 for Windows NT 4.0;
- Win95 for Windows 95;
- ...and so on.
Values for the OS/CPU description on MacOS systems:
- 68K for 68k hardware;
- PPC for PowerPC hardware;
- ...and so on.
The User-Agent string is the text that programs use to identify themselves to HTTP, mail and news servers, for usage tracking and other purposes.
It is desirable for there to be standardization in the format of these strings, for log-file analysis and other purposes. Therefore, it is desirable for all descendants of Mozilla to use the same basic form of User-Agent string.
I will start by presenting some facts about what Mozilla has used as a User-Agent in the past; and then I will make a proposal for how I would like this to work in the future.
RFC 1945 (the HTTP 1.0 spec) has the following to say about the User-Agent string (and RFC 2068, the HTTP 1.1 spec, says pretty much the same thing):
In the past, Netscape products generated User-Agent strings that looked like this:
Mozilla/4.04 (X11; I; SunOS 5.4 sun4m)
Mozilla/4.04 (Win95; I)
Mozilla/4.04 (Macintosh; I; PPC)
According to the above syntax, this is a single product (Mozilla/4.04) followed by a comment (the system info.)
These are what I see as the most important goals for designing the future format of Mozilla's User-Agent string, ordered from most important to least. Not all of these will necessarily be achievable, but they are all highly desirable:
An explicit non-goal is to enable use of the User-Agent string for other, newer kinds of content-negotiation. While having a robust mechanism for content negotiation would be a good thing, it is widely accepted that the User-Agent string is the wrong way to do it. The IETF Content Negotiation Working Group is working on the content-negotiation problem.
The heart of the proposal is this: there are two tokens, one naming the vendor release, and one naming the mozilla.org release from which the vendor release is derived. They have independent numbering schemes.
Agents derived from a mozilla.org source release shall use the token Mozilla/N.M as the first element of their User-Agent string.
Since many agents derived from the mozilla.org source release will include other components, possibly significant ones (for example, crypto), it will be useful to identify not only the baseline release (the version of the source that mozilla.org puts out) but also more specific information about the derived version.
The obvious solution here is to add another product token. For example, if the hypothetical company Egregious Labs were to take the 9.52 source release from mozilla.org, add features to it, and distribute browser binaries to end users, they might pick a user agent string that looked like this:
Mozilla/9.52 Egregious/37.5a (Macintosh; I; PPC)
But what do those numbers mean?
As anyone working in the commercial software industry knows, version numbers are as much a matter of politics as they are a measure of age or functionality or maturity. They mean whatever you want them to mean. In particular, if there are two companies releasing software derived from the Mozilla code, it will be impossible to get them to agree on a common numbering scheme for their products.
More specifically, my proposal is as follows:
From time to time, mozilla.org will make source releases. These releases will have a major.minor version number associated with them.
The exact meaning of these numbers (for example, which releases are ``stable'' and which are not) is not important for the purposes of this document (and is orthogonally contentious.) What does matter is that the numbering system used by mozilla.org be backward compatible with the browsers in the world today: I believe this means, simply, that it follows the major.minor pattern and that the major number is not smaller than 4.
Companies shipping products can and will pick whatever version numbers they like to identify their products (and perhaps might not even use numbers at all.) But they should use their version numbers in the User-Agent token corresponding to their product: not in the Mozilla/N.M token.
Vendors should put their tokens after the Mozilla/N.M token, in deference to existing log-file analysis software that expects the Mozilla token to come first.
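The rules so far (Mozilla/N.M first, major.minor numbering with a major number of at least 4, vendor tokens afterwards) can be checked with a short Python sketch. The function name `split_release_tokens` and the regular expression are illustrative assumptions, not part of the proposal. Because vendors might not use version numbers at all, the version part of a vendor token is treated as optional and comes back as an empty string when absent.

```python
import re

# A product token, optionally followed by "/" and a version.
PRODUCT_RE = re.compile(r"([^\s/()]+)(?:/([^\s/()]+))?")


def split_release_tokens(user_agent):
    """Return (mozilla_version, [(vendor, vendor_version), ...]).

    Only the part before the parenthesized comment is examined.
    """
    head = user_agent.split("(", 1)[0]
    products = PRODUCT_RE.findall(head)
    if not products or products[0][0] != "Mozilla":
        raise ValueError("the Mozilla/N.M token must come first")
    version = products[0][1]
    match = re.fullmatch(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("Mozilla version must follow the major.minor pattern")
    if int(match.group(1)) < 4:
        raise ValueError("Mozilla major version must not be smaller than 4")
    return version, products[1:]
```

With the Egregious Labs string, this returns `("9.52", [("Egregious", "37.5a")])`: the two version numbers are kept apart and neither is interpreted in terms of the other.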
Vendors should identify their platform the way Mozilla has historically done it: by semicolon-separated tokens in the parenthesized comment portion.
The order of the intra-comment tokens shall be as follows:
The defined values for the platform token shall be:
The defined values for the security level token shall be:
The defined values for the OS/CPU token on Windows systems shall be:
The defined values for the OS/CPU description on MacOS systems shall be:
The defined values for the OS/CPU description on Unix
systems shall be the output of the command
The locale token, if any, shall be a two-letter country code.
Following this model, a release of Netscape Navigator numbered 5.0 might look like:
Mozilla/5.25 Netscape/5.0 (X11; U; IRIX 6.3 IP32)
Mozilla/5.25 Netscape/5.0 (Windows; I; Win95)
Mozilla/5.25 Netscape/5.0 (Macintosh; I; PPC)
That would indicate that the version number of the Netscape product was 5.0, and that it was derived from the mozilla.org source drop numbered 5.25.
Note that the localization info has been moved from a nonstandard [xx] token into the comment, like so:
Mozilla/6.35 Netscape/6.02 (Windows; I; Win3.11; es)
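Putting the pieces together, a vendor's build could assemble its string along the lines of this Python sketch. The function `format_user_agent` and its parameter names are hypothetical. It checks only the platform and security-level values defined by the proposal and passes the OS/CPU description and locale through unchanged.

```python
PLATFORMS = ("Windows", "Macintosh", "X11")
SECURITY_LEVELS = ("U", "I")


def format_user_agent(mozilla_version, vendor, vendor_version,
                      platform, security, os_cpu, locale=None):
    """Build 'Mozilla/N.M Vendor/X.Y (platform; security; os-cpu[; locale])'."""
    if platform not in PLATFORMS:
        raise ValueError("unknown platform %r" % platform)
    if security not in SECURITY_LEVELS:
        raise ValueError("unknown security level %r" % security)
    # Mandatory fields first, then the optional localization token.
    fields = [platform, security, os_cpu]
    if locale is not None:
        fields.append(locale)
    return "Mozilla/%s %s/%s (%s)" % (
        mozilla_version, vendor, vendor_version, "; ".join(fields))
```

Calling `format_user_agent("6.35", "Netscape", "6.02", "Windows", "I", "Win3.11", locale="es")` reproduces the localized example string above.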
I believe this proposal will satisfy the desires of all involved, without introducing confusion or causing undue incompatibility with already-deployed software. If you have objections, let me know.