XPCOM Type Library
File Format

Version 1.0, Draft 2

Last updated: 
Author: Scott Furman <fur@netscape.com>

Latest version:
    http://mozilla.org/scriptable/typelib_file.html

Document History

Draft 2 -- http://mozilla.org/scriptable/typelib_file_v1_d2.html

Known Issues

Introduction

XPCOM type libraries, or "typelibs", are binary interface description files generated by the XPIDL compiler. Type libraries enumerate the methods of one or more interfaces, including detailed type information for each method parameter. The typelib is not merely a tokenized form of the IDL.  Rather, it's intended to accurately represent binary XPCOM interfaces, with annotations derived from the IDL.

Typelibs might be more aptly named "interface libraries", but Microsoft has already established a precedent with their naming scheme and we'll stick with it to avoid developer confusion.

Goals

Non-goals

Notation

The syntax used in this document to specify the layout of file data appears similar to C structs. Unlike C structs, however, data members are not subject to alignment restrictions.  Another difference from C structs is the use of pointer notation to represent 32-bit file offsets. For example, a field declared with type uint16* is a 32-bit field that contains the offset, in bytes, to an array of one or more 16-bit values.  Unless otherwise noted, all file offsets are byte offsets from the beginning of the data pool and are 32-bit signed quantities.  The first byte of the data pool is at offset 1, so as to allow offset 0 to be used as a special indicator.  After adding an appropriate constant, these offsets are suitable as arguments to seek().
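The offset arithmetic described above can be sketched as follows. This is a minimal illustration, not part of the format: the function name is hypothetical, and it assumes that the header's data_pool field holds the zero-relative file offset of the first byte of the data pool.

```python
def pool_offset_to_file_offset(data_pool: int, pool_offset: int) -> int:
    """Convert a 1-based data-pool offset into a 0-based file offset.

    data_pool is assumed to be the file offset of the pool's first byte.
    """
    if pool_offset == 0:
        # Offset 0 is reserved as a special indicator and names no location.
        raise ValueError("offset 0 is a special indicator, not a location")
    if pool_offset < 0:
        raise ValueError("negative data-pool offset")
    # The pool's first byte is offset 1, so the constant to add is data_pool - 1.
    return data_pool - 1 + pool_offset
```

The result is what a reader would pass to seek() before reading the record.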

Record fields with type boolean occupy one bit, not one byte.  A value of 1 represents true and a value of 0 represents false.

All integer fields with multibyte precision are stored in little-endian order, e.g. for a uint16 field, the low-order byte is stored in the file followed by the high-order byte.
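As an illustration of the byte order, a reader might decode multibyte fields like this. The helper names are hypothetical; only Python's standard struct module is used.

```python
import struct


def read_uint16_le(buf: bytes, pos: int) -> int:
    # "<H" means little-endian, unsigned, 16 bits: low-order byte first.
    return struct.unpack_from("<H", buf, pos)[0]


def read_int32_le(buf: bytes, pos: int) -> int:
    # "<i" means little-endian, signed, 32 bits (the width used for offsets).
    return struct.unpack_from("<i", buf, pos)[0]
```

For instance, the two bytes 34 12 (hex) decode to the uint16 value 0x1234.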

Filename Suffix

The standard suffix for XPCOM type libraries is .xpt. (Do we need to define a standard four-character Mac signature/creator ?)

File Header

Every XPCOM typelib file begins with a header:
TypeLibHeader {
    char                     magic[16];
    uint8                    major_version;
    uint8                    minor_version;
    uint16                   num_interfaces;
    uint32                   file_length;
    InterfaceDirectoryEntry* interface_directory;
    uint8*                   data_pool;
    Annotation               annotations[];
}
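A reader for the fixed-size part of this header could look roughly like the sketch below. The names TypeLibHeader (as a Python class), HEADER_FORMAT and parse_header are illustrative. The sketch assumes the fields are packed with no padding, per the Notation section, and that both offsets are stored as 32-bit signed values.

```python
import struct
from typing import NamedTuple

# magic[16], major, minor, num_interfaces, file_length,
# interface_directory, data_pool; little-endian, no padding.
HEADER_FORMAT = "<16sBBHIii"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # fixed part only


class TypeLibHeader(NamedTuple):
    magic: bytes
    major_version: int
    minor_version: int
    num_interfaces: int
    file_length: int
    interface_directory: int
    data_pool: int


def parse_header(buf: bytes) -> TypeLibHeader:
    """Decode the fixed-size header fields from the start of buf."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer too short for a typelib header")
    return TypeLibHeader(*struct.unpack_from(HEADER_FORMAT, buf, 0))
```

Under these assumptions the variable-length annotations array begins at byte HEADER_SIZE.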

magic

The first 16 bytes of the file always contain the following values:
       (hex) 58 50 43 4f 4d 0a 54 79 70 65 4c 69 62  0d 0a   1a
(C notation)  X  P  C  O  M \n  T  y  p  e  L  i  b  \r \n \032
This signature both identifies the file as an XPCOM typelib file and provides for immediate detection of common file-transfer problems, i.e. treatment of a binary file as if it were a text file. The CR-LF sequence catches file transformations that alter newline sequences. The control-Z character stops file display under MS-DOS. The linefeed in the sixth position checks for the inverse of the CR-LF translation problem. (A nod to the PNG folks for the inspiration behind using these special characters in the header.)
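A signature check might be sketched as follows. The constant and function names are hypothetical; the byte values are the ones listed above.

```python
# The 16 signature bytes: "XPCOM", LF, "TypeLib", CR, LF, control-Z.
MAGIC = b"XPCOM\nTypeLib\r\n\x1a"


def has_typelib_magic(buf: bytes) -> bool:
    """Return True if buf starts with the typelib signature."""
    return buf[:16] == MAGIC
```

A file whose CR-LF pair was rewritten to a bare LF during a text-mode transfer no longer starts with these 16 bytes, so the check fails immediately.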

major_version, minor_version

These are the major and minor version numbers of the typelib file format. For this specification major_version is 0x01 and minor_version is 0x00. TypeLib files that share the same major version but have different minor versions are compatible. Changes to the major version represent typelib file formats that are not backward-compatible with parsers designed only to read earlier major versions. If a typelib file is encountered with a major version for which support is not available, the rest of the file should not be parsed.
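The compatibility rule can be expressed as a small predicate. This is only a sketch; SUPPORTED_MAJOR_VERSIONS and can_parse are hypothetical names for whatever a given parser implements.

```python
# Major versions this (hypothetical) parser knows how to read.
SUPPORTED_MAJOR_VERSIONS = {1}


def can_parse(major_version: int, minor_version: int) -> bool:
    """Files sharing a supported major version are readable; minor is ignored."""
    return major_version in SUPPORTED_MAJOR_VERSIONS
```

A parser would call this right after reading the two version bytes and stop parsing the file when it returns False.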

num_interfaces

This indicates the number of InterfaceDirectoryEntry records that are at the offset indicated by the interface_directory field.

interface_directory

This field specifies a zero-relative byte offset from the beginning of the file.  It identifies the start of an array of InterfaceDirectoryEntry records.  If num_interfaces is zero, then this field should also be zero.  The value of this field should be a multiple of 4, i.e. the interface directory must be aligned on a 4-byte boundary.  (This is to guarantee aligned access if the typelib file is mmap'ed into memory.)

file_length

Total length of the typelib file, in bytes. This value can be compared to the length of the file reported by the OS so as to detect file truncation.

data_pool

The data pool is a heap-like storage area that holds most kinds of typelib data including, but not limited to, InterfaceDescriptor, MethodDescriptor, ParamDescriptor, and TypeDescriptor records.  Note that, unlike most file offsets in a typelib, the value of data_pool is relative to the beginning of the file.

annotations

A variable-length array of variable-size records used to store secondary information, such as the name of the tool that generated the typelib file, the date it was generated, etc.

InterfaceDirectoryEntry

A contiguous array of fixed-size InterfaceDirectoryEntry records begins at the byte offset identified by the interface_directory field in the file header.  The array is used to quickly locate an interface description using its IID.  No interface should appear more than once in the array.
InterfaceDirectoryEntry {
    uint128              iid;
    Identifier*          name;
    InterfaceDescriptor* interface_descriptor;
}
An interface is said to be unresolved if its name is known, e.g. "nsISupports", but its IID and methods have not yet been determined.  In that case, both the iid and the interface_descriptor fields will be set to zero.  An unresolved interface must be resolved by linking with another typelib that contains a resolved InterfaceDirectoryEntry for the interface in question.

A pointer to an InterfaceDirectoryEntry is always relative to the beginning of the file.  (This is different from other pointers in the typelib file, which are relative to the byte immediately before the data pool.)

iid

The iid field contains a 128-bit value representing the interface ID. The iid is created from an IID by concatenating the individual bytes of an IID in a particular order. For example, this IID:
{00112233-4455-6677-8899-aabbccddeeff}
is converted to the 128-bit value
0xffeeddccbbaa88996677445500112233
Note that the byte storage order corresponds to the layout of the nsIID C-struct on a little-endian architecture.

All InterfaceDirectoryEntry objects must appear sorted in increasing order of iid, so as to facilitate a binary search of the array.  (This means that unresolved interfaces appear at the beginning of the array.)
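A lookup over the sorted directory could be sketched as below. The function names are hypothetical. The sketch assumes each entry occupies 24 bytes (a 16-byte iid followed by two 32-bit offsets, with no padding) and that the iid is compared as a 128-bit little-endian integer, following the general byte-order rule in the Notation section.

```python
ENTRY_SIZE = 24  # 16-byte iid + name offset + interface_descriptor offset


def entry_iid(buf: bytes, directory: int, index: int) -> int:
    """Read the iid of entry `index` as a 128-bit little-endian integer."""
    start = directory + index * ENTRY_SIZE
    return int.from_bytes(buf[start:start + 16], "little")


def find_interface(buf: bytes, directory: int, num_interfaces: int, iid: int) -> int:
    """Binary-search the directory; return the entry index, or -1 if absent."""
    lo, hi = 0, num_interfaces
    while lo < hi:
        mid = (lo + hi) // 2
        if entry_iid(buf, directory, mid) < iid:
            lo = mid + 1
        else:
            hi = mid
    if lo < num_interfaces and entry_iid(buf, directory, lo) == iid:
        return lo
    return -1
```

Here directory is the value of the header's interface_directory field, i.e. a byte offset from the beginning of the file.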

name

The human-readable name of this interface, e.g. "nsISupports", stored using the Identifier record format.

interface_descriptor

This is a byte offset from the beginning of the file to the corresponding InterfaceDescriptor object.

InterfaceDescriptor

An InterfaceDescriptor is a variable-size record used to describe a single XPCOM interface, including all of its methods:
InterfaceDescriptor {
    InterfaceDirectoryEntry* parent_interface;
    uint16                   num_methods;
    MethodDescriptor         method_descriptors[num_methods];
    uint16                   num_constants;
    ConstDescriptor          const_descriptors[num_constants];
}

parent_interface

An interface's methods are specified by composing the methods of an interface from which it is derived with additional methods it defines.  The method_descriptors array does not list any methods that the interface inherits from its parent and the parent_interface field contains a byte offset, relative to the beginning of the file, to the InterfaceDirectoryEntry of its parent interface.  This field has a value of zero for nsISupports, the root of the interface inheritance hierarchy.

num_methods

The number of methods in the method_descriptors array.

method_descriptors

This is a byte offset from the beginning of the data pool to an array of MethodDescriptor objects.  The length of the array is determined by the num_methods field.

num_constants

The number of scoped interface constants in the const_descriptors array.

const_descriptors

This is a byte offset from the beginning of the data pool to an array of ConstDescriptor objects.  The length of the array is determined by the num_constants field.

ConstDescriptor

A ConstDescriptor is a variable-size record that records the name and value of a scoped interface constant.  All ConstDescriptor records have this form:
ConstDescriptor {
    Identifier*     name;
    TypeDescriptor  type;
    <type>          value;
}

name

The human-readable name of this constant, stored in the Identifier record format.

type

The type of the constant.  Types are restricted to the following subset of TypeDescriptors: int8, uint8, int16, uint16, int32, uint32, int64, uint64, wchar_t, char, string

value

The type (and thus the size) of the value record is determined by the contents of the associated TypeDescriptor record.  For instance, if type corresponds to int16, then value is a two-byte record consisting of a 16-bit signed integer.  For a ConstDescriptor type of string, the value record is of type String*, i.e. an offset within the data pool to a String record containing the constant string.

MethodDescriptor

A MethodDescriptor is a variable-size record used to describe a single interface method:
MethodDescriptor {
    boolean         is_getter;
    boolean         is_setter;
    boolean         is_varargs;
    boolean         is_constructor;
    uint4           reserved;
    Identifier*     name;
    uint8           num_args;
    ParamDescriptor params[num_args];
    ParamDescriptor result;
}
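The four boolean fields and the 4-bit reserved field together occupy a single byte. Decoding it might look like the sketch below. This document does not state the bit order within a byte, so the sketch assumes the first-listed field occupies the most significant bit; the function name is hypothetical.

```python
def decode_method_flags(flags: int) -> dict:
    """Split the first byte of a MethodDescriptor into its fields.

    ASSUMPTION: fields are packed starting from the most significant bit.
    """
    return {
        "is_getter": bool(flags & 0x80),
        "is_setter": bool(flags & 0x40),
        "is_varargs": bool(flags & 0x20),
        "is_constructor": bool(flags & 0x10),
        "reserved": flags & 0x0F,
    }
```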

is_getter

This field is used to allow interface methods to act as property getters for object-oriented languages such as JavaScript.  It could be set as a result of defining an XPIDL attribute.  For example, if there was an XPIDL attribute named "Banjo",  you could access the "Banjo" property on an interface like so: 'myInterface.Banjo'.  Any prefix added  by the XPIDL compiler to an attribute's  identifier in the .h file, such as "Is" or "Get" should not appear in the method's name.

is_setter

This field is used to allow interface methods to act as property setters for object-oriented languages such as JavaScript.  It could be set as a result of defining an XPIDL attribute.  For example, if there was an XPIDL attribute named "Banjo", you could assign to the "Banjo" property on an interface like so: 'myInterface.Banjo = 3'.  Any prefix added by the XPIDL compiler to an attribute's identifier in the .h file, such as "Set", should not appear in the method's name.

is_varargs

If set, is_varargs indicates that the method is designed to accept a variable number of arguments from, say, a scripting language.  The exact details of how this might be done, however, are beyond the scope of the typelib definition.  (With XPComConnect, an nsVarArgs object is passed as the last parameter to such a method.  That object is a variable length array of argument values and types.)

is_constructor

This field indicates the default constructor for this interface, which may be useful for interfaces that act like factories.  For example, with an instance of an XPCOM interface named 'Foo', one might write 'new Foo(arg1, arg2)' in JavaScript, thus causing this method to be called.  The argument signature of an XPCOM constructor is:
NS_IRESULT ([arg,]*, out nsIID** result_type, out nsISupports** result)
That is, it's a function that takes zero or more arguments and creates a new interface returned through result with a type identified by the result_type output parameter.

name

The human-readable name of this method, e.g. "getWindow", stored in the Identifier record format.

num_args

The number of arguments that the method consumes.  Also, the number of elements in the params array.

params

This is a byte offset from the beginning of the data pool to an array of ParamDescriptor objects.  The length of the array is determined by the num_args field.

result

This is a byte offset from the beginning of the data pool to a single ParamDescriptor object that identifies the type of the method return value.

ParamDescriptor

A ParamDescriptor is a variable-size record used to describe either a single argument to a method or a method's result:
ParamDescriptor {
    boolean         in;
    boolean         out;
    uint6           reserved;
    TypeDescriptor  type;
}

in

If in is true, it indicates that the parameter is to be passed from caller to callee.

out

If out is true, it indicates that the parameter is to be passed from callee to caller.  Out parameters must have pointer type.  It is possible for a parameter to have both out and in bits set.

reserved

A 6-bit field reserved for future use.

type

The type of the method parameter.

TypeDescriptor

A TypeDescriptor is a variable-size record used to identify the type of a method argument or return value.  There are many XPCOM types that need to be represented in the typelib: [Editor: This specification does not yet cover pointers to unions, structs or arrays.]

To efficiently describe all the type categories listed above, there are several different variants of TypeDescriptor records:

union TypeDescriptor {
    SimpleTypeDescriptor;
    InterfaceTypeDescriptor;
    InterfaceIsTypeDescriptor;
}
The first byte of all these TypeDescriptor variants has the identical layout:
TypeDescriptorPrefix {
    boolean  is_pointer;
    boolean  is_unique_pointer;
    boolean  is_reference;
    uint5    tag;
}
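Decoding this shared first byte could be sketched as follows. The bit order is not specified in this document, so the sketch assumes the first-listed field is the most significant bit and that tag occupies the low five bits. The Python class and function names are illustrative.

```python
from typing import NamedTuple


class TypeDescriptorPrefix(NamedTuple):
    is_pointer: bool
    is_unique_pointer: bool
    is_reference: bool
    tag: int


def decode_prefix(byte: int) -> TypeDescriptorPrefix:
    """ASSUMPTION: is_pointer is bit 7, and tag is the low 5 bits."""
    return TypeDescriptorPrefix(
        is_pointer=bool(byte & 0x80),
        is_unique_pointer=bool(byte & 0x40),
        is_reference=bool(byte & 0x20),
        tag=byte & 0x1F,
    )
```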

is_pointer

This field is true only when representing C pointer/reference types.

is_unique_pointer

This field cannot have a value of true unless is_pointer is also true.  The is_unique_pointer field indicates whether the parameter value can be aliased to another parameter value.  If is_unique_pointer is true, it must not be possible to reach the memory pointed at by this argument value from any other argument to the method.

is_reference

This field cannot have a value of true unless is_pointer is also true.  This field is true if the parameter is a reference, which is to say, it's a pointer that can't have a value of NULL.
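The two constraints on these flags can be checked with a small predicate. The function name is hypothetical, and the sketch covers only the rules stated above.

```python
def prefix_flags_are_valid(is_pointer: bool,
                           is_unique_pointer: bool,
                           is_reference: bool) -> bool:
    """is_unique_pointer and is_reference each require is_pointer."""
    if is_unique_pointer and not is_pointer:
        return False
    if is_reference and not is_pointer:
        return False
    return True
```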

tag

The tag field indicates which of the variant TypeDescriptor records is being used, and hence the way any remaining fields should be parsed.
 
Value in tag field    TypeDescriptor variant to use
0..15                 SimpleTypeDescriptor
16                    InterfaceTypeDescriptor
17                    InterfaceIsTypeDescriptor
18..31                reserved
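Because TypeDescriptor records are variable-size, a parser needs the tag to know how many bytes to consume. The sketch below maps a tag to its variant and record size. The function name is hypothetical, and the sizes are derived from the record layouts in this document: one prefix byte, plus a 32-bit offset for InterfaceTypeDescriptor or a uint8 for InterfaceIsTypeDescriptor.

```python
def describe_variant(tag: int) -> tuple:
    """Return (variant name, record size in bytes) for a 5-bit tag value."""
    if not 0 <= tag <= 31:
        raise ValueError("tag must fit in 5 bits")
    if tag <= 15:
        return ("SimpleTypeDescriptor", 1)
    if tag == 16:
        return ("InterfaceTypeDescriptor", 5)    # prefix + 32-bit offset
    if tag == 17:
        return ("InterfaceIsTypeDescriptor", 2)  # prefix + uint8 arg_num
    raise ValueError("tag values 18..31 are reserved")
```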

SimpleTypeDescriptor

The one-byte SimpleTypeDescriptor is a kind of TypeDescriptor used to represent scalar types,  pointers to scalar types, the void type,  the void* type and, as a special case, the nsIID* type:

is_pointer, tag

InterfaceTypeDescriptor

An InterfaceTypeDescriptor is used to represent either a pointer to an interface type or a pointer to a pointer to an interface type, e.g. nsISupports* or nsISupports**:
InterfaceTypeDescriptor {
    boolean                  is_pointer;
    boolean                  is_unique_pointer;
    boolean                  is_reference;
    uint5                    tag;
    InterfaceDirectoryEntry* interface;
}

is_pointer

When this field is false, the represented type is an interface pointer.  When is_pointer is true, the represented type is a pointer to an interface pointer.

tag

The tag field must have the decimal value 16.

interface

A byte offset, relative to the beginning of the file, which identifies the interface pointer's type.

InterfaceIsTypeDescriptor

An InterfaceIsTypeDescriptor describes an interface pointer type.  It is similar to an InterfaceTypeDescriptor except that the type of the interface pointer is specified at runtime by the value of another argument, rather than being specified by the typelib.
InterfaceIsTypeDescriptor {
    boolean  is_pointer;
    boolean  is_unique_pointer;
    boolean  is_reference;
    uint5    tag;
    uint8    arg_num;
}

tag

The tag field must have the decimal value 17.

arg_num

The index of the method argument that describes the type of the interface pointer.  The specified method argument must have type nsIID*.

Identifier

Identifier records are used to represent variable-length, human-readable strings:
Identifier {
    char   bytes[];
}

bytes

Unicode string encoded in UTF-8 format, NUL-terminated.
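Reading an Identifier might be sketched as follows. The function name is hypothetical, and pos is assumed to be a position in the file buffer that has already been converted from whatever offset pointed at the record.

```python
def read_identifier(buf: bytes, pos: int) -> str:
    """Decode the NUL-terminated UTF-8 string that starts at buf[pos]."""
    end = buf.index(b"\x00", pos)  # raises ValueError if no terminator exists
    return buf[pos:end].decode("utf-8")
```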

String

String records are used to represent variable-length, human-readable strings, possibly with embedded NULs:
String {
    uint16 length;
    char   bytes[];
}

length

The length of the string, in characters (not bytes).

bytes

Unicode string encoded in UTF-8 format, with no null-termination.  The length of the bytes array, measured in Unicode characters (not bytes), is reported by the length field.
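Because length counts characters rather than bytes, a reader has to walk the UTF-8 data to find where the record ends. A minimal sketch, assuming "characters" means Unicode code points (the function name is hypothetical):

```python
import struct


def read_string(buf: bytes, pos: int) -> tuple:
    """Decode the String record at buf[pos]; return (text, next_position)."""
    (num_chars,) = struct.unpack_from("<H", buf, pos)
    start = end = pos + 2
    for _ in range(num_chars):
        if end >= len(buf):
            raise ValueError("String runs past the end of the buffer")
        lead = buf[end]
        # The lead byte of each UTF-8 sequence tells how many bytes it spans.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        end += width
    if end > len(buf):
        raise ValueError("String runs past the end of the buffer")
    return buf[start:end].decode("utf-8"), end
```

The returned position is the first byte after the record, which is where any record stored immediately after the String begins.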

Annotation

Annotation records are variable-size records used to store secondary information about the typelib or about individual interfaces, such as the name of the tool that generated the typelib file, the date it was generated, etc.  The information is stored with very loose format requirements so as to allow virtually any private data to be stored in the typelib.
union Annotation {
    EmptyAnnotation;
    PrivateAnnotation;
}
EmptyAnnotation {
    boolean   is_last;
    uint7     tag;            // 0
}

PrivateAnnotation {
    boolean   is_last;
    uint7     tag;            // 1
    String    creator;
    String    private_data;
}

is_last

When true, no more Annotation records follow the current record.  If false, at least one Annotation record appears immediately after the current record.

tag

The tag field discriminates among the variant record types for Annotations.  If the tag is 0, this record is an EmptyAnnotation.  EmptyAnnotations are ignored; they are only used to indicate an array of Annotations that is completely empty.  If the tag is 1, the record is a PrivateAnnotation.

creator

A string that identifies the application/tool/code that created the annotation, e.g. "XPIDL Compiler, Version 1.2".  There are no rules about the contents of the creator string other than that it be human-readable.

private_data

An opaque data array that is put into the typelib by the application/tool/code that created the typelib.  There are no restrictions on the format of the private_data.
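Walking the annotations array could be sketched as below. All names are hypothetical. The sketch assumes is_last is the most significant bit of the first byte (the bit order is not stated in this document), that the two String records of a PrivateAnnotation are stored inline, and that String lengths count Unicode code points. The helper does no UTF-8 validation.

```python
import struct


def _skip_string(buf: bytes, pos: int) -> int:
    """Return the position just past the String record at buf[pos]."""
    (num_chars,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    for _ in range(num_chars):
        lead = buf[pos]
        # Width of one UTF-8 sequence, taken from its lead byte (unvalidated).
        pos += 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
    return pos


def walk_annotations(buf: bytes, pos: int) -> list:
    """Return a list of (tag, record_start, record_end) for each Annotation."""
    records = []
    while True:
        start = pos
        head = buf[pos]
        is_last, tag = bool(head & 0x80), head & 0x7F  # ASSUMED bit order
        pos += 1
        if tag == 1:
            # PrivateAnnotation: creator, then private_data.
            pos = _skip_string(buf, _skip_string(buf, pos))
        elif tag != 0:
            raise ValueError(f"unknown annotation tag {tag}")
        records.append((tag, start, pos))
        if is_last:
            return records
```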