LDAP Replication Draft Analysis and Design Document
LDAP replication is the process of copying a directory onto the local machine so that its data remains accessible when there is no connection to the server.
The LDAP replication support we would provide in Mozilla / Netscape 6 covers replicating People / corporate address directories onto the local machine, into a local Mork database supported by the Address Book.
Netscape 4.x supports LDAP replication by using the LDAP SDK to fetch the entries and storing the data locally in the Address Book neomagic DB that it supports. It provides an offline panel for each directory, where the user can set a preference to sync the directory automatically when going offline, as well as a button to sync the directory immediately.
For the initial version of LDAP support we will provide support for replicating the directory immediately, on user request.
Replication Protocols: LCUP and Change Log
There are two standard draft replication protocols that can be used to implement replication: the Change Log protocol and LCUP (LDAP Client Update Protocol). Although LCUP is the more recent of the two, most present-day servers do not comply with it yet.
Servers implementing LCUP provide the client with a cookie containing the client's state information. On each replication search the client sends this cookie back, and the server returns only the changes made on the server since the client's last replication. There is much more to this protocol, but we will not go into the details here since we will not be implementing LCUP in the initial version. See the LCUP Internet Draft for details.
The Change Log protocol works on the basis of a change log. The server has a container which holds a list of changeLog entries. Each changeLog entry contains a unique, incrementing changeLog number together with the details of the affected entry, the change type and the changes made to the entry. The container is located by reading the "changeLog" attribute of the server’s root DSE. The client retrieves all the changeLog entries for a directory whose changeLog number is greater than or equal to its last replicated changeLog number. If the entry with the equal number is not returned, the entire directory needs to be replicated. See the Change Log Internet Draft for more details.
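The decision described above (incremental update versus full re-replication) can be sketched as follows. This is only an illustration under stated assumptions: the `ChangeLogEntry` class, the `changelog_filter` and `plan_replication` helpers and the field names are hypothetical, and the `changeNumber` attribute name in the filter is assumed from the Change Log draft rather than taken from this document.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ChangeLogEntry:
    """Hypothetical in-memory form of one changeLog entry."""
    change_number: int
    target_dn: str
    change_type: str              # e.g. "add", "modify", "delete"
    changes: Optional[str] = None


def changelog_filter(last_change_number: int) -> str:
    # Assumed attribute name; asks for the last replicated entry and newer ones.
    return f"(changeNumber>={last_change_number})"


def plan_replication(
    entries: List[ChangeLogEntry], last_change_number: Optional[int]
) -> Tuple[str, List[ChangeLogEntry]]:
    """Return ("incremental", entries to apply) or ("full", [])."""
    if last_change_number is None:
        return "full", []         # never replicated before
    ordered = sorted(entries, key=lambda e: e.change_number)
    # The entry equal to the last replicated number must still be present;
    # otherwise changes may have been lost and a full download is needed.
    if not ordered or ordered[0].change_number != last_change_number:
        return "full", []
    return "incremental", ordered[1:]
```

The entry matching the last replicated number acts purely as a continuity check, so it is dropped from the list of changes to apply.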
In Netscape 4.x, LDAP replication was implemented using the Change Log protocol.
Although we will not be implementing LCUP in the initial version, we will try to keep the code that issues the replication query separate from the code that processes the query results. The two protocols obtain the changes in quite different ways, but this separation lets us reuse the results processing, which involves parsing the results and saving them in the local Address Book Mork database, whichever protocol is implemented. We would also try to find a way to query the server, before beginning the initial replication (downloading all entries) for a directory, to find out which replication protocol the server supports for that directory.
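One possible shape for this separation is sketched below, assuming that each protocol can be reduced to a stream of (change type, DN, attributes) records. All class and function names here are hypothetical, and a plain dictionary stands in for the Address Book Mork database.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Tuple

# (change_type, dn, attributes); an assumed common record format.
Change = Tuple[str, str, Dict[str, str]]


class ReplicationQuery(ABC):
    """Protocol-specific part: how the changes are fetched."""

    @abstractmethod
    def fetch_changes(self) -> Iterable[Change]:
        ...


class StaticQuery(ReplicationQuery):
    """Stand-in for a Change Log or LCUP query, for illustration only."""

    def __init__(self, changes: List[Change]) -> None:
        self._changes = list(changes)

    def fetch_changes(self) -> Iterable[Change]:
        return iter(self._changes)


class ResultProcessor:
    """Protocol-independent part: applies changes to the local store."""

    def __init__(self) -> None:
        self.cards: Dict[str, Dict[str, str]] = {}

    def apply(self, changes: Iterable[Change]) -> int:
        count = 0
        for change_type, dn, attrs in changes:
            if change_type == "delete":
                self.cards.pop(dn, None)
            else:                 # "add" and "modify" both store the entry
                self.cards[dn] = dict(attrs)
            count += 1
        return count


def replicate(query: ReplicationQuery, processor: ResultProcessor) -> int:
    """Run one replication pass and return the number of changes applied."""
    return processor.apply(query.fetch_changes())
```

With this split, supporting LCUP later would mean adding another `ReplicationQuery` subclass while leaving `ResultProcessor` unchanged.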
Alternative Replication Methodology
An alternative to using the standard protocols, suggested by Mark Smith, is to use time stamps for replication. The client stores the time stamp of its last update for the directory and, at replication time, compares it with the last update time stamp of the entries on the server. If the time stamp on the server is later, the entry on the client is updated with the one from the server.
This methodology does not require the server to do anything special to support replication except maintain the last update time stamp attribute, which all servers do, and allow access to this attribute so that the client can compare it with its own last update time stamp. One open issue is whether the name of this operational attribute is standard, i.e. is the last update time stamp attribute called the same on all LDAP servers?
On the other hand, this does have an impact on performance, since at replication time each entry’s time stamp needs to be compared to decide which entries on the client side need to be updated. With the Change Log protocol, by contrast, only the list of changeLog entries needs to be accessed.
Also, handling new records, deleted records, or records whose RDN has been updated might require resyncing the complete directory. Even if this is done, we still need to determine when or how often the resync should happen.
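As a rough sketch of the time stamp approach, the comparison could be pushed to the server as a search filter. The example assumes the attribute is named `modifyTimestamp` and holds a generalized time value; whether that name applies to a given server is exactly the open question raised above, so it is left as a parameter. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def to_generalized_time(moment: datetime) -> str:
    """Format an aware datetime as YYYYMMDDHHMMSSZ in UTC."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")


def changed_since_filter(
    last_sync: datetime,
    base_filter: str = "(objectClass=*)",
    timestamp_attr: str = "modifyTimestamp",   # assumed attribute name
) -> str:
    """Combine the user's filter with a 'modified since last sync' clause."""
    stamp = to_generalized_time(last_sync)
    return f"(&{base_filter}({timestamp_attr}>={stamp}))"
```

A filter of this kind only finds entries that still exist on the server, which is why deleted or renamed records would still need the periodic full resync discussed above.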
One issue that needs to be considered for the above replication methods is authentication. For the Change Log protocol, the user needs to be able to access the change log container, i.e. the "changeLog" attribute of the root DSE. Even if this container is accessible, access to all of its attributes must also be allowed. For the alternative methodology, we need to be able to access the last update time stamp operational attribute for the directory, and the administrator needs to index the server on this attribute for effective server-side processing during replication.
To deal with the changeLog situation, the administrator could define a replication user authDN and password that can be used to access the change log entries. However, prefs may not be a good place to store this sensitive information. We should look more closely at the implementation done in 4.x in this regard and see whether it is the most effective way to deal with authentication.
A better way to deal with this is to use the Password Manager / Wallet feature in Mozilla / Netscape 6 to store the authDN and password used for LDAP replication. This would, however, require auto-config integration with the Password Manager.
It also seems that the search on the change log container can be done by any user, but the "changes" attribute cannot be accessed. To work around this, 4.x makes another search to retrieve all the attributes of the entries listed in the change log and then replaces the local records for these entries with the retrieved attribute data. This may have some performance impact, since the modified attributes could otherwise be read directly without retrieving the entries again.
Various user-specific and company-specific preferences and details can be specified for replication:
- replication search filter: restricting the replication to what the user needs is a good way to keep the replication time in check, so each user should be allowed to specify a search filter so that only the entries useful to him/her are downloaded
- replication search base for the replication search
- lastChangeNumber / last updatedTimestamp
- configuration file for field mappings associated with the directory
- replication search timeout
The LDAP autocomplete feature stores several directory properties in the preferences. The code that reads these preferences, or at least the directory preferences, can be reused to obtain the LDAP properties needed for the replication search.
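The settings listed above could be gathered into a single structure when they are read from the preferences. The sketch below is an assumption-heavy illustration: the preference key names (`searchBase`, `filter`, `lastChangeNumber`, `mappingFile`, `timeout`), the prefix, and the default values are all hypothetical and are not the actual Mozilla preference names.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class ReplicationSettings:
    search_base: str
    search_filter: str = "(objectClass=*)"
    last_change_number: int = -1      # -1 means "never replicated"
    mapping_file: str = ""
    timeout_seconds: int = 60


def load_settings(prefs: Mapping[str, Any], prefix: str) -> ReplicationSettings:
    """Build the settings for one directory from a flat preference mapping."""

    def get(key: str, default: Any = None) -> Any:
        return prefs.get(f"{prefix}.{key}", default)

    base = get("searchBase")
    if base is None:
        raise KeyError(f"{prefix}.searchBase is required")
    return ReplicationSettings(
        search_base=base,
        search_filter=get("filter", "(objectClass=*)"),
        last_change_number=int(get("lastChangeNumber", -1)),
        mapping_file=get("mappingFile", ""),
        timeout_seconds=int(get("timeout", 60)),
    )
```

For example, `load_settings({"dir1.searchBase": "dc=example,dc=com"}, "dir1")` would yield a settings object with every other field at its default.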
Mapping AB fields to LDAP attributes
The LDAP attributes need to be mapped onto the AB fields / AB card. Each corporate customer can use different attribute names for the attributes of its People details directory. A mapping is proposed in bug # 118454 and related bugs. These bugs also deal with cases such as multi-valued attributes, multiple LDAP attribute names mapping to one AB field, etc.
Perhaps a configuration file should be defined to specify this mapping, which can be used at runtime both to do the search and to process the search results. There could be a different configuration file for each directory server. The administrator should be able to update this configuration file depending on what attribute names the corporate client uses, and to distribute it, perhaps using autoconfig. A configuration UI could also be provided to configure this mapping as part of the N6x MCD version.
The current strategy, however, should be to have a centralized class containing the existing code for this hardcoded map, which can be changed at a later stage to use a configuration file.
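A centralized mapping class of this kind might look like the sketch below. The AB field names and the LDAP attribute lists in `DEFAULT_MAP` are illustrative assumptions, not the mapping proposed in the bugs mentioned above; the class and method names are hypothetical too. The map is passed in through the constructor so that a configuration file could later supply it instead of the hardcoded default.

```python
from typing import Dict, List, Mapping, Optional, Sequence

# Illustrative hardcoded map: AB field -> LDAP attribute names, in priority order.
DEFAULT_MAP: Dict[str, List[str]] = {
    "DisplayName": ["cn", "commonName"],
    "FirstName": ["givenName"],
    "LastName": ["sn", "surname"],
    "PrimaryEmail": ["mail"],
    "WorkPhone": ["telephoneNumber"],
}


class AttributeMap:
    def __init__(self, mapping: Optional[Mapping[str, Sequence[str]]] = None) -> None:
        source = DEFAULT_MAP if mapping is None else mapping
        self._mapping = {field: list(names) for field, names in source.items()}

    def ldap_attributes(self) -> List[str]:
        """All LDAP attribute names to request in the replication search."""
        seen: List[str] = []
        for names in self._mapping.values():
            for name in names:
                if name not in seen:
                    seen.append(name)
        return seen

    def to_card(self, entry: Mapping[str, Sequence[str]]) -> Dict[str, str]:
        """Convert one LDAP entry (attribute -> values) to AB card fields."""
        lowered = {key.lower(): values for key, values in entry.items()}
        card: Dict[str, str] = {}
        for field, names in self._mapping.items():
            for name in names:    # first matching attribute name wins
                values = lowered.get(name.lower())
                if values:
                    card[field] = values[0]   # first value of a multi-valued attribute
                    break
        return card
```

Taking the first value of a multi-valued attribute is only one possible policy; the bugs referenced above may settle on a different one.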
Storing the entries locally
The Address Book uses the Mork database to store its data locally. We will go ahead with using the Mork database since it will allow us to use the existing Address Book XPCOM interfaces for our implementation. Berkeley DB could be another option and might result in better performance, but it may require additional coding, even for the Address Book UI to display the data, and the performance gain would be seen only during LDAP replication of a complete directory, which is a rare event for a user.