


Revision 1.0, 07/22/2003
Questions & Comments : Please mail Prabhat Hegde

Supporting Indic Scripts in Mozilla




Introduction

This is a two-part document that outlines Indic support in Mozilla. It covers all issues dealing with non-BiDi CTL support in Mozilla on Unix platforms. Although the design and implementation are cross-platform, non-Unix platforms have not been considered for implementation.

Complex Text Script Support (CTL)

The presentation of Indian language scripts requires contextual processing for display and editing. This output technology is called Complex Text Layout (CTL). CTL features enable providers (such as OS/X-Windows/CDE-Motif, Gnome/Gtk/Pango based applications, or cross-platform applications such as Mozilla or Star/OpenOffice) to support writing systems that require a complex set of transformations between the logical (stored) and the physical or visual (displayed or printed) representations of text data. Additionally, CTL support defines the behaviour of character combination and shaping, text edit operations and, if needed, component orientation.
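As a minimal sketch of the logical-to-visual transformation described above, the following Python fragment reorders the Devanagari vowel sign I, which is stored after its consonant but drawn to its left. The function name is hypothetical and is not part of Mozilla; a real shaper also handles conjuncts, reph and many other cases.

```python
# Illustrative only: handles a single reordering rule, not full Devanagari shaping.
VOWEL_SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I, drawn to the left of its consonant


def logical_to_visual(text: str) -> str:
    """Move each VOWEL SIGN I in front of the character it follows (hypothetical helper)."""
    out = []
    for ch in text:
        if ch == VOWEL_SIGN_I and out:
            # Insert before the preceding character instead of appending.
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return "".join(out)


logical = "\u0915\u093F"  # KA + VOWEL SIGN I, in stored (logical) order
visual = logical_to_visual(logical)
print([f"U+{ord(c):04X}" for c in visual])  # ['U+093F', 'U+0915']
```

The stored text is left untouched; only the sequence handed to the drawing code changes, which is why edit operations need to map between the two orders.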

Mozilla's non-Bidi CTL support currently provides for:
  1. Context Sensitive Shaping and Rendering on Unix platforms using non-intelligent (non-OpenType, or non-OT) fonts encoded in "sun.unicode.india-0" encoding. A set of free fonts in this encoding is available for download at:
    http://developer.sun.com/techtopics/global/index.html
  2. Support for edit operations including Cursor positioning and Selection.
  3. Printing through Xprint.
  4. Features 1-3 above are available for Core-X and XFT2 backends.
  5. Currently Hindi and Tamil (TISCII) support is available, but it is fairly easy to extend support to other Indic scripts using the same architecture.
Future enhancements will provide the same feature set using OpenType fonts, which are the future direction for Indic script fonts.

Mozilla also supports BiDi CTL; more information is available at: http://www.langbox.com/bidimozilla

Indic Script Support Issues in the Browser (*nix only)

Input method support in Mozilla is handled by the underlying platform. This leaves the following issues:

  1. Output/Rendering support which includes:
    1. Font Handling
    2. Printing
    3. CTL Issues that cover:
    • Character-Clustering
    • Context-Sensitive Shaping
    • Joining
    • Split Glyphs
    • Character-Reordering
    • Edit Operations
  2. Data Exchange including handling ISCII-91 data and Mail

Font handling is an issue for Indic scripts because font repertoire, layout and encoding are not standardized, and because there is no uniform typographic framework across Windows, Unix and Unix variants. Similarly, item 2 above is an issue since ISCII-91 and its variants are not registered MIME charsets under IANA. However, because there are few localized applications, and therefore little legacy data, this is less of an issue.
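To illustrate the split-glyph issue listed above, here is a small Python sketch using Tamil. It relies on the fact that the two-part vowel sign O (U+0BCA) has a canonical decomposition into a left part (U+0BC6) and a right part (U+0BBE). The helper name is hypothetical, and the sketch deliberately ignores single-part left-side signs, which would need separate reordering.

```python
import unicodedata


def place_split_vowel(consonant: str, vowel_sign: str) -> str:
    """Return a simplified display order for consonant + vowel sign.

    Hypothetical helper: only two-part (split) vowel signs are rearranged;
    everything else is returned in stored order.
    """
    parts = unicodedata.normalize("NFD", vowel_sign)
    if len(parts) == 2:
        left, right = parts
        # The left half is drawn before the consonant, the right half after it.
        return left + consonant + right
    return consonant + vowel_sign


ka = "\u0B95"            # TAMIL LETTER KA
sign_o = "\u0BCA"        # TAMIL VOWEL SIGN O (two-part)
display = place_split_vowel(ka, sign_o)
print([f"U+{ord(c):04X}" for c in display])  # ['U+0BC6', 'U+0B95', 'U+0BBE']
```

A font without intelligent layout tables needs this kind of decomposition to be done by the shaper, since one stored character ends up as two glyphs on either side of the consonant.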

Design goals for Indic Support in Mozilla (*nix Only)

The original design & implementation goals were :
  1. Provide non-Bidi CTL script support for languages such as Thai and Hindi for Core-X11 fonts on *nix platforms.
  2. Do not regress existing builds.
  3. Leverage existing code
  4. Provide pluggable interfaces to support additional scripts.
  5. Localize code-changes as much as possible.

Design approach

The design approach to solving Indic-CTL script issues is fairly localized, since these scripts affect layout less than Bidi-CTL scripts do. The approach involves:
  1. In I18n(Encoder) layer
    • Identify groups of input characters that form a logical character.
    • Convert ISCII<->Unicode UCS-2 for ISCII support.
  2. In Layout layer
    • Identify Cluster-Boundaries, i.e. groups of Unicode chars that form a logical unit of display.
    • During painting/measuring/sizing of text, do not split chars in a way that loses Cluster-Boundary information.
    • While performing edit operations such as cursor positioning, do not lose Cluster-Boundary information, i.e. do not split by chars.
  3. In Graphic(gfx) layer
    • Identify Cluster-Boundaries, i.e. groups of Unicode chars that form a logical unit of display.
    • Always perform a glyph-generation/context-sensitive shaping operation before performing a measuring/drawing operation.
    • Perform glyph generation.

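The cluster-boundary steps above can be sketched as follows. This is a simplified illustration in Python, not the Mozilla implementation: it assumes that combining marks attach to the preceding base character and that a Devanagari virama (U+094D) joins the following consonant into the same cluster. The function names are hypothetical.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA


def cluster_boundaries(text: str) -> list:
    """Return the offset of each cluster start, followed by len(text)."""
    bounds = []
    for i, ch in enumerate(text):
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        joined = i > 0 and text[i - 1] == VIRAMA
        if i == 0 or not (is_mark or joined):
            bounds.append(i)
    bounds.append(len(text))
    return bounds


def next_cursor(text: str, offset: int) -> int:
    """Move the cursor right by one whole cluster, never into the middle of one."""
    return min((b for b in cluster_boundaries(text) if b > offset), default=len(text))


def prev_cursor(text: str, offset: int) -> int:
    """Move the cursor left by one whole cluster."""
    return max((b for b in cluster_boundaries(text) if b < offset), default=0)


hindi = "\u0939\u093F\u0928\u094D\u0926\u0940"  # HA, I, NA, VIRAMA, DA, II
print(cluster_boundaries(hindi))  # [0, 2, 6]
print(next_cursor(hindi, 0))      # 2
```

Here six stored characters form only two clusters, so a cursor moving by characters would stop at four positions that have no meaningful place on screen.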
Implementation approach

The design goals described above mapped to the following implementation approach.
  1. Provide a compile-time switch to enable Indic builds; the switch need not be turned on by default.
  2. Leverage Core-X11 shaper(s) API from pango (www.pango.org) since it is simple and cross-platform in nature.
  3. Leverage nsIUnicodeEncoder  & gfx.
  4. Create a light-weight CTL API to handle Clustering used for Edit operations.
  5. Use pango.modules to add additional scripts.
  6. Support no more than 2 non-OT fonts.
In addition, Jungshik Shin re-used the shapers to extend CTL functionality to the XFT backend.
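The idea of pluggable shapers selected per script can be sketched as a small registry. The Python fragment below is only an analogy for the role that pango.modules plays; the function names, the registry structure and the placeholder glyph labels are all assumptions made for illustration, not Pango or Mozilla API.

```python
from typing import Callable, List, Optional, Tuple

Shaper = Callable[[str], List[str]]

# Hypothetical registry: (code point range, shaper) pairs, searched in order.
_registry: List[Tuple[range, Shaper]] = []


def register_shaper(first: int, last: int, shaper: Shaper) -> None:
    """Associate a shaper with an inclusive range of Unicode code points."""
    _registry.append((range(first, last + 1), shaper))


def find_shaper(ch: str) -> Optional[Shaper]:
    """Return the shaper registered for this character's script, if any."""
    cp = ord(ch)
    for rng, shaper in _registry:
        if cp in rng:
            return shaper
    return None


def devanagari_shaper(cluster: str) -> List[str]:
    # Placeholder: a real shaper would return font glyph indices.
    return [f"dvng:{ord(c):04X}" for c in cluster]


def thai_shaper(cluster: str) -> List[str]:
    return [f"thai:{ord(c):04X}" for c in cluster]


register_shaper(0x0900, 0x097F, devanagari_shaper)  # Devanagari block
register_shaper(0x0E00, 0x0E7F, thai_shaper)        # Thai block

shaper = find_shaper("\u0915")
print(shaper("\u0915\u093F"))  # ['dvng:0915', 'dvng:093F']
```

With this shape, supporting another script means registering one more shaper rather than changing the code that calls it.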

The implementation approach in the font layer will vary depending on whether support needs to be extended to OpenType fonts. My recommendation for supporting OpenType fonts is to re-use the light-weight APIs created for the original implementation, in combination with ICU + Mozilla XFT2. The effort involved in doing so is fairly trivial, since ICU already supports Indic scripts and ICU layout can be built separately from the parent ICU. Specifically, it would involve the following (minimum set of) tasks in Mozilla:
  1. Create an ICU-layout XPCOM component.
  2. Access OpenType Tables using (or adding to) Mozilla XFT API.
  3. Obtain Cluster-Boundary information from OpenType fonts using ICU+(adding/extending)Mozilla XFT.
  4. Create additional shapers that use ICU layout for Glyph-Generation.
  5. Modify gfx to call the shapers.
Code that accesses the tables in the font (TrueType/OpenType/...) and rasterizes the outlines is already present in Mozilla-XFT. Code also exists in Mozilla to move the raster to the X server.

Implementation Details

Implementation details for the Mozilla CTL code that covers Thai and Indic are as follows:
  1. Building:
    Use --enable-CTL switch.

    Safe with Core-X11 and with --enable-default-toolkit=gtk2

  2. Complete set of newly created sources
    Search LXR using SUNCTL Macro

  3. External Interfaces
    components/libctl.so
    pango.modules
    libmozpango.so
    libmozpango-thaix.so
    libmozpango-dvngx.so
    .. .. ..
    .. .. ..

  4. Lightweight CTL API
    There are additional APIs that need to be checked in as part of bugzilla ID

    http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/nsULE.h

  5. Adding fonts - Reference the following:
    http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/charsetData.properties#161
    http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/charsetTitles.properties#112

  6. Creating and adding CTLized Encoders - Reference
    http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/nsUnicodeToSunIndic.cpp
    http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/nsUnicodeToThaiTTF.cpp
    http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/nsCtlLEModule.cpp

  7. Creating and adding CTLized Shapers
    http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/pangoLite/pango.modules
    http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/thaiShaper/
    http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/hindiShaper/

  8. Encoder Changes
    None at the moment since encoders in nsCTLLEModule register themselves.

  9. Layout Changes
    Interfaces such as PeekOffset, GetFocusOffset, GetPointFromOffset, etc. will be among those affected in order to handle Edit Operations. All interfaces that deal with obtaining text buffer offsets from a screen position are expected to be affected. Currently the changes are localized to the #ifdef SUNCTL portions of nsTextFrame.cpp, but this is expected to spread to more interfaces in the following:


  10. Graphics Changes
    All entry points to the drawing and measurement APIs are expected to be affected.
    nsFontMetricsGTK.cpp,
    fontEncoding.properties -- XFT Support
    nsFontMetricsXlib.cpp  -- Core X & Xprint
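As a rough illustration of why the layout interfaces that map a screen position to a text buffer offset are affected, the Python sketch below snaps a horizontal position to the nearest cluster edge. It assumes the cluster boundaries and the per-cluster widths (measured after shaping) are already known; the function name and the sample widths are hypothetical and do not correspond to the nsTextFrame code.

```python
from typing import List


def offset_from_x(boundaries: List[int], widths: List[float], x: float) -> int:
    """Map a horizontal position to a text offset that lies on a cluster boundary.

    boundaries: cluster start offsets followed by the text length, e.g. [0, 2, 6]
    widths:     one measured width per cluster (assumed to come from the shaper)
    """
    left = 0.0
    for start, end, width in zip(boundaries, boundaries[1:], widths):
        if x < left + width:
            # Snap to whichever edge of the cluster is closer to x.
            return start if x - left < width / 2 else end
        left += width
    return boundaries[-1]


bounds = [0, 2, 6]       # two clusters covering six stored characters
widths = [10.0, 20.0]    # hypothetical pixel widths
print(offset_from_x(bounds, widths, 7.0))   # 2
print(offset_from_x(bounds, widths, 25.0))  # 6
```

No position ever resolves to offsets 1, 3, 4 or 5, which is the property the edit operations need to preserve.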

Current Status

Bugs

Open Issues

References

Downloads

Contacts

Name            Role  Contact
Jungshik Shin         jshin@mailaps.org
Roland Mainz          roland.mainz@informatik.med.uni-giessen.de
Prabhat Hegde         samskrita@yahoo.com
G. Karunakar          karunakar@freedomink.org