Revision 1.0, 07/22/2003
Questions & Comments : Please mail Prabhat Hegde
Supporting Indic Scripts in Mozilla
- Introduction
- Complex Text Script Support (CTL)
- Indic Script Support Issues in Mozilla
- Design Goals
- Design Approach
- Implementation Approach
- Implementation Details
- Current Status
- Bugzilla Bugs
- Open Issues
- References
- Downloads
- Solaris9, Solaris 10
- Linux (RedHat 8.x)
- Indic Fonts that cover 8 scripts (sun.unicode.india-0 encoded)
- Contacts
Introduction
This is a two-part document that outlines Indic support in Mozilla. It covers all issues related to non-BiDi CTL support in Mozilla on Unix platforms. Although the design and implementation are cross-platform, non-Unix platforms have not been considered for implementation.
Complex Text Script Support (CTL)
Mozilla's non-Bidi CTL support currently provides for:
- Context Sensitive Shaping and Rendering on Unix platforms using
non-intelligent (non-OT) fonts encoded in "sun.unicode.india-0"
encoding. A set of free fonts in this encoding is available for
download at:
http://developer.sun.com/techtopics/global/index.html
- Support for edit operations including Cursor positioning and Selection.
- Printing through Xprint.
- The three features above are available for the Core-X and XFT2 backends.
- Currently Hindi and Tamil (TISCII) support is available, but it is fairly easy to extend support to other Indic scripts using the same architecture.
Mozilla also supports BiDi CTL; more information is available at: http://www.langbox.com/bidimozilla
Indic Script Support Issues in the Browser (*nix only)
Input Method support in Mozilla is handled by the underlying platform. This leaves the following issues:
- Output/Rendering support which includes:
- Font Handling
- Printing
- CTL Issues that cover:
- Character-Clustering
- Context-Sensitive Shaping
- Joining
- Split Glyphs
- Character-Reordering
- Edit Operations
- Data Exchange including handling ISCII-91 data and Mail
Font Handling is an issue for Indic scripts due to the lack of font standardization (repertoire, layout and encoding) as well as the lack of a uniform typographic framework across Windows, Unix and Unix variants. Similarly, Data Exchange is an issue since ISCII-91 and its variants are not registered MIME charsets under IANA. However, because few localized applications exist, and little legacy data comes with them, this is less of an issue.
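Two of the CTL issues listed above, Character-Reordering and Split Glyphs, can be illustrated with a small sketch. It assumes the input is a single cluster in logical (Unicode) order and produces the left-to-right visual order a non-OpenType font would need. The function name `visual_order` and the two tables are hypothetical, cover only a few Devanagari and Tamil vowel signs, and are not taken from the Mozilla CTL sources.

```python
# Vowel signs that are drawn to the left of the base consonant.
PRE_BASE = {
    "\u093F",  # DEVANAGARI VOWEL SIGN I
    "\u0BC6",  # TAMIL VOWEL SIGN E
    "\u0BC7",  # TAMIL VOWEL SIGN EE
    "\u0BC8",  # TAMIL VOWEL SIGN AI
}

# Two-part Tamil vowel signs: (left piece, right piece).
SPLIT = {
    "\u0BCA": ("\u0BC6", "\u0BBE"),  # TAMIL VOWEL SIGN O
    "\u0BCB": ("\u0BC7", "\u0BBE"),  # TAMIL VOWEL SIGN OO
    "\u0BCC": ("\u0BC6", "\u0BD7"),  # TAMIL VOWEL SIGN AU
}


def visual_order(cluster):
    """Reorder one logical-order cluster into visual order."""
    left, rest = [], []
    for ch in cluster:
        if ch in SPLIT:
            # Split glyph: one piece goes before the base, one after.
            before, after = SPLIT[ch]
            left.append(before)
            rest.append(after)
        elif ch in PRE_BASE:
            # Reordering: the sign moves in front of the whole cluster.
            left.append(ch)
        else:
            rest.append(ch)
    return "".join(left + rest)


# HA + I is drawn as I, HA; KA + O is drawn as E, KA, AA.
hindi = visual_order("\u0939\u093F")
tamil = visual_order("\u0B95\u0BCA")
```

The sketch only works if it is handed a whole cluster, which is why cluster boundaries have to be known before any glyph generation takes place.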
Design goals for Indic Support in Mozilla (*nix Only)
- Provide non-Bidi CTL script support for languages such as Thai and Hindi for Core-X11 fonts on *nix platforms.
- Do not regress existing builds.
- Leverage existing code.
- Provide pluggable interfaces to support additional scripts.
- Localize code-changes as much as possible.
Design approach
The design approach to Indic CTL script issues is fairly localized, since these scripts affect layout less than Bidi CTL scripts do. The approach followed involves:
- In I18n(Encoder) layer
- Identify groups of input characters that form a logical character.
- Convert ISCII<->Unicode UCS-2 for ISCII support.
- In Layout layer
- Identify Cluster-Boundaries, i.e. groups of Unicode chars that form a logical unit of display.
- When painting, measuring or sizing text, do not split chars in a way that loses Cluster-Boundary information.
- While performing edit operations such as cursor positioning, do not lose Cluster-Boundary information, i.e. do not split by chars.
- In Graphic(gfx) layer
- Identify Cluster-Boundaries, i.e. groups of Unicode chars that form a logical unit of display.
- Always perform a glyph-generation/context-sensitive shaping operation before performing a measuring/drawing operation.
- Perform glyph generation.
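The cluster-boundary step that recurs in the Layout and gfx layers above can be sketched as follows. This is only an illustration, assuming a deliberately simplified rule set for Devanagari: a dependent sign joins the preceding character, and a consonant joins a preceding halant. The names `cluster_starts` and `clusters` are hypothetical and are not part of the Mozilla CTL API, which may well apply different rules.

```python
# Simplified Devanagari cluster segmentation (illustrative only).
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA
NUKTA = "\u093C"   # DEVANAGARI SIGN NUKTA


def is_consonant(ch):
    # Devanagari consonants KA..HA.
    return "\u0915" <= ch <= "\u0939"


def is_dependent(ch):
    # Signs that never start a cluster: candrabindu, anusvara, visarga,
    # nukta, and the vowel signs up to and including halant.
    return ("\u0901" <= ch <= "\u0903" or ch == NUKTA
            or "\u093E" <= ch <= HALANT)


def cluster_starts(text):
    """Return the offsets at which a new cluster begins."""
    starts = []
    for i, ch in enumerate(text):
        joins_previous = i > 0 and (
            is_dependent(ch)
            or (is_consonant(ch) and text[i - 1] == HALANT)
        )
        if not joins_previous:
            starts.append(i)
    return starts


def clusters(text):
    """Split text into clusters without ever cutting inside one."""
    starts = cluster_starts(text)
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


# "Hindi" written in Devanagari: HA, I, NA, HALANT, DA, II.
# It splits into two clusters: HA+I and NA+HALANT+DA+II.
word = "\u0939\u093F\u0928\u094D\u0926\u0940"
parts = clusters(word)
```

Measuring, drawing and glyph generation would then operate on the items of `parts` rather than on individual characters.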
Implementation approach
The design goals described above mapped to the following implementation approach.
- Provide a compile-time switch to enable Indic builds, which need not be turned on by default.
- Leverage Core-X11 shaper(s) API from pango (www.pango.org) since it is simple and cross-platform in nature.
- Leverage nsIUnicodeEncoder & gfx.
- Create a light-weight CTL API to handle Clustering used for Edit operations.
- Use pango.modules to add support for additional scripts.
- Support not more than 2 non-OT fonts.
The implementation approach in the font layer will vary depending on whether support needs to be extended to OpenType fonts. My recommendation for supporting OpenType fonts is to reuse the lightweight APIs created for the original implementation, in combination with ICU + Mozilla XFT2. The effort involved in doing so is fairly trivial, since ICU already supports Indic scripts and ICU layout can be built separately from the parent ICU. Specifically, it would involve the following (minimum set of tasks) in Mozilla:
- Create an ICU-layout XPCOM component.
- Access OpenType Tables using (or adding to) Mozilla XFT API.
- Obtain Cluster-Boundary information from OpenType fonts using ICU+(adding/extending)Mozilla XFT.
- Create additional shapers that use ICU layout for Glyph-Generation.
- Modify gfx to call the shapers.
Implementation Details
Implementation details for the Mozilla CTL code that covers Thai and Indic are as below:
- Building:
Use --enable-CTL switch.
Safe with Core-X11 and with --enable-default-toolkit=gtk2
- Complete set of newly created sources
Search LXR using SUNCTL Macro
- External Interfaces
components/libctl.so
pango.modules
libmozpango.so
libmozpango-thaix.so
libmozpango-dvngx.so
.. .. ..
.. .. ..
- Lightweight CTL API
There are additional APIs that need to be checked in as a part of bugzilla ID
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/nsULE.h
- Adding fonts - Reference the following:
http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/charsetData.properties#161
http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/charsetTitles.properties#112
- Creating and adding CTLized Encoders - Reference
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/nsUnicodeToSunIndic.cpp
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/nsUnicodeToThaiTTF.cpp
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/nsCtlLEModule.cpp
- Creating and adding CTLized Shapers
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/pangoLite/pango.modules
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/thaiShaper/
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/hindiShaper/
- Encoder Changes
None at the moment, since the encoders in nsCtlLEModule register themselves.
- Layout Changes
Interfaces such as PeekOffset, GetFocusOffset and GetPointFromOffset will be among those affected in order to handle Edit Operations. All interfaces that obtain text buffer offsets from a screen position are expected to be affected. Currently the changes are localized to the #ifdef SUNCTL portions of nsTextFrame.cpp, but this is expected to spread to more interfaces in the following:
- Graphics Changes
All entry points to the drawing and measurement APIs are expected to be affected.
nsFontMetricsGTK.cpp,
fontEncoding.properties -- XFT Support
nsFontMetricsXlib.cpp -- Core X & Xprint
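The effect that the Layout Changes item above has on cursor movement can be sketched as follows. The sketch assumes the same kind of simplified Devanagari clustering rule described in the design approach and restricts the caret to cluster boundaries. The names `caret_stops`, `next_caret` and `prev_caret` are hypothetical and only loosely mirror what an interface such as PeekOffset has to do; they are not the nsTextFrame.cpp implementation.

```python
from bisect import bisect_left, bisect_right

HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA


def joins_previous(prev, ch):
    # Simplified Devanagari rule (an assumption): dependent signs, and a
    # consonant that follows a halant, stay in the previous cluster.
    dependent = ("\u0901" <= ch <= "\u0903" or ch == "\u093C"
                 or "\u093E" <= ch <= HALANT)
    conjunct = prev == HALANT and "\u0915" <= ch <= "\u0939"
    return dependent or conjunct


def caret_stops(text):
    """Offsets where the caret may rest: cluster starts plus end of text."""
    stops = [0]
    for i in range(1, len(text)):
        if not joins_previous(text[i - 1], text[i]):
            stops.append(i)
    if text:
        stops.append(len(text))
    return stops


def next_caret(text, offset):
    """Move right to the first caret stop strictly after offset."""
    stops = caret_stops(text)
    i = bisect_right(stops, offset)
    return stops[i] if i < len(stops) else len(text)


def prev_caret(text, offset):
    """Move left to the last caret stop strictly before offset."""
    stops = caret_stops(text)
    i = bisect_left(stops, offset)
    return stops[i - 1] if i > 0 else 0


# HA, I, NA, HALANT, DA, II: the caret can rest only at 0, 2 and 6.
word = "\u0939\u093F\u0928\u094D\u0926\u0940"
stops = caret_stops(word)
```

An offset that falls inside a cluster, for example one computed from a screen position, is snapped to a neighbouring stop by either function, so a selection never covers part of a cluster.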
Current Status
Bugs
Open Issues
References
Downloads
Contacts
Name | Role | Contact
Jungshik Shin | | jshin@mailaps.org
Roland Mainz | | roland.mainz@informatik.med.uni-giessen.de
Prabhat Hegde | | samskrita@yahoo.com
G. Karunakar | | karunakar@freedomink.org