You are currently viewing a snapshot of www.mozilla.org taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to www.mozilla.org, please file a bug.



Revision 1.0, 07/30/2003
Questions & Comments : Please mail Prabhat Hegde

Testing Considerations For Mozilla Indic Script Support


  • Test URLs
    • Hindi
    • Kannada
    • Telugu
    • Tamil
    • Gujarati
    • Bengali
    • Malayalam
    • Oriya

  • Indic Font Issues
  • Hindi Test Cases
  • Hindi Testing reports
  • References

Introduction

This document outlines the key Indic language features that should be exercised when testing the Indic (currently Hindi) feature set in the browser. The features covered are:

  1. Input
  2. Output (Display and Printing)
  3. Hindi script shaping issues
  4. Edit Operations.

Complex Text Support (CTL)

The presentation of Indian language scripts requires contextual processing for display and editing. This output technology is called Complex Text Layout (CTL). CTL features enable providers (such as OS/X-Windows/CDE-Motif, Gnome/Gtk/Pango based applications, or cross-platform applications such as Mozilla or Star/OpenOffice) to support writing systems that require a complex set of transformations between the logical (stored) and physical/visual (displayed or printed) representations of text data. CTL support also defines the behaviour of character combinations and shaping, text edit operations and, if needed, component orientation.

Indic Support in Mozilla (*nix only)

Mozilla supports the following features with respect to Indic scripts:

  1. Context Sensitive Shaping and Rendering on Unix platforms using non-intelligent (non-OT) fonts encoded in "sun.unicode.india-0" encoding.
  2. CTLized Edit operations including Cursor positioning, Insertion, Deletion, Backspacing and Selection.
  3. Printing through XPrint.
  4. Features 1-3 above are available for Core-X and XFT2 backends.
  5. Edit operations in the Mozilla Viewer, Composer and Mail are all covered.
  6. Mail transfer is via UTF-8.
  7. ISCII is currently not supported, given the lack of ISCII content (no web sites known to the author use it) and the lack of an IANA registry entry for ISCII.

Currently Hindi is the only fully supported script.
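
As a small illustration of item 6, the sketch below shows Hindi text surviving a UTF-8 round trip, both as raw bytes and inside a mail message. It uses Python's standard `email` package purely as a stand-in; it is not Mozilla's mail code, and the subject line and sample word are assumptions made for the example.

```python
from email.message import EmailMessage

# The word "Hindi" in Devanagari, written as escapes (6 code points).
hindi = "\u0939\u093F\u0928\u094D\u0926\u0940"

# Every code point in the Devanagari block (U+0900..U+097F) takes 3 bytes in UTF-8.
raw = hindi.encode("utf-8")

# Build a message; set_content() on a str uses the utf-8 charset by default.
msg = EmailMessage()
msg["Subject"] = "Indic test"   # hypothetical subject, for illustration only
msg.set_content(hindi)

charset = msg.get_content_charset()
round_tripped = msg.get_content().strip()
```

A useful manual check is to send such a message between two mail clients and confirm that the received text is identical to what was typed.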

For complete details about Indic Support project, visit www.mozilla.org/projects/ctl.


About Indic Scripts
All Indic scripts are derived from the ancient Brahmi script. Describing them is out of scope here. More information is available at one or more of the following:





The Devanagari Alphabet

The Devanagari alphabet is used for a number of Indian languages and dialects and is closely related to a number of scripts used today in South Asia, Southeast Asia and Tibet. It is used for Sanskrit, Hindi, Marathi and Nepali. The alphabet can be divided into:

Consonants (figure: Devanagari Consonants)
Vowels and corresponding Matras (figure: Devanagari Vowels)
Signs (figure: Dev Signs)
Numerals (figure: Hindu Numerals)

Indic Script Input

Inscript Keyboard Layout
INSCRIPT keyboard layouts are the de facto standard for Indic Compose/Keyboard Overlay based input. They are described at:
http://java.sun.com/products/jfc/tsc/articles/InputMethod/indiclayout.html

Phonetic Input Methods (On Solaris10 and Mad-Hatter only)













Indic Script Edit Operations

This section describes the expected behaviour of edit operations for Indic scripts. The most important consideration is that edit operations must preserve cluster boundaries, where a 'cluster' is a unit of display that may be composed of multiple characters of the alphabet. Edit operation behaviour is common across all Indic scripts.

Indian Languages have their own semantics in dealing with text editing. Some text processing features which need to adapt to the new semantics are:

  1. Left and Right arrow should traverse the entire display unit.
  2. Caret/Cursor placement should not be in the middle of a display unit but should snap to the nearest edge.
  3. <Delete>/Del key deletes the entire cluster.
  4. <BKSP>/Backspace decomposes the cluster character by character.
  5. Character breaking is by cluster (alternatively, by display unit).
  6. Line breaking is by danda '।'.
  7. Word breaking is by space.
  8. Mouse clicks need to snap to the nearest cluster boundary.
  9. Selection/Cut/Copy/Paste should follow the rules mentioned above.
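
The cluster rules above can be sketched in a few lines of Python. This is a simplified model for reasoning about expected test results, not Mozilla's implementation: the cluster definition (a base character, its combining marks, and any consonants joined by halant) and all function names are assumptions made for this example, and real shaping engines handle more cases.

```python
import unicodedata

HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA

def is_consonant(ch):
    # Devanagari consonants KA..HA plus the nukta-composed letters QA..YYA.
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"

def is_mark(ch):
    # Matras, halant, nukta, anusvara etc. are combining marks (Mn or Mc).
    return unicodedata.category(ch) in ("Mn", "Mc")

def clusters(text):
    """Split text into display units (simplified cluster rule)."""
    out, i, n = [], 0, len(text)
    while i < n:
        j = i + 1
        while j < n and (is_mark(text[j]) or
                         (is_consonant(text[j]) and text[j - 1] == HALANT)):
            j += 1
        out.append(text[i:j])
        i = j
    return out

def boundaries(text):
    """Offsets where the caret is allowed to rest."""
    pos = [0]
    for c in clusters(text):
        pos.append(pos[-1] + len(c))
    return pos

def snap(text, offset):
    """Rule 2 and 8: snap an arbitrary offset to the nearest cluster edge."""
    return min(boundaries(text), key=lambda b: abs(b - offset))

def move_right(text, caret):
    """Rule 1: Right arrow jumps over the whole display unit."""
    return next((b for b in boundaries(text) if b > caret), len(text))

def move_left(text, caret):
    """Rule 1: Left arrow jumps back over the whole display unit."""
    return max((b for b in boundaries(text) if b < caret), default=0)

def delete_forward(text, caret):
    """Rule 3: Delete removes the entire cluster after the caret."""
    return text[:caret] + text[move_right(text, caret):]

def backspace(text, caret):
    """Rule 4: Backspace removes a single character before the caret."""
    if caret == 0:
        return text, 0
    return text[:caret - 1] + text[caret:], caret - 1

# "Hindi" in Devanagari: HA, I-matra, NA, halant, DA, II-matra.
word = "\u0939\u093F\u0928\u094D\u0926\u0940"
```

For the sample word, the model yields two clusters (HA + I-matra, and NA + halant + DA + II-matra), so the caret may rest only at offsets 0, 2 and 6. Delete at offset 0 removes the first two characters at once, while Backspace at offset 6 removes only the final matra.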

Edit Operation Testing

As described above, the following operations must be tested:

  1. Left & Right Arrow Keys
  2. Left & Right Arrow Keys with <Shift>
  3. Left & Right Arrow Keys with <Ctrl>
  4. Mouse Operations
  5. Insertion
  6. Replacement


Hindi Testing

Id  Operation                                                     Initial      Expected Result
1   Right Arrow                                                   SwarajInput
2   Right Arrow
3   <CTRL> Right Arrow
4   <Shift> Right Arrow
5   Left Arrow
6   Left Arrow
7   <CTRL> Left Arrow
8   <Shift> Left Arrow
9   Mouse SnapTo
10  Mouse Double-Click/Select Word
11  Mouse Right, Cut/Paste
12  Mouse Left, Cut/Paste
13  <Shift> <CTRL> Left, Cut/Paste
14  <Shift> <CTRL> Right, Cut/Paste
15  Select clusters in a word - Select/Cut/Paste
16  Straddle Script boundary - Select/Cut/Paste
17
18  Backspace - Delete a character (Start from end of the word)
19  Backspace
20  Backspace
21  Delete (Start from beginning of the word)
22  Delete (Start from beginning of the word)
23  Copy/Paste - Repeat 11-16 above but use Copy instead of Cut
24  Replace
25  Insertion (Add a Halant at the location shown by the cursor)
26
27

Same tests should be extended to multiline text.


Devanagari Character Shaping

Shaping is the process by which characters are rendered in their appropriate presentation forms. A non-exhaustive set of rules covering Devanagari character shaping is given below:

Id
Description/Rule Input = Expected Output

Conjunct formation
  • Formed by combining consonants in arbitrary sequences.
  • Consonants are written using half-forms for all consonants except the last.
  • Not all conjuncts form valid, pronounceable syllables, so not all combinations produced are valid. Testing for valid combinations is up to the tester's knowledge of the script; it is not possible to provide a list here.

Hindi Conjunct Forms
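
To produce input for conjunct tests, candidate strings can be generated by joining consonants with the halant (Unicode U+094D, named VIRAMA). The following is a minimal sketch; the helper names are made up for this example, and, as noted above, it cannot tell valid conjuncts from invalid ones, so a tester still has to review the output.

```python
import itertools
import unicodedata

HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA

def conjunct(*consonants):
    """Join consonants with halant; all but the last should take a half-form."""
    return HALANT.join(consonants)

def candidate_conjuncts(consonants, length=2):
    """Yield every ordered combination of the given consonants as a conjunct."""
    for combo in itertools.product(consonants, repeat=length):
        yield conjunct(*combo)

def describe(text):
    """List the Unicode names of the code points, handy for bug reports."""
    return [unicodedata.name(ch) for ch in text]

KA, TA, RA, SSA = "\u0915", "\u0924", "\u0930", "\u0937"

ksha = conjunct(KA, SSA)                        # the common conjunct KA + SSA
pairs = list(candidate_conjuncts([KA, TA, RA])) # 3 x 3 = 9 two-consonant candidates
```

Pasting each generated string into a text field and comparing the rendering with a reference font is one way to spot conjuncts that fail to form.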

Consonant combining with Matra
  • No matra is written for अ, which is implicit in the consonant.
  • Matras align with the stem for consonants that have one.
  • Matras are centered (w.r.t. the base consonant) for consonants with no stem.
  • is the only consonant with exceptions to the above rules.
Sample combinations for the letter ma are illustrated.
Hindi Combining

Re-Ordering
Note the re-ordering of the choti-I matra.
Hindi Choti-I Matra
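
The re-ordering can be stated concretely: the choti-I matra (U+093F) is stored after its consonant in logical order but is drawn to the left of the cluster. The sketch below models only this one rule for a single cluster; the function name is hypothetical, and a real shaping engine works on glyphs rather than code points.

```python
MATRA_I = "\u093F"  # DEVANAGARI VOWEL SIGN I (choti-I matra)

def visual_order(cluster):
    """Return one cluster's code points in left-to-right drawing order."""
    if MATRA_I in cluster:
        # Move the matra from its stored position to the front of the cluster.
        return MATRA_I + cluster.replace(MATRA_I, "", 1)
    return cluster

ki = "\u0915\u093F"               # KA + I-matra, stored consonant first
ndi = "\u0928\u094D\u0926\u093F"  # NA + halant + DA + I-matra
kii = "\u0915\u0940"              # KA + II-matra, which is not re-ordered
```

When testing, the caret and selection should still follow the logical (stored) order, even though the matra appears before the consonant on screen.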

Shaping
Samples of context sensitive shaping when consonants are joined together.
Hindi Context Sensitive Shaping

Half Forms
Half-forms are valid for all consonants except for , and
Half forms of some letters such as & are rendered one below the other.


Ra Case (The first cluster is a case of ra)


Reph Case























Indic Font Issues

Indic scripts currently lack key presentation standards, such as a standardised font encoding or a standardised glyph set, of the kind that exist for other complex text scripts such as Arabic (Arabic Presentation Forms A & B) or Thai (NECTEC, TIS620). This causes considerable difficulty for developers. The emergence of OpenType alleviates the problem. However, Indic OpenType fonts are not yet in widespread use (even less so in non-desktop processing applications), and hence a two-part solution is required: in the first phase, a free font encoding that supports all Indic scripts will be supported, and in the next phase, OpenType. A single "dumb" font encoding, "sun.unicode.india-0", was chosen since it is the only encoding that supports 8 Indian languages. This reduces the need to support multiple non-standard encodings, which is undesirable for global-ready applications such as Mozilla that are standards based (Unicode for data processing and standardised encodings for presentation).

The following are some known Indic font encodings:

Linux & Solaris

Windows