Many CJK characters have variant forms. They are a sort of grey area somewhere between being totally irrelevant and semantically distinct; for this reason, the Unicode consortium decided to introduce Ideographic Variation Sequences (IVS), consisting of a Unicode base character and one of 240 variant selectors (U+E0100-U+E01EF), instead of further extending the already huge code range for CJK characters.
An IVS is registered and unique; for further details please refer to Unicode Technical Report #37, the Ideographic Variation Database. To date (October 2007), the character with the most variants is U+908A, having 8 such IVS.
Adobe and MS decided to support IVS with a new cmap subtable (format 14). It is an odd subtable because it is not a mapping of input code points to glyphs, but contains lists of all variants supported by the font.
A variant may be either ‘default’ or ‘non-default’. A default variant is the one you will get for that code point if you look it up in the standard Unicode cmap. A non-default variant is a different glyph.
|