[Devel] Adobe's CMap types [was: more on Shingo]

From: Werner LEMBERG
Subject: [Devel] Adobe's CMap types [was: more on Shingo]
Date: Mon, 08 Mar 2004 20:24:33 +0100 (CET)

> >   You have to select an input encoding first before using an SFD
> >   file.  For CID-keyed fonts you must select a CMap instead.
> I guess I'm confused.  As far as I'm concerned there is ALWAYS an
> encoding inside fontforge.  ROSes are just as valid an encoding as
> any other.  I've never understood how Adobe could claim they weren't
> an encoding.
> A ROS provides a mapping from a CID to a character.  To me, that's
> an encoding

You are hitting a quite complicated topic.  Adobe is right, a
Registry-Ordering-Supplement does *not* define a character encoding.
It defines an ordered glyph collection.  The entities which are
numbered in an ROS are *glyphs*, not characters.  Normally, an ROS is
a superset of many (input) character encoding repertoires plus various
representation forms.  For example, Adobe-Japan1-4 contains at
position 13304 a rotated dingbat representation form for U+300E, LEFT
WHITE CORNER BRACKET.  While there is a similar Unicode input code
([...] BRACKET, for character set round-trip conversion), other vertical or
rotated variants like CID 13306 (for U+301A, LEFT WHITE SQUARE
BRACKET) don't have a Unicode character (and never will have).
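
To make the glyph/character distinction concrete, here is a minimal
sketch in Python.  The table layout and the function name are
hypothetical, chosen only for illustration.  CIDs 13304 and 13306 are
the ones mentioned above; CID 688 is derived from the `UniJIS-UCS2-H'
range U+3008-U+3011 -> CIDs 682-691.

```python
from collections import defaultdict

# (CID, character the glyph represents, representation form).
# This tiny table is illustrative; it is not an excerpt from any
# Adobe resource file.
GLYPHS = [
    (688, 0x300E, "horizontal"),
    (13304, 0x300E, "rotated"),
    (13306, 0x301A, "rotated"),
]


def glyphs_by_character(glyphs):
    """Group the CIDs of a glyph collection by the character they represent."""
    table = defaultdict(list)
    for cid, char, form in glyphs:
        table[char].append((cid, form))
    return dict(table)


by_char = glyphs_by_character(GLYPHS)

# U+300E is represented by more than one glyph, so a CID identifies
# a glyph, not a character.
print(by_char[0x300E])
```

Since one character can own several CIDs, going from a character to a
CID always needs extra information (horizontal or rotated form, for
example), which a plain character encoding does not carry.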

Adobe, as far as I can see, provides four kinds of CMaps which
unfortunately are hard to distinguish due to missing comments:

  . input conversion CMaps, which convert one input character set to
    another

  . output conversion CMaps, which map CIDs to other font indices

  . input CMaps, for mapping an input character set onto an ROS

  . output CMaps, for mapping CIDs back to input characters (e.g., for
    /ToUnicode maps).  This mapping loses information.

Some examples, concentrating on U+300E, LEFT WHITE CORNER BRACKET:

  . The CMap `90ms-RKSJ-UCS2' is an input conversion CMap; for
    example, the range 0x8171-0x817A in the SJIS character set (as
    defined by Microsoft 1990 -- this is `90ms'; I don't know what
    `RKSJ' means) is mapped to the Unicode range U+3008-U+3011.

  . The CMap `Adobe-Japan1-PS-H' is an output conversion CMap: CIDs
    from the older `Adobe-Japan1-2' ROS are mapped to PS Type 0
    subfont indices for a PS printer.

  . The CMap `UniJIS-UCS2-H' is an input CMap.  It maps
    Unicode character codes covering the character repertoire of the
    JIS input character sets (various versions of JIS 0208 and more)
    to CIDs for horizontal representation forms of the Adobe-Japan1-4
    ROS.  The range U+3008-U+3011 is mapped to CIDs 682-691.

  . The CMap `UniJIS-UCS2-V' is also an input CMap.  It
    first loads `UniJIS-UCS2-H' before overwriting some entries.  The
    range U+3008-U+3011 is mapped to CIDs 7907-7916.

  . The CMap `Adobe-Japan1-UCS2' is an output CMap: CID range
    13302-13305 (0x33F6-0x33F9) in `Adobe-Japan1-4' is mapped to
    U+300C-U+300F; after the mapping operation, you no longer know
    that the original glyph was a rotated representation form.
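
The range mappings in these examples can be followed with a few lines
of Python.  This is only a sketch: the function names are made up, and
each function covers just the single range quoted above, whereas a real
CMap file contains many such ranges.

```python
def map_range(code, src_lo, src_hi, dst_lo):
    """Map `code' from the range [src_lo, src_hi] onto a range starting
    at dst_lo; return None if `code' is outside the source range."""
    if src_lo <= code <= src_hi:
        return dst_lo + (code - src_lo)
    return None


def sjis_to_ucs2(code):
    # `90ms-RKSJ-UCS2': SJIS 0x8171-0x817A -> U+3008-U+3011
    return map_range(code, 0x8171, 0x817A, 0x3008)


def ucs2_to_cid_h(code):
    # `UniJIS-UCS2-H': U+3008-U+3011 -> CIDs 682-691
    return map_range(code, 0x3008, 0x3011, 682)


def ucs2_to_cid_v(code):
    # `UniJIS-UCS2-V': U+3008-U+3011 -> CIDs 7907-7916
    return map_range(code, 0x3008, 0x3011, 7907)


def cid_to_ucs2(cid):
    # `Adobe-Japan1-UCS2': CIDs 13302-13305 -> range starting at U+300C
    return map_range(cid, 13302, 13305, 0x300C)


ucs = sjis_to_ucs2(0x8177)
print(hex(ucs))                                # 0x300e
print(ucs2_to_cid_h(ucs), ucs2_to_cid_v(ucs))  # 688 7913
print(hex(cid_to_ucs2(13304)))                 # 0x300e
```

Note that CIDs 688, 7913, and 13304 are three different glyphs, yet
the last call yields the same U+300E that the first two started from;
this is the information loss of an output CMap.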

Valuable information about CIDs and CMaps can be found in Ken Lunde's
`CJKV Information Processing'.

