From: Hin-Tak Leung
Subject: Re: [ft-devel] non-ascii in String INDEX of CFF opentype table
Date: Sun, 31 Jan 2016 06:23:14 +0000 (UTC)

The CFF spec was first drafted in 1996, before Unicode was in widespread use, so
it most probably just means ASCII. Although one of my first serious uses of
computers was on a machine that used EBCDIC, which was decommissioned only
slightly earlier, in 1992 I think; so even ASCII may not be assumed. Maybe it
means some generic English encoding, the printable part.

The non-ASCII I found was mostly just the copyright symbol, in either Latin-1 or
UTF-8, plus one of the TeX Live font authors put his Scandinavian name in the
Notice in UTF-8. I imagine that, given the choice between his own name
escaped/hex-encoded and unreadable, or a readable ASCII transliteration, he
might choose the latter...

--------------------------------------------
On Sun, 31/1/16, Adam Twardoch (List) <address@hidden> wrote:
 
 This is a quite terrible aspect of CFF. The CFF spec does not define at all
 what a "string" means in the context of CFF. But since CFF is derived from
 PostScript Type 1, many think that PostScript strings should be used.
 
 PostScript strings are also a bit confusing because they require that
 parentheses are escaped, so something like "(c)" should actually be stored
 as "\(c\)". In fact, if you put a single unmatched, unescaped closing
 parenthesis into a CFF string (something like "Type)Media"), such a font will
 not print on some printers, because once it is converted to Type 1, the
 closing parenthesis is interpreted as an early string termination.
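To make that failure mode concrete, here is a minimal Python sketch of how a PostScript literal-string scanner decides where a string ends. It follows the literal-string rules of the PostScript Language Reference Manual as I understand them: a backslash escapes the next character, balanced pairs of parentheses are allowed unescaped, and an unmatched ")" closes the string. The function name scan_ps_string and the "/Notice (...) readonly def" wrapper are illustrative assumptions, not code from FreeType or the Font Validator.

```python
def scan_ps_string(src: str, start: int) -> tuple[str, int]:
    """Scan a PostScript literal string whose opening '(' is at src[start].

    Returns (string_body, index just past the closing ')').
    Simplified: only \\(, \\) and \\\\ are decoded; other backslash
    escapes (such as octal \\ddd) are passed through undecoded.
    """
    if src[start] != "(":
        raise ValueError("expected '(' at start")
    depth, out, i = 1, [], start + 1
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src):
            nxt = src[i + 1]
            out.append(nxt if nxt in "()\\" else ch + nxt)
            i += 2
            continue
        if ch == "(":
            depth += 1            # balanced inner pairs are legal unescaped
        elif ch == ")":
            depth -= 1
            if depth == 0:        # this ')' closes the string
                return "".join(out), i + 1
        out.append(ch)
        i += 1
    raise ValueError("unterminated string")


# A converter naively wrapping the CFF Notice string "Type)Media":
line = "/Notice (" + "Type)Media" + ") readonly def"
body, end = scan_ps_string(line, line.index("("))
print(repr(body))        # 'Type'
print(repr(line[end:]))  # 'Media) readonly def'  <- left over as PostScript tokens
```

Note that, under these rules, a balanced "(c)" would actually survive unescaped; it is the unmatched ")" that breaks things, which is presumably why escaping every parenthesis is the safe convention.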
 
 
 I've even managed to fool some tools by putting Type 1 operators after a ")"
 into a CFF string. While meaningless in CFF, they did become meaningful once
 the font was embedded as Type 1 in a PDF or print stream.
 
 You could probably even terminate the font, execute arbitrary PostScript and
 start another font this way, all using PostScript code hidden after the
 unmatched ")" in a copyright string or so. Fun hacking grounds. :)
 
 In PostScript strings, non-ASCII characters had to be octal-escaped (\ddd),
 with ISO Latin-1 being the implied encoding for those. UTF-8 is not mentioned
 anywhere.
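A minimal Python sketch of that convention (ASCII-only output, parentheses and backslash escaped, every other byte outside printable ASCII written as a three-digit octal escape of its ISO Latin-1 value) could look like this. The function name ps_escape_latin1 is hypothetical, and it deliberately returns the string body without the enclosing parentheses, on the assumption that a Type 1 converter adds those.

```python
def ps_escape_latin1(text: str) -> str:
    """Escape text for a CFF string under the 'ASCII + octal Latin-1' convention.

    The enclosing parentheses are not added. Raises UnicodeEncodeError for
    characters Latin-1 cannot represent, since this convention has no way
    to express them.
    """
    out = []
    for byte in text.encode("latin-1"):
        ch = chr(byte)
        if ch in "()\\":
            out.append("\\" + ch)        # escape string delimiters and the escape char
        elif 0x20 <= byte <= 0x7E:
            out.append(ch)               # printable ASCII passes through
        else:
            out.append("\\%03o" % byte)  # e.g. U+00A9 COPYRIGHT SIGN -> \251
    return "".join(out)


print(ps_escape_latin1("\u00a9 2016 Type)Media"))  # \251 2016 Type\)Media
```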
 
 This leads to various tools and font vendors using various conventions. I've
 asked about this a few times on the OpenType list, but there never was much
 feedback or consensus.
 
 One piece of feedback from someone at Adobe was that the unescaped copyright
 string encoded as ISO Latin-1 in Adobe’s fonts was a mistake.
 
 I think the shared belief at Adobe was that CFF strings should be ASCII-only,
 with escaped parentheses and octal-escaped higher ISO Latin-1 characters. No
 Unicode.
 
 Though I imagine that other kinds of escaping might work, too. It could be
 UTF-8 with each byte octal-escaped. Or it could (?) just be declared that
 direct UTF-8 with parenthesis escaping is fine.
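The two UTF-8 alternatives floated here could be sketched in Python as follows. Neither is defined by the CFF spec; both are hypothetical conventions, and the function names are made up for illustration. The first keeps the stored string pure ASCII by octal-escaping every UTF-8 byte outside printable ASCII; the second stores raw UTF-8 bytes and escapes only parentheses and backslash.

```python
def escape_utf8_octal(text: str) -> bytes:
    """ASCII-only result: every UTF-8 byte outside printable ASCII becomes \\ddd."""
    out = bytearray()
    for byte in text.encode("utf-8"):
        if byte in b"()\\":
            out += b"\\" + bytes([byte])  # escape delimiters and backslash
        elif 0x20 <= byte <= 0x7E:
            out.append(byte)              # printable ASCII passes through
        else:
            out += b"\\%03o" % byte       # e.g. 0xC3 -> \303
    return bytes(out)


def escape_utf8_direct(text: str) -> bytes:
    """Raw UTF-8 result: only parentheses and backslash are escaped."""
    out = bytearray()
    for byte in text.encode("utf-8"):
        if byte in b"()\\":
            out += b"\\"
        out.append(byte)
    return bytes(out)


print(escape_utf8_octal("\u00d8"))     # b'\\303\\230'  (U+00D8 is C3 98 in UTF-8)
print(escape_utf8_direct("(\u00d8)"))  # b'\\(\xc3\x98\\)'
```

The octal form presumably survives any tool that assumes 7-bit data, at the cost of readability; the direct form stays readable in UTF-8-aware tools but relies on every consumer passing the high bytes through untouched.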
 
 PostScript string escaping conventions:
 http://www.tailrecursive.org/postscript/escapes.html
 
 A.
 
 Sent from my mobile phone.
 
 On 31.01.2016, at 03:45, Hin-Tak Leung <address@hidden> wrote:
 >
 > (Removed a few of the SVG-related Cc's from the previous message.)
 > 
 > Here is another comment for discussion, and possible inclusion in an
 > addendum to the OpenType spec about CFF, after testing the new CFF
 > processing machinery inside the Microsoft Font Validator against about
 > 1600 CFF OpenType fonts on my hard disk: all of Fedora Linux + Mac OS X
 > 10.9 + Windows 7, plus a miscellaneous stash.
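For anyone wanting to repeat this kind of survey, here is a rough Python sketch of reading the String INDEX out of raw CFF table bytes. It follows the INDEX layout in Adobe's CFF specification (Technical Note #5176): a Card16 count, a one-byte offSize, count+1 big-endian offsets that are 1-based relative to the byte before the object data, then the data; an empty INDEX is just the 2-byte count. The String INDEX is the third INDEX after the header (following the Name INDEX and Top DICT INDEX). The function names are illustrative, not the Font Validator's code.

```python
def parse_index(data: bytes, pos: int = 0) -> tuple[list[bytes], int]:
    """Parse a CFF INDEX at data[pos]; return (entries, offset just past it)."""
    count = int.from_bytes(data[pos:pos + 2], "big")
    if count == 0:
        return [], pos + 2  # an empty INDEX is only the count field
    off_size = data[pos + 2]
    off_start = pos + 3
    offsets = [
        int.from_bytes(data[off_start + i * off_size:off_start + (i + 1) * off_size], "big")
        for i in range(count + 1)
    ]
    # Offsets are 1-based, relative to the byte *before* the object data.
    base = off_start + (count + 1) * off_size - 1
    entries = [data[base + offsets[i]:base + offsets[i + 1]] for i in range(count)]
    return entries, base + offsets[count]


def string_index(cff: bytes) -> list[bytes]:
    """Return the String INDEX entries of a raw CFF table."""
    hdr_size = cff[2]                   # Header: major, minor, hdrSize, offSize
    _, pos = parse_index(cff, hdr_size)  # Name INDEX
    _, pos = parse_index(cff, pos)       # Top DICT INDEX
    strings, _ = parse_index(cff, pos)   # String INDEX
    return strings


def non_ascii_entries(entries: list[bytes]) -> list[int]:
    """Indices of entries containing bytes outside printable ASCII (0x20-0x7E)."""
    return [i for i, s in enumerate(entries) if any(b < 0x20 or b > 0x7E for b in s)]
```

On a real font, the raw "CFF " table bytes would first have to be extracted from the OpenType file with some sfnt reader; that step is left out here.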
 > 
 > I found five groups of fonts using non-ASCII in the String INDEX. Since
 > the CFF spec predates widespread use of Unicode, assuming that non-ASCII is
 > to be interpreted as UTF-8 seems presumptuous; also, seeing as this is part
 > of the PostScript technology, non-ASCII strings should be encoded the
 > PostScript way, i.e. <hex>.
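The <hex> form is the PostScript hexadecimal string syntax: each byte becomes two hex digits between < and >; when reading, whitespace is ignored and an odd final digit is treated as if followed by 0. A small round-trip sketch in Python (function names hypothetical; which bytes to hex-encode, Latin-1 or UTF-8, is exactly the open question in this thread):

```python
import re


def to_ps_hexstring(data: bytes) -> str:
    """Encode raw bytes as a PostScript hex string, e.g. b'\\xa9' -> '<A9>'."""
    return "<" + data.hex().upper() + ">"


def from_ps_hexstring(s: str) -> bytes:
    """Decode a PostScript hex string, ignoring whitespace inside the brackets."""
    body = s.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("not a hex string")
    digits = re.sub(r"\s+", "", body[1:-1])
    if len(digits) % 2:
        digits += "0"  # odd digit count: the last digit is padded with 0
    return bytes.fromhex(digits)


print(to_ps_hexstring("\u00a9".encode("latin-1")))  # <A9>
print(to_ps_hexstring("\u00a9".encode("utf-8")))    # <C2A9>
```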
 > 
 > Anyway, the five are Adobe's Arno* fonts, Mozilla's Fira* fonts, two
 > groups from TeX Live, and a fifth. On closer look, the non-ASCII is used
 > only in the 'copyright' and 'notice' parts of the Top DICT. Adobe uses
 > Latin-1 encoding, while the others use UTF-8. I think I'll add a warning
 > that, other than those two (which obviously should and can contain
 > anything, including Klingon), most other uses of non-ASCII in the String
 > INDEX (most of which seems to be glyph names, a lot of uniXXXX) should be
 > PostScript hex-string encoded.
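The proposed warning could be sketched in Python roughly as follows. In CFF, SIDs 0 to 390 refer to the predefined standard strings, so String INDEX entry i has SID 391 + i; Notice and Copyright are Top DICT operators 1 and 12 0. The sketch assumes the Top DICT has already been decoded into a mapping from operator to operand list (a hypothetical shape, with the two-byte operator written as the tuple (12, 0)); the function name and warning text are illustrative only.

```python
N_STD_STRINGS = 391              # SIDs 0..390 are the CFF standard strings
NOTICE, COPYRIGHT = 1, (12, 0)   # Top DICT operators whose operand is a SID


def string_index_warnings(top_dict: dict, strings: list[bytes]) -> list[str]:
    """Warn about non-ASCII String INDEX entries not used by Notice/Copyright."""
    exempt = {top_dict[op][0] for op in (NOTICE, COPYRIGHT) if op in top_dict}
    warnings = []
    for i, s in enumerate(strings):
        sid = N_STD_STRINGS + i
        if sid in exempt:
            continue  # free-form text: may legitimately contain anything
        if any(b < 0x20 or b > 0x7E for b in s):
            warnings.append(
                f"SID {sid}: non-ASCII string {s!r}; consider a PostScript hex string"
            )
    return warnings


top = {NOTICE: [392], COPYRIGHT: [393]}
strings = [b"uni0041", b"Notice \xc3\x98", b"\xa9 2016", b"Bad\xe9Name"]
for w in string_index_warnings(top, strings):
    print(w)  # only SID 394 (b'Bad\xe9Name') is reported
```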
 > Any comments?
 > 
 > --------------------------------------------
 > On Tue, 26/1/16, Hin-Tak Leung <address@hidden> wrote:
 >
 > ... what next? Having gone from the newest and latest table, SVG, I have
 > turned to looking at the oldest unsupported one, CFF. Microsoft did not
 > implement any CFF checking at all, mostly, I guess, due to the (past?)
 > limitations of the MS renderer. Since we gained a FreeType-based backend
 > in the autumn, just before it went MIT-licensed, that limitation no
 > longer applies. I have already added CFF table processing to extract the
 > PostScript dictionaries and data structures, and there is also the
 > beginning of a new tool called "CFFInfo" (a complete lack of imagination
 > in naming), which allows a power user (currently that means just me...)
 > to manually examine the PostScript dictionaries and data structures in
 > the CFF table of an OpenType font, just like the DSIGInfo and SVGInfo
 > tools. When CFFInfo matures, I'll push it out.
 >
 > Actually checking PostScript dictionaries within the Font Validator, in
 > an automated manner, seems a rather large and daunting task. Obviously
 > Adobe would be an interesting party to approach, to see if they could
 > commission the work, so please forward if appropriate.
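Decoding the PostScript-style dictionaries is the first step of any such tool. Here is a rough Python sketch of a CFF DICT decoder following the operand encodings described in Adobe Technical Note #5176: one-byte integers (32 to 246), two-byte integers (247 to 254), the 28 and 29 forms for 16- and 32-bit integers, nibble-coded reals introduced by 30, and operator 12 as the escape for two-byte operators. It returns a mapping from operator to operands, with two-byte operators as tuples. It is an illustration under those assumptions, not CFFInfo's actual implementation.

```python
_NIBBLE_TEXT = {0xA: ".", 0xB: "E", 0xC: "E-", 0xE: "-"}


def _decode_real(data: bytes, i: int) -> tuple[float, int]:
    """Decode nibble-coded real starting at data[i]; return (value, next index)."""
    chars = []
    while True:
        byte = data[i]
        i += 1
        for nib in (byte >> 4, byte & 0x0F):
            if nib == 0xF:                 # end-of-number nibble
                return float("".join(chars) or "0"), i
            if nib <= 9:
                chars.append(str(nib))
            elif nib in _NIBBLE_TEXT:
                chars.append(_NIBBLE_TEXT[nib])
            else:
                raise ValueError("reserved nibble 0xD in real")


def decode_dict(data: bytes) -> dict:
    """Decode CFF DICT data into {operator: [operands]}."""
    result, operands, i = {}, [], 0
    while i < len(data):
        b0 = data[i]
        if b0 <= 21:                       # operator
            if b0 == 12:
                op, i = (12, data[i + 1]), i + 2
            else:
                op, i = b0, i + 1
            result[op] = operands
            operands = []
        elif b0 == 28:                     # 16-bit signed integer
            operands.append(int.from_bytes(data[i + 1:i + 3], "big", signed=True))
            i += 3
        elif b0 == 29:                     # 32-bit signed integer
            operands.append(int.from_bytes(data[i + 1:i + 5], "big", signed=True))
            i += 5
        elif b0 == 30:                     # real number
            value, i = _decode_real(data, i + 1)
            operands.append(value)
        elif 32 <= b0 <= 246:
            operands.append(b0 - 139)
            i += 1
        elif 247 <= b0 <= 250:
            operands.append((b0 - 247) * 256 + data[i + 1] + 108)
            i += 2
        elif 251 <= b0 <= 254:
            operands.append(-(b0 - 251) * 256 - data[i + 1] - 108)
            i += 2
        else:
            raise ValueError(f"reserved byte {b0} at offset {i}")
    return result


# Notice = SID 392 (operator 1), Copyright = SID 393 (operator 12 0)
print(decode_dict(bytes([248, 28, 1, 248, 29, 12, 0])))  # {1: [392], (12, 0): [393]}
```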
 >
 > _______________________________________________
 > Freetype-devel mailing list
 > address@hidden
 > https://lists.nongnu.org/mailman/listinfo/freetype-devel

