Re: [ft-devel] non-ascii in String INDEX of CFF opentype table

From: Adam Twardoch (List)
Subject: Re: [ft-devel] non-ascii in String INDEX of CFF opentype table
Date: Sun, 31 Jan 2016 04:24:01 +0100

This is a quite terrible aspect of CFF. The CFF spec does not define at all what a "string" means in the context of CFF. But since CFF is derived from PostScript Type 1, many think that PostScript strings should be used. 

PostScript strings are also a bit confusing because they require that parentheses are escaped, so something like "(c)" should actually be stored as "\(c\)". In fact, if you put a single unmatched, unescaped closing parenthesis into a CFF string (something like "Type)Media"), such a font will not print on some printers because once converted to Type 1, the closing parenthesis is interpreted as an early string termination. 
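To make the parenthesis problem concrete, here is a minimal Python sketch. The function names are hypothetical and it works on raw bytes as they would sit in the String INDEX; it is not taken from any existing tool. It flags strings whose unescaped parentheses do not balance and shows the conservative fix of backslash-escaping every parenthesis and backslash.

```python
def has_unbalanced_parens(raw: bytes) -> bool:
    """Return True if unescaped '(' and ')' in raw do not balance."""
    depth = 0
    i = 0
    while i < len(raw):
        b = raw[i]
        if b == 0x5C:          # backslash: skip the byte it escapes
            i += 2
            continue
        if b == 0x28:          # '('
            depth += 1
        elif b == 0x29:        # ')'
            depth -= 1
            if depth < 0:      # a ')' with no matching '(' ends the string early
                return True
        i += 1
    return depth != 0


def escape_ps_delimiters(raw: bytes) -> bytes:
    """Backslash-escape every '(', ')' and '\\' (always safe, if verbose)."""
    out = bytearray()
    for b in raw:
        if b in b"\\()":
            out.append(0x5C)
        out.append(b)
    return bytes(out)
```

With this, "Type)Media" is reported as unbalanced, and escaping it yields "Type\)Media", which no longer terminates the string early.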

I've even managed to fool some tools by putting Type 1 operators after a ")" into a CFF string. While meaningless in CFF, they did become meaningful once they were embedded as Type 1 in a PDF or print stream. 

You could probably even terminate the font, execute arbitrary PostScript and start another font this way, all using PostScript code hidden after the unmatched ")" in a copyright string or so. Fun hacking grounds. :) 

In PostScript strings, non-ASCII characters had to be octal-escaped (\ddd), with ISO Latin-1 being the implied encoding for those. UTF-8 is not mentioned anywhere. 
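As an illustration of the \ddd convention, a small Python sketch follows. The function name is made up for this example, and treating control bytes as needing escapes too is my own assumption. It encodes the text as ISO Latin-1 and writes every byte outside printable ASCII as a three-digit octal escape.

```python
def ps_escape_latin1(text: str) -> bytes:
    """Encode text as ISO Latin-1, then produce an ASCII-only PostScript-style string."""
    out = bytearray()
    for b in text.encode("latin-1"):   # raises UnicodeEncodeError outside Latin-1
        if b in b"\\()":
            out += b"\\" + bytes([b])  # escape the string delimiters and backslash
        elif b < 0x20 or b > 0x7E:
            out += b"\\%03o" % b       # non-printable or non-ASCII: \ddd octal
        else:
            out.append(b)
    return bytes(out)
```

For example, the copyright sign is byte 0xA9 in Latin-1, so "© 2016" becomes "\251 2016".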

This leads to various tools and font vendors using various conventions. I've asked about this a few times on the OpenType list but there never was much feedback or consensus. 

One piece of feedback from someone at Adobe was that the unescaped copyright string encoded as ISO Latin-1 in Adobe's fonts was a mistake. 

I think the shared belief at Adobe was that the CFF strings should be ASCII-only, with escaped parentheses and octal-escaped higher ISO Latin-1. No Unicode. 

Though I imagine that other types of escaping might work, too. It could be UTF-8 with each byte octal-escaped. Or it could (?) just be declared that direct UTF-8 with escaped parentheses is fine. 
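The "UTF-8 with each byte octal-escaped" variant could look roughly like the Python sketch below. This is only an illustration of the idea, not an agreed convention, and both function names are hypothetical. It also includes the reverse step, to show that a reader who knows the convention can recover the original text.

```python
import re


def ps_escape_utf8(text: str) -> bytes:
    """Encode text as UTF-8 and octal-escape every byte outside printable ASCII."""
    out = bytearray()
    for b in text.encode("utf-8"):
        if b in b"\\()":
            out += b"\\" + bytes([b])
        elif b < 0x20 or b > 0x7E:
            out += b"\\%03o" % b
        else:
            out.append(b)
    return bytes(out)


# Matches \ddd (octal, at most \377) or an escaped '\', '(' or ')'.
_ESCAPE = re.compile(rb"\\([0-3][0-7]{2}|[\\()])")


def ps_unescape_utf8(escaped: bytes) -> str:
    """Undo ps_escape_utf8: resolve the escapes, then decode the bytes as UTF-8."""
    def repl(match):
        token = match.group(1)
        return bytes([int(token, 8)]) if len(token) == 3 else token
    return _ESCAPE.sub(repl, escaped).decode("utf-8")
```

Here the copyright sign becomes "\302\251" (its two UTF-8 bytes) rather than "\251", so the same escaped string reads differently depending on which encoding the consumer assumes.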

PostScript string escaping conventions:


Sent from my mobile phone.

On 31.01.2016, at 03:45, Hin-Tak Leung <address@hidden> wrote:

(removed a few of the SVG related Cc's from previous)

Here is another comment for discussion and possible inclusion in an addendum to the OpenType spec about CFF, after testing the new CFF processing machinery inside the Microsoft Font Validator against about 1600 CFF OpenType fonts on my hard disk - all of Fedora Linux + Mac OS 10.9 + Win 7 plus misc stash.

I found 5 groups of fonts using non-ASCII in the String INDEX. Since the CFF spec predates Unicode, the assumption that non-ASCII should be interpreted as UTF-8 seems presumptuous; also, seeing as this is part of the PostScript technology, non-ASCII strings should be encoded the PostScript way, i.e. <hex>.

Anyway, the 5 are Adobe's Arno* fonts, Mozilla's Fira* fonts, two groups from TeX Live, and a 5th. On closer look, the non-ASCII is used only for the 'copyright' and 'notice' parts of the Top DICT. Adobe uses Latin-1 encoding, while the others use UTF-8. I think I'll add a warning that other than those two (which obviously should and can contain anything including Klingon), most other uses of non-ASCII in the String INDEX (most of the String INDEX seems to be glyph names, a lot of unixxxxx) should be PostScript hexstring encoded.
Any comments?
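The Latin-1 versus UTF-8 distinction described above can be guessed from the raw bytes of a string. The Python sketch below is a heuristic with a hypothetical function name, not the Font Validator's actual check: bytes that decode cleanly as UTF-8 are reported as UTF-8, and anything else is assumed to be some 8-bit encoding such as Latin-1.

```python
def classify_cff_string(raw: bytes) -> str:
    """Guess how the raw bytes of one String INDEX entry are encoded."""
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8; Latin-1 is the likeliest candidate but this is a guess.
        return "8-bit (possibly latin-1)"
    return "utf-8"
```

A copyright notice stored as the single byte 0xA9 is classified as 8-bit, while the two-byte sequence 0xC2 0xA9 is classified as UTF-8. Short Latin-1 strings can occasionally happen to be valid UTF-8, so the result is only a hint.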

On Tue, 26/1/16, Hin-Tak Leung <address@hidden> wrote:

... what next - from the newest and latest SVG table, I have
turned to having a look at the oldest unsupported one - CFF.
Microsoft did not implement any CFF checking at all, mostly
I guess due to their (past?) limitation of the MS renderer's
capability. Since we gained a Freetype-based backend in
autumn, just before it went MIT, that limitation is no
longer the case. I have already added CFF table processing
to extract the Postscript dictionaries and data structures,
and there is also the beginning of a new tool called
"CFFInfo" - a complete lack of imagination in naming - which
allows a power user (currently that means just me...) to
manually examine the PostScript dictionaries and data
structures in the CFF table of an OpenType font, just like
the DSIGInfo and SVGInfo tools. When CFFInfo matures, I'll
push it out. Actually checking Postscript dictionaries
within the font validator, in an automated manner, seems a
rather large and daunting task. Obviously Adobe would be an
interesting party to approach to see if they can commission
the work, so please forward if appropriate. 
