Re: [ft-devel] non-ascii in String INDEX of CFF opentype table
From: Hin-Tak Leung
Subject: Re: [ft-devel] non-ascii in String INDEX of CFF opentype table
Date: Sun, 31 Jan 2016 06:23:14 +0000 (UTC)
The CFF spec was first drafted in 1996 and predates widespread use of Unicode, so
it most probably just means ASCII. Then again, one of my first serious uses of
computers was on a machine that used EBCDIC, which was decommissioned only
slightly earlier, in 1992 I think; so even ASCII may not be assumed, maybe just
some generic English encoding, the printable part.
The non-ASCII I found was mostly just the copyright symbol, in either Latin-1 or
UTF-8; in addition, one of the TeX Live font authors put his Scandinavian name in
the Notice in UTF-8. I imagine that, given the choice between his own name
escaped/hex-encoded and unreadable, or an ASCII-readable transliteration, he
might choose the latter...
--------------------------------------------
On Sun, 31/1/16, Adam Twardoch (List) <address@hidden> wrote:
This is a rather terrible aspect of CFF. The CFF spec does not define at all
what a "string" means in the context of CFF. But since CFF is derived from
PostScript Type 1, many think that PostScript strings should be used.
PostScript strings are also a bit confusing because they require that
parentheses be escaped, so something like "(c)" should actually be stored as
"\(c\)". In fact, if you put a single unmatched, unescaped closing parenthesis
into a CFF string (something like "Type)Media"), such a font will not print on
some printers, because once converted to Type 1, the closing parenthesis is
interpreted as an early string termination.
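As a rough illustration of why "Type)Media" is dangerous, here is a minimal Python sketch (the helper name `premature_close` is hypothetical, not from any existing tool) that reports where a raw CFF string would terminate a PostScript `(...)` literal if copied verbatim into a Type 1 font. The PostScript Language Reference itself accepts balanced parentheses inside a literal string, so the sketch flags only unmatched closing parentheses; escaping every parenthesis, as suggested above, remains the conservative choice.

```python
def premature_close(s: str) -> int:
    """Return the index of the first ')' that would end a PostScript
    literal string early, or -1 if none would.

    A backslash escapes the next character, and balanced (...) pairs are
    tracked by depth, so only an unmatched ')' is reported.
    """
    depth = 0
    i = 0
    while i < len(s):
        c = s[i]
        if c == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            if depth == 0:
                return i  # this ')' would close the enclosing string literal
            depth -= 1
        i += 1
    return -1


print(premature_close("Type)Media"))  # 4
print(premature_close(r"\(c\)"))      # -1
```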
I've even managed to fool some tools by putting Type 1 operators after a ")"
in a CFF string. While meaningless in CFF, they did become meaningful once they
were embedded as Type 1 in a PDF or print stream.
You could probably even terminate the font, execute arbitrary PostScript and
start another font this way, all using PostScript code hidden after the
unmatched ")" in a copyright string or so. Fun hacking grounds. :)
In PostScript strings, non-ASCII characters had to be octal-escaped (\ddd),
with ISO Latin-1 being the implied encoding for those. UTF-8 is not mentioned
anywhere.
This leads to various tools and font vendors using various conventions. I've
asked about this a few times on the OpenType list, but there was never much
feedback or consensus.
One piece of feedback from someone at Adobe was that the unescaped copyright
string encoded as ISO Latin-1 in Adobe's fonts was a mistake.
I think the shared belief at Adobe was that CFF strings should be ASCII-only,
with escaped parentheses and octal-escaped higher ISO Latin-1 characters. No
Unicode.
Though I imagine that other kinds of escaping might work, too. It could be
UTF-8 with each byte octal-escaped. Or it could (?) just be declared that
direct UTF-8 with escaped parentheses is fine.
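To show why the convention has to be agreed on, here is a hypothetical decoder `ps_unescape` (a sketch, not taken from any existing tool) that turns an escaped PostScript string body back into raw bytes. It handles the single-letter escapes n, r, t, b, f, escaped backslash and parentheses, `\ddd` octal escapes of one to three digits, and backslash-newline line continuations; the body is assumed to contain only Latin-1 characters. The decoded bytes alone do not say whether they were meant as UTF-8 or Latin-1:

```python
def ps_unescape(body: str) -> bytes:
    """Decode the body of a PostScript literal string (without the outer
    parentheses) into raw bytes."""
    simple = {"n": b"\n", "r": b"\r", "t": b"\t", "b": b"\b", "f": b"\f",
              "\\": b"\\", "(": b"(", ")": b")"}
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        if c != "\\":
            out += c.encode("latin-1")  # assumes the body is Latin-1 text
            i += 1
            continue
        i += 1
        if i >= len(body):
            break  # lone trailing backslash: ignored
        c = body[i]
        if c in simple:
            out += simple[c]
            i += 1
        elif c in "01234567":
            j = i
            while j < len(body) and j - i < 3 and body[j] in "01234567":
                j += 1
            out.append(int(body[i:j], 8) & 0xFF)  # high-order overflow ignored
            i = j
        elif c == "\n":
            i += 1  # backslash-newline: line continuation, nothing emitted
        elif c == "\r":
            i += 1
            if i < len(body) and body[i] == "\n":
                i += 1
        else:
            out += c.encode("latin-1")  # unknown escape: backslash dropped
            i += 1
    return bytes(out)


raw = ps_unescape(r"\303\270")
print(raw.decode("utf-8"))    # ø
print(raw.decode("latin-1"))  # Ã¸
```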
PostScript string escaping conventions:
http://www.tailrecursive.org/postscript/escapes.html
A.
Sent from my mobile phone.
On 31.01.2016, at 03:45, Hin-Tak Leung <address@hidden> wrote:
>
> (Removed a few of the SVG-related Cc's from the previous message.)
>
> Here is another comment for discussion and possible inclusion in an addendum
> to the OpenType spec about CFF, after testing the new CFF processing machinery
> inside the Microsoft Font Validator against about 1600 CFF OpenType fonts on
> my hard disk - all of Fedora Linux + Mac OS X 10.9 + Windows 7, plus a
> miscellaneous stash.
>
> I found 5 groups of fonts using non-ASCII in the String INDEX. Since the CFF
> spec predates widespread use of Unicode, assuming that non-ASCII bytes are to
> be interpreted as UTF-8 seems presumptuous; also, seeing as this is part of
> the PostScript technology, non-ASCII strings should be encoded the PostScript
> way, i.e. <hex>.
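For reference, the String INDEX uses the generic CFF INDEX layout from the CFF specification (Adobe Technical Note #5176): a Card16 count, then (if the count is non-zero) an OffSize byte, count+1 big-endian offsets of that size, 1-based and relative to the byte preceding the object data, then the data itself. A minimal Python reader, assuming the whole CFF table is in memory as `bytes` (`read_index` is a hypothetical name):

```python
import struct


def read_index(data: bytes, pos: int = 0):
    """Parse a CFF INDEX starting at data[pos].

    Returns (list_of_items, position_just_after_the_index).
    """
    (count,) = struct.unpack_from(">H", data, pos)  # Card16, big-endian
    pos += 2
    if count == 0:
        return [], pos  # an empty INDEX is just the 2-byte count
    off_size = data[pos]
    pos += 1
    offsets = []
    for _ in range(count + 1):
        offsets.append(int.from_bytes(data[pos:pos + off_size], "big"))
        pos += off_size
    base = pos - 1  # offsets are counted from the byte before the data
    items = [data[base + offsets[k]:base + offsets[k + 1]] for k in range(count)]
    return items, base + offsets[-1]


blob = b"\x00\x02\x01\x01\x08\x0c" + b"uni00A9Fira"
items, end = read_index(blob)
print(items)  # [b'uni00A9', b'Fira']
```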
>
> Anyway, the 5 are Adobe's Arno* fonts, Mozilla's Fira* fonts, two groups
> from TeX Live, and a 5th. On closer look, the non-ASCII is used only for the
> 'copyright' and 'notice' parts of the Top DICT. Adobe uses Latin-1 encoding,
> while the others use UTF-8. I think I'll add a warning that, other than those
> two (which obviously should and can contain anything, including Klingon), any
> other use of non-ASCII in the String INDEX (which seems to be mostly glyph
> names, a lot of them uniXXXX) should be PostScript hexstring encoded.
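A rough sketch of the kind of check proposed above: walk the custom String INDEX entries (custom strings start at SID 391, after the 391 standard strings), flag the non-ASCII ones, guess UTF-8 versus Latin-1, and show the PostScript hexstring form. The helper names are hypothetical, and the Latin-1 label is only a fallback guess, since any byte sequence decodes as Latin-1:

```python
N_STD_STRINGS = 391  # SIDs 0..390 are standard strings; custom ones follow


def guess_encoding(s: bytes) -> str:
    """Very rough guess at how one String INDEX entry is encoded."""
    if all(b < 0x80 for b in s):
        return "ascii"
    try:
        s.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1?"  # fallback only: every byte string decodes as Latin-1


def ps_hexstring(s: bytes) -> str:
    """PostScript hexadecimal string form, e.g. b'\\xa9' -> '<A9>'."""
    return "<" + s.hex().upper() + ">"


def report_non_ascii(strings: list) -> list:
    """Return (SID, encoding guess, hexstring) for every non-ASCII entry."""
    rows = []
    for k, s in enumerate(strings):
        kind = guess_encoding(s)
        if kind != "ascii":
            rows.append((N_STD_STRINGS + k, kind, ps_hexstring(s)))
    return rows


print(report_non_ascii([b"uni00A9", "\u00a9 Mozilla".encode("utf-8"), b"\xa9 Adobe"]))
```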
> Any comments?
>
>
--------------------------------------------
> On Tue, 26/1/16, Hin-Tak Leung <address@hidden> wrote:
>
> ... what next? From the newest and latest SVG table, I have turned to having a
> look at the oldest unsupported one: CFF. Microsoft did not implement any CFF
> checking at all, mostly, I guess, due to the (past?) limitations of the MS
> renderer. Since we gained a FreeType-based backend in autumn, just before it
> went MIT, that limitation no longer applies. I have already added CFF table
> processing to extract the PostScript dictionaries and data structures, and
> there is also the beginning of a new tool called "CFFInfo" (a complete lack
> of imagination in naming), which allows a power user (currently that means
> just me...) to manually examine the PostScript dictionaries and data
> structures in the CFF table of an OpenType font, just like the DSIGInfo and
> SVGInfo tools. When CFFInfo matures, I'll push it out. Actually checking
> PostScript dictionaries within the font validator in an automated manner
> seems a rather large and daunting task. Obviously Adobe would be an
> interesting party to approach to see if they can commission the work, so
> please forward if appropriate.
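As a possible starting point for automated checks, resolving the Notice (operator 1) and Copyright (operator 12 0) entries of the Top DICT to their String INDEX entries might look like the following sketch. It follows the DICT operand encoding from the CFF specification; the function names are hypothetical, error handling is minimal, and `strings` is assumed to be the already-parsed String INDEX:

```python
# Nibble meanings for real-number operands (0xD is reserved, 0xF ends the number).
REAL_NIBBLES = list("0123456789") + [".", "E", "E-", "", "-"]

NOTICE, COPYRIGHT = 1, (12, 0)  # Top DICT operators
N_STD_STRINGS = 391             # SIDs below this are standard strings


def parse_dict(data: bytes) -> dict:
    """Decode a CFF DICT into {operator: operands}; two-byte (escape 12)
    operators are keyed as tuples such as (12, 0)."""
    result, operands, i = {}, [], 0
    while i < len(data):
        b0 = data[i]
        if b0 <= 21:  # operator byte
            if b0 == 12:
                op, i = (12, data[i + 1]), i + 2
            else:
                op, i = b0, i + 1
            result[op] = operands
            operands = []
        elif b0 == 28:  # 16-bit signed integer
            operands.append(int.from_bytes(data[i + 1:i + 3], "big", signed=True))
            i += 3
        elif b0 == 29:  # 32-bit signed integer
            operands.append(int.from_bytes(data[i + 1:i + 5], "big", signed=True))
            i += 5
        elif b0 == 30:  # real number, packed nibbles
            text, i, done = "", i + 1, False
            while not done:
                byte = data[i]
                i += 1
                for nib in (byte >> 4, byte & 0x0F):
                    if nib == 0x0F:
                        done = True
                        break
                    text += REAL_NIBBLES[nib]
            operands.append(float(text))
        elif 32 <= b0 <= 246:
            operands.append(b0 - 139)
            i += 1
        elif 247 <= b0 <= 250:
            operands.append((b0 - 247) * 256 + data[i + 1] + 108)
            i += 2
        elif 251 <= b0 <= 254:
            operands.append(-(b0 - 251) * 256 - data[i + 1] - 108)
            i += 2
        else:
            raise ValueError(f"reserved byte {b0} at offset {i}")
    return result


def notice_and_copyright(top: dict, strings: list) -> dict:
    """Map 'Notice'/'Copyright' to their custom String INDEX entries."""
    found = {}
    for name, op in (("Notice", NOTICE), ("Copyright", COPYRIGHT)):
        if op in top and top[op] and top[op][0] >= N_STD_STRINGS:
            found[name] = strings[top[op][0] - N_STD_STRINGS]
    return found


top = parse_dict(bytes([248, 27, 1, 248, 28, 12, 0]))
print(top)  # {1: [391], (12, 0): [392]}
```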