
Re: [help-texinfo] Umlaut strangeness


From: Karl Berry
Subject: Re: [help-texinfo] Umlaut strangeness
Date: Thu, 2 Jun 2016 20:56:39 GMT

    +      \passthroughcharstrue

Seems to me it will only work if the original character is Latin 1,
or more precisely, PDFDocEncoding.  Otherwise results will be unexpected.

    The problem is the TeX fonts - they don't use ISO-8859-1. 

The encoding of the TeX fonts has nothing much to do with it, as far as
I can see.  Some kind of encoding conversion is needed no matter what,
because the bookmarks are unrelated to the rest of the document.

    @"U. That in turn becomes U when the PDF outline is generated.

Yes, that is the crucial thing.  The "reduction" of 8-bit chars to their
"sorted ASCII" (\indexnofonts) was the easiest reasonable compromise
without a tremendous amount of work to do a full encoding conversion.
Hence my comment years and years ago starting at line 1418,
"PDF outlines are displayed ...".

    If it's possible to use UTF16-BE for the outlines, 

It is possible.  The idea is to output the bookmark strings in PDF
Unicode string syntax: a hex string that starts with an FEFF marker (the
byte order mark), followed by the UTF-16BE code units.  For example,
this is "A" (U+0041): <FEFF0041>
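
To make the format concrete, here is a minimal sketch in Python.  It is
not texinfo.tex code, and the function name pdf_unicode_hex is made up
for illustration; it only shows what the resulting bookmark string
should look like for a given title.

```python
def pdf_unicode_hex(text: str) -> str:
    """Return text as a PDF hex string: FEFF marker, then UTF-16BE code units."""
    # Python's "utf-16-be" codec writes no byte order mark of its own,
    # so the FEFF marker is prepended explicitly.
    payload = text.encode("utf-16-be").hex().upper()
    return "<FEFF" + payload + ">"


print(pdf_unicode_hex("A"))  # <FEFF0041>
print(pdf_unicode_hex("Ü"))  # <FEFF00DC>
```

Characters in the Basic Multilingual Plane come out as their code point
written as four hex digits, which is why no translation table is needed
for them.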

See the PDF reference manual or, in terse form, the pdfmark manual.
(I've just been reading in this area, which is why I chimed in.)

    as long as the document used UTF-8 as its encoding. 

In practice I suppose it would work out since texinfo.tex would know, in
some sense, the Unicode code point.  

    that case I imagine the encoding could be converted automatically,

There would be weird cases for the Unicode characters at U+10000 and
above (that is, from 2^16 on), where the UTF-16 encoding is not just the
code point but a surrogate pair, but I suppose they're unlikely to arise
in practice.
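
For reference, the arithmetic for those cases is small.  The following
is a sketch in Python, not anything from texinfo.tex, and the name
utf16_units is hypothetical.  It does no validation: it assumes the
input is a legal code point that is not itself in the surrogate range.

```python
def utf16_units(code_point: int) -> list:
    """Return the UTF-16 code unit(s) for one Unicode code point."""
    if code_point < 0x10000:
        # Basic Multilingual Plane: the code unit is the code point itself.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


# U+1D11E MUSICAL SYMBOL G CLEF needs two code units.
print(["%04X" % unit for unit in utf16_units(0x1D11E)])  # ['D834', 'DD1E']
```

So a bookmark containing only that character would be written as
<FEFFD834DD1E>.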

    without the need for any large translation tables.

Yeah.  In theory the document encoding should not matter at all, but a
full encoding-to-encoding conversion in texinfo.tex does not sound
feasible to me.  That's why I never did it.

I seem to recall reading that PDFDocEncoding is a superset of Latin 1,
but I haven't done the diff.  If so, @documentencoding ISO-8859-1 could
use the \passthroughcharstrue hack, I think ...-karl


