Re: [Groff] typesetting Czech with custom fonts

From: Werner LEMBERG
Subject: Re: [Groff] typesetting Czech with custom fonts
Date: Thu, 29 Mar 2012 08:05:17 +0200 (CEST)

> You are correct that full UTF-16 is supported for annotations, the
> problem is that by the time the string is passed to pdfbookmark the
> characters have been changed to named glyph nodes which I believe
> can't be converted back to their UTF-16 character code
> (i.e. \[u0159]) within a macro, [...]

\X allows \[...] if `use_charnames_in_special' is set in the DESC
file.  This might help for gropdf which can then convert such entities
to proper PDF string literals.  BTW, `.device' doesn't have this
restriction, so

  .device \[foo]

gets happily emitted as

  x X \[foo]

even without `use_charnames_in_special'.

> In order to do this I think we'd need help from troff, something
> like .asciify16hex which would return the string as a BOM followed
> by the two byte unicode for each character, i.e. 00 41 01 59 (A
> rcaron)

You mean this hypothetical call

  .asciify16hex A\[u0159]

should return the string



> ... this could then be passed onto the pdf enclosed in '<>' with a
> BOM on the front instead of enclosing the text in '()'.

Why do you need a Byte Order Mark?  Note, however, that you actually
need UTF-16BE encoding for PDF literals, IIRC, so Unicode values
larger than U+FFFF must be represented as surrogate pairs.
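
To make the encoding concrete, here is a minimal Python sketch (not
groff or gropdf code) of what such a conversion would have to produce.
The function name `pdf_hex_string' and the `with_bom' switch are made
up for illustration.  As far as I read the PDF reference, a leading
FE FF is what marks a text string as UTF-16BE rather than
PDFDocEncoding, so the sketch emits it by default; treat that as an
assumption to check against the specification.

```python
def pdf_hex_string(text, with_bom=True):
    """Return text as a PDF hexadecimal string, encoded as UTF-16BE.

    Hypothetical helper for illustration only; not part of groff.
    """
    # Python's utf-16-be codec writes code points above U+FFFF
    # as a surrogate pair (two 16-bit units), and adds no BOM itself.
    data = text.encode("utf-16-be")
    if with_bom:
        # FE FF prefix (assumed to be the UTF-16BE marker for PDF text strings).
        data = b"\xfe\xff" + data
    # PDF hex strings are enclosed in '<>' instead of '()'.
    return "<" + data.hex().upper() + ">"


print(pdf_hex_string("A\u0159"))                     # A followed by r caron
print(pdf_hex_string("\U0001D11E", with_bom=False))  # a code point above U+FFFF
```

The second call shows the surrogate-pair case: a single code point
above U+FFFF comes out as two 16-bit units.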

> Even being able to reconstitute \[u0159] would be helpful for
> gropdf, since it could then build the hex string itself.

What exactly do you mean by `reconstitute'?
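
If it means that the device argument still carries the entity in its
\[uXXXX] form, then a postprocessor could map it back to characters
on its own.  A rough Python sketch of that step follows; gropdf itself
is not written in Python, and the name `expand_unicode_entities' and
the regular expression are assumptions for illustration.  It assumes
groff's uXXXX glyph naming with optional `_'-joined components, and
leaves any other \[...] name untouched.

```python
import re

# Matches \[uXXXX] and composites such as \[u0041_030A] (assumed naming scheme).
_ENTITY = re.compile(r"\\\[u([0-9A-Fa-f]{4,6}(?:_[0-9A-Fa-f]{4,6})*)\]")


def expand_unicode_entities(arg):
    """Replace \\[uXXXX] entities in a device argument by the characters they name.

    Hypothetical helper for illustration; named glyphs like \\[foo] are kept as-is.
    """
    def repl(match):
        # Each '_'-separated component is one code point in hexadecimal.
        # chr() raises ValueError for values above 0x10FFFF; not handled here.
        return "".join(chr(int(h, 16)) for h in match.group(1).split("_"))

    return _ENTITY.sub(repl, arg)


print(expand_unicode_entities(r"A\[u0159]") == "A\u0159")
```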

> I've been looking into .asciify in a bit more detail (in preparation
> for the documentation patch you asked for).  Please can you confirm
> I've got this correct: [...]

Looks fine.

> My c++ foo is not strong but I suspect the nodes marked as ignored
> (which have no specific asciify method) inherit the generic node
> method which is to return the node.


> It can be seen from the above that in several cases the asciified
> string/diversion will still hold nodes as well as ascii characters.


