Re: [Groff] groff_char(7): Combination of characters vs. single unicode character
Tue, 16 Dec 2014 04:33:55 +0100 (CET)
> Do i understand correctly that the Info manual calls u2260 invalid
> as a glyph name, but that, all the same, \[u2260] produces the
> desired output?
> And that groff contains a table to decompose u2260 into u003D_0338,
> but that, all the same, \[u003D_0338] will give you U+2260 in the
> output stream? If so, what's the point in decomposing?
> If that is correct so far: Given that groff does not produce
> normalization form D in its output stream, why did you choose to use
> it for the documentation? Wouldn't it be easier to understand if
> the normalization form used in the documentation matched the
> normalization form actually produced in the output stream?
Similar to TeX, the distinction between characters, entities, and
glyph names is unclear, unfortunately.
Here's the algorithm for converting an entity E (that is, the value in
the \[...] construct) to a groff glyph name G.
1. Compare E with the GGL (Groff Glyph List). The GGL data is
defined in `src/libs/libgroff/glyphuni.cpp' and listed in the
`Input' column of `groff_char.man'.
If E is in the GGL:  E1 = GGL(E)
Otherwise:           E1 = E
2. Decompose E1 to get Unicode normalization form D. The
decomposition data is defined in `src/libs/libgroff/uniuni.cpp'
and listed in the `Unicode' columns of `groff_char.man'.
If E1 has a decomposition:  G = decomposition(E1)
Otherwise:                  G = E1
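
As a rough illustration of steps 1 and 2, here is a minimal Python sketch. The two dictionaries are toy stand-ins that hold only the entries discussed in this thread; they are not the real tables from `glyphuni.cpp' and `uniuni.cpp', and the function name is made up for the example.

```python
# Toy stand-ins for the real tables (assumption: only the entries
# mentioned in this thread are included).
GGL = {"!=": "u2260"}                      # cf. glyphuni.cpp
DECOMPOSITION = {"u2260": "u003D_0338"}    # cf. uniuni.cpp


def entity_to_glyph_name(entity):
    """Convert an entity E to a groff glyph name G (hypothetical helper)."""
    # Step 1: use the GGL mapping if E is listed, else keep E unchanged.
    e1 = GGL.get(entity, entity)
    # Step 2: use the decomposition if E1 has one, else keep E1 unchanged.
    return DECOMPOSITION.get(e1, e1)


print(entity_to_glyph_name("!="))      # u003D_0338
print(entity_to_glyph_name("u2260"))   # u003D_0338
print(entity_to_glyph_name("u0041"))   # u0041 (in neither toy table)
```

In this sketch both \[!=] and \[u2260] end up with the same glyph name, which is why they behave identically further down the pipeline.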
And here is the algorithm by which groff converts a groff glyph name G
to an output device's glyph name D (or glyph/char index, depending on
the device), to be found in the `Output' column of `groff_char.man'.
a. Check whether G is present in the font. Use it if available.
b. Otherwise, try to map G to a `classical' groff glyph name. This
mapping is defined in `src/libs/libgroff/uniglyph.cpp'.
If G has a classical glyph name:  D = classical_glyph_name(G)
Otherwise:                        D = G
So if you enter \[!=], groff converts `!=' to `u2260' (step 1), then
to `u003D_0338' (step 2).
For the `utf8' output device, `u003D_0338' is found in
`font/devutf8/R' (step a), returning character code U+2260.
For the `ps' output device, `u003D_0338' is not found, thus it gets
converted back to `!=' (step b), which is eventually found in file
`font/devps/S', returning PostScript glyph name `notequal'.
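
Steps a and b, applied to the two devices just described, can be sketched in Python as follows. This is only an illustration under simplifying assumptions: each device is reduced to a single dictionary holding the one entry mentioned above (a real device has several font files), the `CLASSICAL' table stands in for `uniglyph.cpp', and all names are invented for the example.

```python
# Hypothetical, heavily reduced font contents: one entry per device.
FONTS = {
    "utf8": {"u003D_0338": "U+2260"},   # as found in font/devutf8/R
    "ps": {"!=": "notequal"},           # as found in font/devps/S
}
# Toy stand-in for the mapping back to classical groff glyph names.
CLASSICAL = {"u003D_0338": "!="}        # cf. uniglyph.cpp


def device_glyph(device, glyph_name):
    """Map groff glyph name G to what the device's font provides."""
    font = FONTS[device]
    # Step a: use G directly if the font has it.
    if glyph_name in font:
        return font[glyph_name]
    # Step b: otherwise fall back to the classical glyph name, if any.
    d = CLASSICAL.get(glyph_name, glyph_name)
    return font.get(d)  # None if the font has no such glyph either


print(device_glyph("utf8", "u003D_0338"))  # U+2260
print(device_glyph("ps", "u003D_0338"))    # notequal
```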
I hope this helps. Patches to improve the docs are really welcome :-)