
[Groff] groff/man and Eastern European languages?


From: Michael(tm) Smith
Subject: [Groff] groff/man and Eastern European languages?
Date: Wed, 27 Feb 2008 18:05:09 +0900
User-agent: Mutt/1.5.14r5351+poontang (2008-01-29 21:06:38+09:00)

Is there any best-practice information on how to use groff with
Eastern European languages/characters? In particular, best
practices for authoring man pages?

I'd especially like to know what guidance there might be on what
to do for code points that don't have named characters/escapes in
groff (ű, ő, ż, ă, ą, ā, ș, ć, č, etc.) -- basically, for Latin-2
I guess (or Unicode Latin Extended-A).
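
One option for such code points is groff's numeric \[uXXXX] escape form.
The following is a minimal sketch, not what the stylesheet currently does:
the function name to_groff_unicode_escapes is hypothetical, and whether the
resulting glyph actually renders depends on the groff version, output
device, and fonts on the target system.

```python
def to_groff_unicode_escapes(text: str) -> str:
    """Replace every non-ASCII character with a groff \\[uXXXX] escape.

    Hypothetical helper for illustration only; ASCII passes through as-is.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        else:
            # Emit at least four uppercase hex digits, e.g. U+0151 -> \[u0151].
            out.append("\\[u%04X]" % cp)
    return "".join(out)


# U+0151 is LATIN SMALL LETTER O WITH DOUBLE ACUTE.
print(to_groff_unicode_escapes("Erd\u0151s"))  # prints: Erd\[u0151]s
```

The output is pure ASCII, so it does not depend on the encoding the
installed man page ends up being read with.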

Do authors writing content in Eastern European languages generally
use those characters as-is in their groff source, or do they use
escape sequences to do overstrikes to compose them?

The context for this question is that I'd like to know what would
be best for the DocBook manpages stylesheet to generate for those
languages. For Latin-1/Western European languages, the stylesheet
converts any non-Roman/accented characters to their
corresponding groff named characters/escapes. It does the same
for a whole bunch of symbols as well (not just letters). So even if a
user has kept the output encoding for the stylesheet at its
default value (UTF-8), the generated man pages for most of those
languages will generally contain only ASCII characters.
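
The conversion described above can be sketched roughly as a lookup table
with pass-through. This is an illustration in Python, not the stylesheet's
actual XSLT; the table GROFF_NAMED and the function to_groff_named are
hypothetical names, and the table holds only a handful of groff's named
glyphs.

```python
# Small illustrative subset of groff named glyphs (see groff_char(7)).
GROFF_NAMED = {
    "\u00e9": "\\['e]",  # e with acute
    "\u00fc": "\\[:u]",  # u with dieresis
    "\u00f1": "\\[~n]",  # n with tilde
    "\u00e7": "\\[,c]",  # c with cedilla
    "\u00df": "\\[ss]",  # German sharp s
    "\u00a9": "\\[co]",  # copyright sign
}


def to_groff_named(text: str) -> str:
    """Map characters that have a named glyph; pass everything else through."""
    return "".join(GROFF_NAMED.get(ch, ch) for ch in text)


print(to_groff_named("caf\u00e9"))   # prints: caf\['e]
print(to_groff_named("Erd\u0151s"))  # U+0151 has no entry, so it is unchanged
```

Characters missing from the table, such as U+0151, come out unchanged,
which is the pass-through behaviour that matters for Eastern European
text.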

But for Eastern European languages, if the user has UTF-8 source
and keeps the output encoding for the stylesheet set at its
default value, any UTF-8 characters in the source that don't have
named characters in groff are passed through as-is.

What happens to that man page afterwards depends, I guess, on what
system(s) it ends up on. I know on my Debian system, the
installed man-page files all seem to be encoded in UTF-8, and the
backend for the man command converts those on the fly (I suppose
by calling iconv or something to do it). But I'm not sure if users
on other systems can depend on something like that.
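
If a system does recode UTF-8 pages to a legacy 8-bit encoding, not every
character listed earlier would survive the trip. The sketch below uses
Python's built-in codecs to check which characters ISO-8859-2 (Latin-2)
cannot represent. It does not reproduce what any particular man
implementation does; the function name unencodable is hypothetical, and
the "s with comma below" in the sample is assumed to be U+0219.

```python
def unencodable(text: str, encoding: str = "iso-8859-2") -> list:
    """Return the characters of text that the target encoding cannot hold."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# U+0171, U+0151, U+017C, U+0103, U+0105, U+0101, U+0219, U+0107, U+010D
sample = "\u0171\u0151\u017c\u0103\u0105\u0101\u0219\u0107\u010d"
print(unencodable(sample))  # U+0101 and U+0219 are not in ISO-8859-2
```

So even within this small sample, two characters fall outside Latin-2,
which is one argument for either keeping UTF-8 end to end or emitting
ASCII-only escapes.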

Anyway, any guidance/suggestions on what would be best to have the
stylesheet generating for Eastern European languages would be
appreciated.

  --Mike

-- 
Michael(tm) Smith
http://people.w3.org/mike/
http://sideshowbarker.net/
