Re: A fix for bad man pages display on UTF-8 locales (patch for groff)
From: Werner LEMBERG
Subject: Re: A fix for bad man pages display on UTF-8 locales (patch for groff)
Date: Sat, 19 Jul 2003 10:30:35 +0200 (CEST)
> I've found a bug in Groff 1.18's UTF-8 device definitions. If that's
> already known, please forgive me. I couldn't locate any info in the
> mailing list archive.
It is not a bug. It is a well-known `feature'.
> I'm using Mandrake Linux 9.1, there's an installer option "Use
> Unicode by default".
Your problem is related to Mandrake. For example, SuSE has a
workaround in recent distributions (see below).
> However, there is a problem with some UTF-8 locales with regard to
> man pages. The problem exhibits itself in hyphens (e.g. in the
> option names) being displayed incorrectly and being unsearchable
> (the "minus" character from the keyboard doesn't match them).
`Unsearchable' is the right word.
> This is due to the fact that the groff utility that's used for
> formatting pages (when called from the nroff shell script) formats
> "\-" sequence in the source input as Unicode character "0x2212", and
> "-" character as Unicode character "0x2010" instead of the
> backward-compatible minus sign (which has code "0x002D" for
> compatibility with ASCII).
This is intentional, and I won't change it. From the Unicode point of
view, my implementation is correct. The real problem is that most
software doesn't support proper Unicode searching, that is, if you
enter a `-' on the keyboard, it should also find U+2212 and U+2010
(and some other characters too).
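
To illustrate what such Unicode-aware searching could look like, here is
a minimal Python sketch. It is not taken from less or any other pager;
the names DASH_LIKE, dash_tolerant_pattern and find_all are made up for
this example, and the character set is limited to the three code points
discussed in this thread rather than every dash-like character.

```python
import re

# Assumption: only the three code points mentioned in this thread.
# A real implementation would cover more dash-like characters.
DASH_LIKE = "\u002D\u2010\u2212"  # HYPHEN-MINUS, HYPHEN, MINUS SIGN


def dash_tolerant_pattern(query):
    """Compile a literal query so that each '-' matches any DASH_LIKE character."""
    dash_class = "[" + re.escape(DASH_LIKE) + "]"
    parts = [dash_class if ch == "-" else re.escape(ch) for ch in query]
    return re.compile("".join(parts))


def find_all(query, text):
    """Return the start offsets of every dash-tolerant match of query in text."""
    return [m.start() for m in dash_tolerant_pattern(query).finditer(text)]


if __name__ == "__main__":
    page = "ls \u2212h and ls \u2010h and ls -h"
    print(page.find("-h"))       # plain search sees only the ASCII form
    print(find_all("-h", page))  # tolerant search sees all three forms
```

A plain substring search finds only the ASCII hyphen-minus, while the
tolerant pattern also matches U+2212 and U+2010 typed as `-'.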
> The hyphen sign "0x2212" isn't handled properly by either the less
> viewer, or the output terminal and as a result it's displayed with a
> leading garbage character and can't be input from the keyboard when
> searching in the manual page (so that e.g. it isn't possible to
> search for "-h" option when reading the manual for ls).
Hmm, I've called xterm with
  LANG=en_US.UTF-8 \
  xterm -fn "-misc-fixed-medium-r-normal--20-200-75-75-c-100-iso10646-1" -u8
(I'm still using xterm from XFree86 4.2.0), and inside this xterm I
did
  man groff_man
and both the minus and hyphen are displayed correctly. I have the
following environment settings:
  LESS="-MM -S -R"
  LESSBINFMT="*n%c"
  LESSCHARDEF=8bcccbcc18b.
  LESSKEY=/etc/lesskey.bin
So it seems to be a misconfiguration on your side.
> The problem is solved by modifying groff's font descriptions for the
> utf8 device so that the standard, ASCII-compatible "0x002D"
> character code is used instead of "0x2212" for the hyphen sequence
> ("\-").
As mentioned above, this is only a temporary workaround until other
software really supports Unicode.
In SuSE, the following code has been added to the troffrc
configuration file:
  .if '\*[.T]'utf8' \{\
  .  char \- \N'45'
  .  char - \N'45'
  .  char ' \N'39'
  .\}
which is currently the best solution.
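
For readers who want to see the effect of those three mappings outside
of groff, the following Python sketch applies the same substitutions to
already formatted text. It is only an illustration: ASCII_FALLBACK and
to_ascii_punctuation are hypothetical names, the troffrc code above does
its work at formatting time rather than as a post-filter, and the sketch
assumes that the utf8 device would otherwise emit U+2019 for the
apostrophe.

```python
# Hypothetical post-filter that mirrors the three troffrc mappings:
# codes 45 and 39 are the ASCII hyphen-minus and apostrophe.
ASCII_FALLBACK = str.maketrans({
    "\u2212": chr(45),  # MINUS SIGN (produced for \-) -> HYPHEN-MINUS
    "\u2010": chr(45),  # HYPHEN (produced for -)      -> HYPHEN-MINUS
    "\u2019": chr(39),  # RIGHT SINGLE QUOTATION MARK  -> APOSTROPHE
                        # (assumed utf8 output for ')
})


def to_ascii_punctuation(text):
    """Replace the Unicode dash and quote forms with their ASCII counterparts."""
    return text.translate(ASCII_FALLBACK)


if __name__ == "__main__":
    formatted = "ls \u2212h, don\u2019t, well\u2010known"
    print(to_ascii_punctuation(formatted))
```

After this substitution, a search for `-h' typed on the keyboard matches
the option name again, which is the whole point of the workaround.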
Werner