bug-groff
Re: A fix for bad man pages display on UTF-8 locales (patch for groff)


From: Werner LEMBERG
Subject: Re: A fix for bad man pages display on UTF-8 locales (patch for groff)
Date: Sat, 19 Jul 2003 10:30:35 +0200 (CEST)

> I've found a bug in Groff 1.18's UTF-8 device definitions. If that's
> already known, please forgive me.  I couldn't locate any info in the
> mailing list archive.

It is not a bug.  It is a well-known `feature'.

> I'm using Mandrake Linux 9.1, there's an installer option "Use
> Unicode by default".

Your problem is related to Mandrake.  For example, SuSE has a
workaround in recent distributions (see below).

> However, there is a problem with some UTF-8 locales with regard to
> man pages.  The problem exhibits itself in hyphens (e.g. in the
> option names) being displayed incorrectly and being unsearchable
> (the "minus" character from the keyboard doesn't match them).

`Unsearchable' is the right word.

> This is due to the fact that the groff utility that's used for
> formatting pages (when called from the nroff shell script) formats
> "\-" sequence in the source input as Unicode character "0x2212", and
> "-" character as Unicode character "0x2010" instead of the
> backward-compatible minus sign (which has code "0x002D" for
> compatibility with ASCII).

This is intentional, and I won't change it.  From the Unicode point of
view my implementation is correct.  The real problem is that most
software doesn't support proper Unicode searching, that is, if you
enter a `-' on the keyboard, it should also find U+2212 and U+2010
(and some other characters too).
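
To illustrate what such searching could look like, here is a minimal
Python sketch that folds a few dash-like characters to the ASCII
hyphen-minus before matching.  The names `DASH_LIKE`, `fold_dashes`,
and `find_folded` are made up for this example, and the character list
is an assumption, not a complete or standardized set; no pager is
claimed to work this way.

```python
# Dash-like code points folded to ASCII hyphen-minus (U+002D) for matching.
# The selection below is an illustrative assumption, not an exhaustive list.
DASH_LIKE = {
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2212": "-",  # MINUS SIGN
}
_FOLD_TABLE = str.maketrans(DASH_LIKE)


def fold_dashes(text: str) -> str:
    """Return text with each dash-like character replaced by '-'."""
    return text.translate(_FOLD_TABLE)


def find_folded(haystack: str, needle: str) -> int:
    """Like str.find, but treats all characters in DASH_LIKE as '-'.

    Every replacement is one character for one character, so the
    returned index is also valid in the original haystack.
    """
    return fold_dashes(haystack).find(fold_dashes(needle))


if __name__ == "__main__":
    line = "ls \u2212h, \u2010\u2010help"   # as a utf8 device might emit it
    print(line.find("-h"))          # plain search for the keyboard '-'
    print(find_folded(line, "-h"))  # folded search
```

With folding, a `-h' typed on the keyboard matches text that contains
U+2212 followed by `h', which a plain substring search does not find.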

> The minus sign "0x2212" isn't handled properly by either the less
> viewer, or the output terminal and as a result it's displayed with a
> leading garbage character and can't be input from the keyboard when
> searching in the manual page (so that e.g. it isn't possible to
> search for "-h" option when reading the manual for ls).

Hmm, I've called xterm with

  LANG=en_US.UTF-8 \
  xterm -fn "-misc-fixed-medium-r-normal--20-200-75-75-c-100-iso10646-1" -u8

(I'm still using xterm from XFree86 4.2.0), and inside this xterm I
did

  man groff_man

and both the minus and hyphen are displayed correctly.  I have the
following environment settings:

  LESS="-MM -S -R"
  LESSBINFMT="*n%c"
  LESSCHARDEF=8bcccbcc18b.
  LESSKEY=/etc/lesskey.bin

So it seems to be a misconfiguration on your side.

> The problem is solved by modifying groff's font descriptions for the
> utf8 device so that the standard, ASCII-compatible "0x002D"
> character code is used instead of "0x2212" for the hyphen sequence
> ("\-").

As mentioned above, this is only a temporary workaround until other
software really supports Unicode.

In SuSE, the following code has been added to the troffrc
configuration file:

  .\" For the utf8 output device only: map \-, -, and ' to ASCII codes.
  .if '\*[.T]'utf8' \{\
  .  char \- \N'45'
  .  char  - \N'45'
  .  char  ' \N'39'
  .\}

which is currently the best solution.
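
For readers unfamiliar with the numbers in that fragment: 45 and 39 are
the decimal codes of the ASCII hyphen-minus and apostrophe.  The small
Python sketch below (the helper name `utf8_bytes` is made up for this
example) prints those characters and, for comparison, the UTF-8 byte
sequences of U+2212 and U+2010 mentioned earlier in this thread.

```python
def utf8_bytes(code_point: int) -> str:
    """Return the UTF-8 encoding of a code point as space-separated hex."""
    return chr(code_point).encode("utf-8").hex(" ")


if __name__ == "__main__":
    # Codes used with \N'...' in the troffrc fragment.
    for code in (45, 39):
        print(f"{code} = 0x{code:02X} = {chr(code)!r}")

    # Code points the utf8 device uses by default for \- and -,
    # as described in this thread; each takes three bytes in UTF-8.
    for cp in (0x2212, 0x2010):
        print(f"U+{cp:04X} -> {utf8_bytes(cp)}")
```

The ASCII characters are single bytes that any terminal, pager, or
keyboard search handles, whereas the two Unicode characters are
multi-byte sequences that need proper UTF-8 support end to end.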


    Werner



