Re: console-ncurses: slashdot redisplay bug


From: Marcus Brinkmann
Subject: Re: console-ncurses: slashdot redisplay bug
Date: Sun, 1 Sep 2002 20:15:52 +0200
User-agent: Mutt/1.4i

On Sat, Aug 31, 2002 at 09:08:34PM -0400, David Walter wrote:
> I think that I found the problem  with redisplay, but  I'm not sure of
> what component owns this problem, here is what I have found so far.

Yes, you found it.  But it is a 
 
> A minor hack allows this page to display correctly.
> 
> At or around line 441 in console-ncurses.c
> if( ucs4_to_altchar
> else
> {
>         if (str->chr == 183)
>           wch[0] = ' ';
>         else
>           wch[0] = str->chr;
> 
> This allows the refreshing and scrolling to work correctly.

Here is what happens: lynx displays a ·, not a bullet, so it does not use
the curses ACS_BULLET feature but prints the character 0xb7, which
technically is not a bullet (UNICODE_BULLET) but a middle dot
(UNICODE_MIDDLE_DOT).  So lynx does the right thing.

This means that for the console server, this is _NOT_ a character of the
alternate character set but a normal unicode character 0xb7, and the
console server stores it as such in the screen matrix.

The ncurses display client intimately knows about the unicode characters
used to represent characters from the alternate character set.  It knows that
the console server would have stored a UNICODE_BULLET in the screen matrix
if the console server had seen the ACS_BULLET alternate character. 
UNICODE_BULLET is 0x2022, and all 0x2022 characters that are seen by the
ncursesw client are printed using ACS_BULLET.  However, in this case the
character printed was not 0x2022 but 0xb7.  0xb7 is not treated as a special
character by the ncursesw client, so it is handled as a normal character,
like any other unicode character.
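
To make the distinction concrete, here is a minimal sketch of that lookup.
It is not the code from console-ncurses.c: the enum and the function name
sketch_classify are hypothetical stand-ins for the curses ACS_* values and
for ucs4_to_altchar.  Only the two code points are taken from the
explanation above.

```c
#include <stdint.h>

#define UNICODE_BULLET      0x2022u  /* what the server stores for ACS_BULLET */
#define UNICODE_MIDDLE_DOT  0x00b7u  /* what lynx actually printed */

/* Hypothetical stand-ins for the curses ACS_* values.  */
enum sketch_acs { SKETCH_ACS_NONE = 0, SKETCH_ACS_BULLET };

/* Return the alternate character to use for CHR, or SKETCH_ACS_NONE
   if CHR has to be printed as an ordinary unicode character.  */
static enum sketch_acs
sketch_classify (uint32_t chr)
{
  switch (chr)
    {
    case UNICODE_BULLET:
      return SKETCH_ACS_BULLET;
    default:
      /* UNICODE_MIDDLE_DOT ends up here, like every other character.  */
      return SKETCH_ACS_NONE;
    }
}
```

With this, 0x2022 takes the ACS_BULLET path, while 0xb7 falls through to
the normal character path, which is where the redisplay problem shows up.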

If you try other non-7bit-ASCII characters like umlauts, you will see other
display bugs, too.  It's not only the middle dot used in the slashdot pages. 
So we have to solve the generic problem of how to display unicode characters
in the ncursesw console client.

> I've tried searching the terminfo and ncurses information, but I don't
> find a specific capability that handles this. 

If you use a unicode console (e.g., UTF-8), then I think the console-ncurses
client will work correctly.  Originally I thought that ncursesw does the
character set conversion according to the current locale, but it seems it
doesn't.  Which is a pity.
 
> As I was looking for the terminfo capability that would describe this,
> I noticed that the  ACS_* macros are mapping to   +-|= if I  display a
> dialog window, yet since any terminal types I map have the same result
> I assume that ncurses has  made the decision not  to display the ascii
> graphic because of it's  understanding of the terminals  capabilities,
> but  which I  don't know,   where to  find   to  change, the  terminal
> characteristics as defined in hurd.ti AFAICT  are mapped correctly, as
> in stty says cs8 -istrip and any  others that I can  tell need to work
> for 8 bit characters.

I am a bit confused by this paragraph.  Here is how it works.  dialog
running on a virtual console of a console server should see TERM=hurd and
print +-|= etc. as the line graphic characters.  Then the console server maps
them to the corresponding unicode characters (altchar_to_ucs4 in
console/display.c).  Those unicode characters are then converted back to
ACS_* ncurses altchars by the console-ncursesw client in ucs4_to_altchar.
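
The round trip can be sketched with one table used in both directions.
This is an illustration only, not the code in console/display.c or in the
client: the names rt_map, rt_table, rt_altchar_to_ucs4 and
rt_ucs4_to_altchar are hypothetical, and the three entries use the VT100
alternate character names ('q', 'x', '~') as an assumption, which may not
match what hurd.ti really defines.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry: an alternate character name and its unicode code point.  */
struct rt_map
{
  char altchar;
  uint32_t ucs4;
};

/* Assumed sample entries, following the VT100 naming convention.  */
static const struct rt_map rt_table[] =
  {
    { 'q', 0x2500 },  /* horizontal line */
    { 'x', 0x2502 },  /* vertical line */
    { '~', 0x2022 },  /* bullet */
  };

#define RT_TABLE_LEN (sizeof rt_table / sizeof rt_table[0])

/* Server side: map an alternate character to unicode.  Characters
   without a table entry are passed through unchanged.  */
static uint32_t
rt_altchar_to_ucs4 (char altchar)
{
  for (size_t i = 0; i < RT_TABLE_LEN; i++)
    if (rt_table[i].altchar == altchar)
      return rt_table[i].ucs4;
  return (unsigned char) altchar;
}

/* Client side: map unicode back to an alternate character.  Returns 1
   and stores the result in *ALTCHAR if CHR is in the table, else 0.  */
static int
rt_ucs4_to_altchar (uint32_t chr, char *altchar)
{
  for (size_t i = 0; i < RT_TABLE_LEN; i++)
    if (rt_table[i].ucs4 == chr)
      {
        *altchar = rt_table[i].altchar;
        return 1;
      }
  return 0;
}
```

A character that never went through the server side mapping, such as 0xb7,
has no entry for the way back, so the client has to print it as a normal
character.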

> So the long and the short of this looks like  to me, that the terminal
> capability for  the character mapping needs  to include the upper half
> of 8 bit characters.

No, that is the wrong lesson learned, because the alternate character stuff
is a very ancient attempt at breaking out of 7bit ASCII, which is obsolete
today.  Of course you should still use the ACS_* stuff in ncurses
applications for compatibility, but extending this hack to cover latin1
supplement characters is completely wrong.

What we have to do is the following: If the underlying terminal of the
ncurses client can display UTF-8 characters natively, we can use the current
code.  If the underlying console has a different native character set, we
must not use the ncursesw interface, but the normal ncurses interface, and
convert the unicode characters to the local encoding.  This is slow, and
will probably not work for multibyte encodings.  Of course it would be nicer
if ncursesw handled this transparently.
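
For the second case, a minimal sketch of the conversion step, assuming the
local encoding is ISO-8859-1 (latin1).  The function name
sketch_ucs4_to_latin1 and the '?' fallback are hypothetical choices for
illustration.  Latin1 is the easy case because its 256 characters are the
first 256 unicode code points; for an arbitrary local encoding something
like iconv would be needed instead.

```c
#include <stdint.h>

/* Convert one unicode character to a single ISO-8859-1 byte.
   Characters outside latin1 are replaced by '?' (an assumed
   fallback, chosen only for this sketch).  */
static unsigned char
sketch_ucs4_to_latin1 (uint32_t chr)
{
  if (chr < 0x100)
    return (unsigned char) chr;
  return '?';
}
```

The middle dot 0xb7 survives this unchanged, while a character like
UNICODE_BULLET (0x2022) has no latin1 equivalent and gets the fallback,
unless it is caught by ucs4_to_altchar first.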

Thanks,
Marcus


-- 
`Rhubarb is no Egyptian god.' GNU      http://www.gnu.org    marcus@gnu.org
Marcus Brinkmann              The Hurd http://www.gnu.org/software/hurd/
Marcus.Brinkmann@ruhr-uni-bochum.de
http://www.marcus-brinkmann.de/
