Re: Wide and UTF-8 international characters

From: D. Stimits
Subject: Re: Wide and UTF-8 international characters
Date: Fri, 16 May 2003 17:06:51 -0600
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021018

Thomas Dickey wrote:

> On Fri, May 09, 2003 at 09:23:43AM +0000, John Smith wrote:
>
> > Is there a summary of how to use ncurses with international (wide and utf-8)
>
> there are no tutorials that I'm aware of
> (a few manpage references don't count).
>
> I've been adding to the test programs in ncurses to demonstrate these
> and other functions.  test/view.c and test/ncurses.c for example.
>
> > character sets? I can't figure out the right way to do it. Apparently vim
> > does it, so it should be possible.
>
> actually vim doesn't (it uses termcap-level functions to draw text, and
> uses the same wide-character/multibyte string functions that ncurses uses
> to manipulate the data).

This is something I'm becoming curious about (I have yet to experiment with it in ncurses, so this is all theory for me so far). I ran ldd (on Linux) on vim, and it shows that vim links with libncurses, not libtermcap or any other terminal library. So presumably it handles any non-7-bit-ASCII characters via ncurses (though I haven't looked to see how non-7-bit-ASCII text appears in vim).

In reading a small book (booklet?) on the original curses (not ncurses), I found that it says the upper bit of an 8-bit character is used to mark standout mode. If I am using just a console or an xterm, without ncurses, I can output the full 8-bit characters described by the HTML 8-bit entities, such as "&copy;", by echoing them directly to the console (not through ncurses or any other library), and get the copyright symbol, which looks like a 'c' inside a circle (to echo this, I send an uninterpreted decimal 169, typecast to char). So current terminals, whether console or X11, use the full 8 bits to create their display.

If the eighth bit is being used by curses, then the top 128 characters are lost to standout mode. On the other hand, if ncurses uses a separate byte for attributes (16 bits per cell in total) while leaving the full 8 bits for display output, then ncurses can display the full 256-character set (the HTML entity set) simply by sending the character straight to the terminal. I'm not positive, but this should cover the full single-byte (ISO-8859-1) set; UTF-8, by contrast, needs more than one byte for anything beyond 7-bit ASCII.

Is ncurses storing attributes in a separate byte already? Or is it as the old book describes, with 7 bits for the character and the last bit flagging standout mode? If a separate byte is used already, then it would seem that multibyte characters already have the "infrastructure" to be plugged into ncurses. [FYI, it would be rather useful to see an entity substitution ability, like "&copy;" in HTML.]
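
To make the question above concrete, here is a small Python sketch of the two cell layouts being compared. The masks and the helper names (`old_cell`, `wide_cell`) are made up for illustration and are not taken from any curses header; the real ncurses layout may well differ. The last two lines also show how the value 169 looks as bytes in ISO-8859-1 versus UTF-8.

```python
# Hypothetical cell layouts, for illustration only. These masks are
# assumptions and are NOT copied from any curses or ncurses header.
OLD_CHARMASK = 0x7F    # old style: 7 bits for the character
OLD_STANDOUT = 0x80    # old style: the 8th bit flags standout mode

WIDE_CHARMASK = 0x00FF  # wider cell: a full 8 bits for the character
WIDE_STANDOUT = 0x0100  # wider cell: attribute bit sits above the character


def old_cell(code, standout=False):
    """Pack a character into an 8-bit cell whose top bit means standout."""
    return (code & OLD_CHARMASK) | (OLD_STANDOUT if standout else 0)


def wide_cell(code, standout=False):
    """Pack a character into a 16-bit cell with a separate attribute bit."""
    return (code & WIDE_CHARMASK) | (WIDE_STANDOUT if standout else 0)


COPYRIGHT = 169  # the decimal value echoed to the console in the text above

# Old layout: the flag claims bit 7, so 169 cannot survive the round trip.
old = old_cell(COPYRIGHT, standout=True)
print(old & OLD_CHARMASK)            # 41, which is ")" and not the copyright sign

# Wider layout: the character and the standout flag no longer collide.
wide = wide_cell(COPYRIGHT, standout=True)
print(wide & WIDE_CHARMASK)          # 169
print(bool(wide & WIDE_STANDOUT))    # True

# The same code point as bytes: one byte in ISO-8859-1, two in UTF-8.
print(chr(COPYRIGHT).encode("latin-1"))  # b'\xa9'
print(chr(COPYRIGHT).encode("utf-8"))    # b'\xc2\xa9'
```

So a separate attribute field would be enough for a single-byte 8-bit set, but a UTF-8 terminal would expect the two-byte form, which is where the wide-character support comes in.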

Pardon my curiosity; lately I've been looking at some non-7-bit-ASCII clients, but they support only 8-bit, not multibyte, characters. I created a lightweight XML-style data tree storage mechanism that uses XML/HTML entities to represent characters that cannot easily be entered via a keyboard, and it turned out to be far more flexible and useful than I first thought. I remember seeing that some of the ncurses development branch had partial or initial support for wide characters, and I wonder whether separating attributes from the character (unlike the 8th bit used for standout in traditional curses) has been part of this preparation?
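
As a rough idea of the kind of entity substitution meant here, the following Python sketch expands HTML entities into the characters they name before the text would be handed to a display routine. It relies on `html.unescape` from the Python standard library; the wrapper name `expand_entities` is made up for this example and is not part of ncurses or of the storage mechanism described above.

```python
import html


def expand_entities(text):
    """Replace HTML entities (named or numeric) with their characters."""
    return html.unescape(text)


# Named and numeric forms both resolve to code point 169.
named = expand_entities("&copy; 2003")
numeric = expand_entities("&#169; 2003")

print(ord(named[0]))       # 169
print(named == numeric)    # True
print(named[1:])           # " 2003" (plain ASCII is left alone)
```

Text that can be typed directly passes through unchanged, so only the hard-to-enter characters need to be written as entities.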

D. Stimits, stimits AT attbi DOT com
