Re: Wide and UTF-8 international characters

bug-ncurses

From:	Thomas Dickey
Subject:	Re: Wide and UTF-8 international characters
Date:	Fri, 16 May 2003 20:08:27 -0400
User-agent:	Mutt/1.3.27i

On Fri, May 16, 2003 at 05:06:51PM -0600, D. Stimits wrote:
> Thomas Dickey wrote:
> 
> >On Fri, May 09, 2003 at 09:23:43AM +0000, John Smith wrote:

> >actually vim doesn't (it uses termcap-level functions to draw text, and
> >uses the same wide-character/multibyte string functions that ncurses uses
> >to manipulate the data).
> 
> This is something I'm becoming curious about (I have yet to experiment 
> with it in ncurses, this is all theory for me so far). I ran ldd (linux) 
> on vim, and it shows that it links with libncurses, and not libtermcap 
> or other term libs. It must be doing any non-7-bit-ascii character via 
> ncurses (though I haven't looked to see what the non-7-bit-ascii looks 
> like in vim).

ncurses provides a termcap interface.  But that only gives the data
shown in the terminal description; it doesn't help the application use
it optimally.  ncurses also provides a more capable terminfo interface,
but vim doesn't use that, afaik (even when its ifdefs say so, it's
still pretending to be termcap).  vile and tin can be built against
termcap, terminfo or curses; each choice has tradeoffs.

In short: when I say that an application doesn't use (n)curses, I'm
generally referring to the interface it uses rather than the library
that provides it.

> In reading a small book (booklet?) on the original curses (not ncurses), 
> it says the upper bit on 8 bit characters is used to mark standout mode. 

That was true through BSD 4.3.  I sometimes test against that (for programs
such as lynx, which can build/run with a variety of curses implementations).

> If I am using just a console or or xterm, without ncurses, I can output 
> the full 8 bit characters as described in html 8-bit entities, echoed 
> directly to a console (not with ncurses or any lib), such as "&copy;", 
> and get the copyright symbol that is like a 'c' inside of a circle (it 
> happens that to echo this I echo an uninterpreted 169 decimal, typecast 
> to char). So current terminals, whether console or X11, use the full 8 

Generally true.  But BSD curses stripped the 8th bit from each character
and used it as a flag telling that implementation whether to highlight
the character in standout mode.

> bits to create their display. If the eighth bit is being used by curses, 
> then the top 128 characters are lost to standout mode ability. On the 
> other hand, if ncurses uses a separate byte (i.e., 16 bits total) to store 

more than 8 bits, actually.

> characteristics, while leaving the full 8 bits to display output, then 
> ncurses can display the full 255 character entity set (html entity set) 
> simply by sending the character straight to the terminal. I'm not 
> positive, but this should include the full UTF-8 set, which is only 
> single-byte. Is ncurses storing attribute in a separate byte already? Or 

The problem with that is that it doesn't mix well with treating the screen
as an array of characters.  You _could_ store each row as a multibyte string
(with some pain at the right margin), but finding the character that starts
at a given column would then require counting from the start of the row, or
maintaining an extra index.  Instead, the common approach stores multiple
characters for each array position: some storage is wasted, but it's
accessed more rapidly.

> is it the way of the old book description, with 7 bits for character, 
> and the last bit for standout mode flagging? If a separate byte is used 
> already, then it would seem that multibyte characters already have the 
> "infrastructure" to be plugged into ncurses. [FYI, it would be rather 
> useful to see an entity substitution ability, like "&copy;" in html]
> 
> Pardon my curiosity, lately I've been looking at some non-7-bit ascii 
> clients, but the clients support only 8 bit, not multibyte characters. I 
> created a lightweight XML style data tree storage mechanism that uses 
> XML/html entities to represent characters that cannot be easily entered 
> via a keyboard, and it turned out to be far more flexible/useful than I 
> thought at first. I remember seeing some of the development ncurses 
> branch as partial or initial support for the wide characters, and I 

That was up until mid-2001.  I didn't quite know where to begin rewriting,
but one of the contributors got it moving.  ncurses 5.3 was good enough to
use.  The current code probably has isolated bugs, but I don't see any
that are related to wide characters.  Not all functions are tested, so
I've been reviewing the code and adding test programs for places that are
noticeably not covered.

> wonder if separation of attributes (like the 8th bit in traditional/old 
> curses for standout) has been part of this preparation?
> 
> D. Stimits, stimits AT attbi DOT com

-- 
Thomas E. Dickey <address@hidden>
http://invisible-island.net
ftp://invisible-island.net
