[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH]console:UTF-8 mode compatibility fixes

From: Adam Tla/lka
Subject: Re: [PATCH]console:UTF-8 mode compatibility fixes
Date: Sun, 19 Feb 2006 12:47:36 +0100
User-agent: Mutt/1.4.1i

On Sat, Feb 18, 2006 at 08:53:38PM -0500, Thomas Dickey wrote:
> More to the point: since it's been in this form for several years, it 
> doesn't do much good to developers, because there are already workarounds 
> in ncurses to accommodate this, and even if you fixed it today, it would
> be needed in ncurses for a few more years.
> For example (man ncurses):
> ...
>             for Linux and screen.
> is the most recent refinement (from a year ago) to a workaround which 
> first appeared in ncurses 5.4 (originally from December 2002).

OK but my fix is for all not only curses programs. Anyway from my point of view
programs should be written in a way so they are easy to use and work correctly
without user special intervention and knowledge. So ncurses hack is not
a desired way IMHO. Generally saying products should be designed with customers
comfort and not developers/producers comfort in mind. But this is just a wish
of course in current world and off topic.

> >This disagreement has to be solved somehow.
> yes.  ncurses has no better information for this than the result from
> wcwidth().  Shall we add another kludge to accommodate Linux console?
> (Are there other terminal emulators with this specific problem?)

If ncurses know that this is a two-columns wide character so for compatibility
and correct handling reasons kernel console driver should know it too
and send two replacement glyphs or better one replacement glyph and one space 
or there should be n column replacement glyph implemented
(maybe as a replacement glyph and n-1 additional spaces) as for example putty
xterm-color terminal emulator does.

For correct selecting and pasting from console in UTF-8 mode without
data corruption in case of non displayed glyphs and malformed sequences
console screen contents should be memorized as UCS-2 wide chars plus additional
attributes and color information.

Also there should be some specification - maybe RFC - how correctly handle
UTF-8 malformed seqences so no information is corrupt during
displaying/cut/paste/edit operations. If user edits then it should be changed
but if there is no action there should be no change.
So automatic recoding or not displaying something leads to state when I could
have inproper file name and just can't correct it from command line because
I can't input inproper sequence from keyboard - just like in MS Windows -
or I must use some additional software or graphical tool.

Additionally a replacement glyph means that a valid glyph can't be displayed
by a device which doesn't mean that the particular UTF-8 sequence is incorrect.
Of course an inproper byte can't be displayed too because we don't know
what it really means. Result is the same but reasons are different.
So showed on vt screen results hide real reasons which is bad IMHO.
But only the replacement glyph is defined in the standard and there is no bad
byte indicator which is really needed ;-(.

Adam Tlałka      mailto:address@hidden    ^v^ ^v^ ^v^
System  & Network Administration Group           ~~~~~~
Computer Center,  Gdańsk University of Technology, Poland
PGP public key:   finger address@hidden

reply via email to

[Prev in Thread] Current Thread [Next in Thread]