Re: LANG env var
Wed, 15 Feb 2006 21:02:10 -0500
Thanks for the information. I have several basic questions I hope you
can answer for me. I don't understand the underlying reason why ncurses
or terminal applications care about UTF-8. I know this isn't your
responsibility to explain to me, but if you understand it, I would love
to hear the reason.
I can only guess 2 reasons why ncurses would care about UTF-8. Hopefully
you could tell me if either case is true or maybe tell me what other
reasons ncurses cares about UTF-8 and how that affects the application's
I can only suspect that terminal emulators are sending UTF-8 style
characters to the program running in them. So, the keyboard input
function needs to parse the characters and understand they will not be
in ASCII format. Is this correct? If this is correct, does ncurses
Next, I'm assuming that ncurses has to display characters on the
terminal in UTF-8 mode. That way, it could display languages other than
English. CGDB reads source code files and displays them using the
ncurses waddch function. Does this function support UTF-8? Or does it handle
Thanks for helping, I really appreciate it.
On Wed, Feb 15, 2006 at 06:17:56PM -0500, Thomas Dickey wrote:
> On Wed, 15 Feb 2006, Bob Rossi wrote:
> >On Wed, Feb 15, 2006 at 01:06:47PM -0500, Thomas Dickey wrote:
> >>On Wed, 15 Feb 2006, Bob Rossi wrote:
> >>>Does anyone know if ncurses looks at and uses the LANG environment
> >>yes/no: ncurses expects that the application has called setlocale()
> >>to use the desired locale. That function takes into account $LANG
> >>and related variables.
> >OK, I definitely have not called setlocale(). What is affected if I do
> >not do that?
> It certainly won't recognize UTF-8 - not in 5.5. In 5.4 I was still
> using environment variables to fill in for some lacking functionality
> of the runtime library. But someone pointed out that it had caught up
> and I switched to using nl_langinfo() to determine whether UTF-8 applied.
> If the locale's not initialized, then ncurses will try to appease some
> applications that treat codes 128-255 as "printable". If the locale is
> initialized, it assumes that information is correct.
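A minimal sketch of the setlocale()/nl_langinfo() check described above, assuming a POSIX libc. The helper names init_locale_from_env and locale_is_utf8 are hypothetical, and the exact codeset string ("UTF-8") is an assumption that holds on common systems but is not guaranteed everywhere:

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: initialize the locale from the environment.
 * The empty string tells setlocale() to consult LC_ALL, LC_CTYPE,
 * LANG and related variables. Intended to be called once, before
 * initscr(). Returns 1 on success, 0 if the requested locale is
 * not available. */
int init_locale_from_env(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Hypothetical helper: report whether the current locale's character
 * encoding is UTF-8. Without a prior setlocale() call the program
 * runs in the "C" locale, so this returns 0. */
int locale_is_utf8(void)
{
    const char *codeset = nl_langinfo(CODESET);
    return codeset != NULL && strcmp(codeset, "UTF-8") == 0;
}
```

An application would call init_locale_from_env() first and could then use locale_is_utf8() to decide whether multibyte handling is needed.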
OK, so should I be telling ncurses what locale is being used? Does
ncurses act differently on display data to the terminal if 128-255 is
> >In particular, I would love it if you could explain how LANG affects an
> >ncurses program.
> It controls the interpretation of whether single bytes with specific
> values are printable, whether they can be combined into multibyte
> characters (such as UTF-8 encoding).
OK, I understand that UTF-8 is a variable length byte format, depending
on the contents of the data. Does this affect how ncurses reads input
and what chars it writes to the terminal? For instance, what would a
simple example be?
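One way to picture the variable-length point is the sketch below, which is plain C and does not use ncurses. The function names utf8_seq_len and utf8_count_chars are hypothetical. It shows that the first byte of a UTF-8 sequence determines how many bytes belong to one character, so a reader that treats every byte as one character will miscount:

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence that starts with `lead`,
 * or 0 if `lead` cannot start a sequence (a continuation byte or
 * a value that never appears as a lead byte in valid UTF-8). */
int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;                  /* plain ASCII */
    if (lead >= 0xC2 && lead <= 0xDF) return 2; /* 2-byte sequence */
    if (lead >= 0xE0 && lead <= 0xEF) return 3; /* 3-byte sequence */
    if (lead >= 0xF0 && lead <= 0xF4) return 4; /* 4-byte sequence */
    return 0;
}

/* Count characters (not bytes) in a NUL-terminated UTF-8 string.
 * Returns (size_t)-1 if the string is malformed or truncated. */
size_t utf8_count_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != '\0') {
        int len = utf8_seq_len(*p);
        if (len == 0) return (size_t)-1;
        /* Every following byte must look like 10xxxxxx. A NUL fails
         * this check, so the loop never reads past the terminator. */
        for (int i = 1; i < len; i++) {
            if ((p[i] & 0xC0) != 0x80) return (size_t)-1;
        }
        p += len;
        count++;
    }
    return count;
}
```

For example, the string "caf\xC3\xA9" is five bytes long but utf8_count_chars reports four characters, because the final two bytes together encode a single accented letter.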
> >keyboard input function to capture keys. I look at both the termcap and
> >the terminfo database to determine what the special key mappings are. I
> >did this because I need to pass on the raw data to readline in certain
> >circumstances. Does ncurses provide the application level with the raw
> >data that made up a particular key?
> It can (if keypad() is called with a FALSE). But that makes reading the
> keyboard a lot more work.
OK, so, currently CGDB reads from stdin, and tries to map the keys it
gets to function keys or special keys by reading the termcap and
terminfo databases. Once it determines whether it found something, it also has
the raw data that made up what it just read (a function key or a single
key). CGDB determines if the key is for itself, or if it should send
the data to readline. Obviously, readline wants the raw data so that it
can also parse it and figure out what's going on. Can I get this
functionality out of ncurses? This would save me from maintaining some
potentially complicated code.
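The approach described above can be sketched roughly as follows. This is not CGDB's actual code: the enum, table, and function names are hypothetical, and the two escape sequences are only illustrative values that many xterm-like terminals send. A real program would take them from the terminfo or termcap entry for the current terminal:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical key identifiers for this sketch. */
enum demo_key { DEMO_KEY_NONE = 0, DEMO_KEY_UP, DEMO_KEY_F1 };

struct demo_mapping {
    const char *seq;    /* raw bytes the terminal sends */
    enum demo_key key;  /* key those bytes represent */
};

/* Illustrative table only; real sequences come from terminfo. */
static const struct demo_mapping demo_table[] = {
    { "\033[A", DEMO_KEY_UP },
    { "\033OP", DEMO_KEY_F1 },
};

/* Match the start of buf[0..len) against the table. On a match,
 * return the key and store the length of the raw sequence in
 * *raw_len, so the caller can still forward those exact bytes
 * (for example to readline). With no match, report the first byte
 * as a single raw key. */
enum demo_key demo_match_key(const char *buf, size_t len, size_t *raw_len)
{
    size_t count = sizeof demo_table / sizeof demo_table[0];

    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(demo_table[i].seq);
        if (len >= n && memcmp(buf, demo_table[i].seq, n) == 0) {
            *raw_len = n;
            return demo_table[i].key;
        }
    }
    *raw_len = len > 0 ? 1 : 0;
    return DEMO_KEY_NONE;
}
```

The sketch leaves out the hard parts, such as a sequence that has only partly arrived and the timeout needed to tell a lone Escape key from the start of a function-key sequence.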
> >And again, is any of this affected by the LANG environment variable?
> Sure - the keyboard "should" be able to send multibyte UTF-8 codes
> as well as escape sequences for function-keys. In dialog, I'm using
> wget_wch() to read the keyboard and keep track of whether the data
> is really function key or some code that happens to be in the range
> 256-1000 or whatever.
OK, I understand. xterm would receive UTF-8 from somewhere, and it would
forward that to the application running in it. You are saying that this
data needs to be understood by the application.
Wow, this seems complicated. Is it guaranteed that UTF-8 is the input to
programs in a terminal? or could it be ASCII only? or could it be
Thanks for being patient. I'm just trying to determine what I need to do
to have CGDB work with UTF-8.