bug-hurd

Re: term, utf-8 and cooked mode, combining characters


From: Niels Möller
Subject: Re: term, utf-8 and cooked mode, combining characters
Date: 18 Sep 2002 16:19:15 +0200
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de> writes:

> I have no idea what you are talking about.  It seems to be related to the
> thread, but you have to be much more precise.  What is this Unicode support
> you are talking about, that my second half seems to be missing?

Sorry. I'll try again...

I was writing in the context of Roland's suggestion of moving term
functionality into the console server, where Roland said:

R> The (unfinished) code I have for libtermserver is just taken from
R> term and has essentially the same interfaces, which are all on
R> single characters passed as int. It seems libtermserver interface
R> should instead use a const char * and length parameter so that
R> multibyte-aware callers like console can give it a single multibyte
R> character at a time.

The Unicode support I'm talking about is the ability to take the input
stream and chop it up into units that are passed on to libtermserver
input handling. That is support that is needed either in console or
term, depending on how they work together.

To me it seems easier to perform the following steps:

A1. Chop the Unicode stream up into graphemes.
A2. Convert each grapheme into the local encoding, resulting in one or
    more bytes each. (I think you can do this with iconv).
A3. Pass each grapheme to the term input handling (libtermserver),
    using the local encoding.

than

B1. Convert stream into local encoding.
B2. Chop up the stream into graphemes, using rules that depend on
    the local encoding. (I don't think iconv can do this easily).
B3. Pass the graphemes on to the term input handling.

In particular, I'm afraid that to do B2 you either have to support the
rules of a bunch of strange multibyte charsets, or convert the stream
back to Unicode, chop it up into units of base char + combining chars,
and then convert it back to the local encoding.
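
As a rough illustration of A1-A3, here is a minimal sketch in Python
(not the C that term and console are written in). All function names
are hypothetical and not part of libtermserver. It treats a "grapheme"
simply as a base char plus the combining chars that follow it, which is
cruder than full Unicode text segmentation, and it assumes that
composing each unit to NFC before conversion is acceptable, so that a
local charset without combining characters can still represent it.

```python
import unicodedata


def split_graphemes(text):
    """A1: chop a Unicode string into base char + combining chars.

    Simplification: a character with a non-zero canonical combining
    class is attached to the preceding unit; everything else starts
    a new unit.
    """
    units = []
    for ch in text:
        if units and unicodedata.combining(ch) != 0:
            units[-1] += ch
        else:
            units.append(ch)
    return units


def encode_graphemes(text, encoding):
    """A2: convert each unit to the local encoding (one or more bytes).

    Assumption: each unit is composed (NFC) first, so "e" + U+0301
    becomes a single code point where one exists.
    """
    return [
        unicodedata.normalize("NFC", unit).encode(encoding)
        for unit in split_graphemes(text)
    ]


def feed_term(text, encoding, handle_input):
    """A3: pass each unit, as bytes, to the term input handling.

    handle_input is a stand-in for whatever libtermserver would
    provide; it receives one complete unit per call.
    """
    for unit in encode_graphemes(text, encoding):
        handle_input(unit)
```

The point of the sketch is that the splitting rule in A1 is written
once, against Unicode, and the local encoding only enters in A2. With
the B ordering, split_graphemes would need a variant per local charset.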

As I understand you, the current code performs B1 in the console, and
B2 and B3 have to be done by term (but aren't yet implemented). But the
work could be divided differently, either all of A1-A3 + term handling
could be done by the console (Roland's suggestion), or perhaps the
console could do A1-A3 and use some new interface to communicate the
stream of graphemes (in local encoding) to term. One could also move
some of the work even further away from term, into the input client.

And I also think the code needed for implementing A1 is or will be
included in the console anyway, for the output matrix.

Regards,
/Niels



