Re: [Texmacs-dev] Thoughts about encodings


From: Norbert Nemec
Subject: Re: [Texmacs-dev] Thoughts about encodings
Date: Tue, 29 May 2007 00:10:54 +0200
User-agent: Thunderbird 2.0.0.0 (X11/20070326)

Joris van der Hoeven wrote:
> On Mon, May 28, 2007 at 02:09:49PM +0200, Norbert Nemec wrote:
>   
>> Once we have eradicated Cork T1 that way, it should also be much easier
>> to introduce UTF-8 internally, which should be reasonably close to a
>> 1-to-1 mapping on universal symbols.
>>
>> What do you think?
>>     
>
> Yes, we should move towards UTF-8. As you may have noticed,
> this is a very tricky thing, mainly because of the horrible
> encodings we inherit from TeX/LaTeX.
>
> As a first step, we should make strict use of
> Cork + <univ-char>-style universal extension codes.
> In particular, this means revamping Cyrillic,
> but there may be other places where we are not strict.
>   
Why not go one step further, to ASCII+<univ-char>? If all non-ASCII
characters had a universal symbol representation, the use of the Cork
encoding could be limited to the font resolver, where it could then
easily be swapped for other encodings depending on the font.

If one wants to use UTF-8 internally at some point in the future, that
would be a straightforward extension.
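
To make the ASCII+<univ-char> idea concrete, here is a minimal sketch of
the round trip between Unicode text and a pure-ASCII internal form. It
assumes that universal symbols are written as <#XXXX> with a hexadecimal
code point and that literal angle brackets are escaped as <less> and
<gtr>. The function names are hypothetical and are not taken from the
TeXmacs sources.

```python
import re

# Assumed escape syntax: <#XXXX> for a code point, <less>/<gtr> for brackets.
_TOKEN = re.compile(r"<(less|gtr|#[0-9A-Fa-f]{1,6})>")


def to_ascii_univ(text):
    """Encode Unicode text as plain ASCII plus universal symbol escapes."""
    parts = []
    for ch in text:
        if ch == "<":
            parts.append("<less>")
        elif ch == ">":
            parts.append("<gtr>")
        elif ord(ch) < 128:
            parts.append(ch)  # plain ASCII passes through unchanged
        else:
            parts.append("<#%X>" % ord(ch))  # everything else becomes <#XXXX>
    return "".join(parts)


def from_ascii_univ(encoded):
    """Decode the ASCII form back into Unicode text."""
    def expand(match):
        name = match.group(1)
        if name == "less":
            return "<"
        if name == "gtr":
            return ">"
        return chr(int(name[1:], 16))  # strip the leading '#'
    return _TOKEN.sub(expand, encoded)


if __name__ == "__main__":
    internal = to_ascii_univ("café <Привет>")
    print(internal)
    # Moving to UTF-8 later is then just a decode followed by an encode.
    print(from_ascii_univ(internal).encode("utf-8"))
```

Under these assumptions the internal form never contains a byte above
0x7F, so a Cork table (or any other font encoding) would only be needed
at the point where a symbol is mapped to a glyph.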

> At a second step, we should design a UTF-8 plane for
> all TeXmacs-specific characters. This will allow us
> to have a complete one-to-one correspondence.
>
> At a third step, we have to redesign the font system,
> which is overly complex, and in particular get rid of
> all rubber characters. We should also get rid of Metafont and
> be able to use system-provided fonts more easily.
>
> At the last step, we might internally get rid of
> the current character encoding and use UTF-8.
> We should keep the old tables for the support of
> comprehensible names for all symbols.
>   
IMO, the "get rid of the current encoding" part should actually be the
first step.

Restricting the code internally to plain ASCII allows for a fairly
simple verification of the code.
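
One possible reading of "simple verification" is a check that no byte
outside the ASCII range has leaked into an internal string. A minimal
sketch, with a hypothetical helper name:

```python
def first_non_ascii(data):
    """Return the index of the first byte >= 0x80 in data, or -1 if none."""
    for index, byte in enumerate(data):
        if byte >= 0x80:
            return index
    return -1


if __name__ == "__main__":
    print(first_non_ascii(b"plain <#E9> text"))     # pure ASCII internal form
    print(first_non_ascii("caf\u00e9".encode("utf-8")))  # raw UTF-8 leaked in
```

A check like this could run as an assertion at the boundaries of the
internal string handling, which is where an unconverted Cork or UTF-8
byte would most likely slip through.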




