Re: [Texmacs-dev] character encoding


From: Felix Breuer
Subject: Re: [Texmacs-dev] character encoding
Date: 28 Jan 2003 20:02:22 +0100

Hello!

On Tue, 2003-01-28 at 13:09, Joris van der Hoeven wrote:
> 1) Inside the TeXmacs code, you should (almost) never use '.'
>    for data access, but rather '->'. The reason is that '->' can be
>    customized while '.' cannot. Therefore, you should place methods
>    like 'add_child', 'set_label', etc. in the representation class.
> 
> 2) Converters should be resources (like dictionaries).
>    This is the standard mechanism for guaranteeing that the conversion
>    tables will be loaded only once.
> 
> 3) Maybe you should replace 'tree' by 'ht' when using it as a variable
>    in order to avoid confusion with the 'tree' class.
> 
> 4) David should create a new macro CONCRETE_TEMPLATE_2_VAR or
>    CONCRETE_NULL_TEMPLATE_2_VAR for dealing with the required
>    memory management. This is useful in order to keep such
>    non trivial routines centralized.
> 
> I propose the following working scheme: I first release TeXmacs 1.0.1.2,
> Felix takes care of 1, 2 and 3 and writes a patch which will be integrated
> in version 1.0.1.3. Then David takes care of 4 and the corresponding patch
> will be integrated in version 1.0.1.4.

I took care of 1-3, to everyone's satisfaction I hope. I also fixed the
delete[] char* problem. (The patch is on savannah.)

> I hope that <iconv.h> is sufficiently standard so that
> we do not get any compilation problems on exotic platforms.
> Otherwise, the configure.in will need to be modified.
> We might also automatically generate tables using <iconv.h>
> and include them into TeXmacs. This latter strategy would allow
> us to perform operations on the dictionaries as a whole.

That might indeed be more portable. I would rather try using iconv
directly, though, to avoid the work of generating all those tables.
Let's see whether iconv causes trouble or not.
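To make "using iconv directly" concrete, here is a minimal sketch of a converter built on the POSIX iconv_open/iconv/iconv_close calls. The function name convert_with_iconv is made up for illustration and is not part of TeXmacs. The encoding names in the usage note are only an assumption: I use ISO-8859-1 as a stand-in, since whether a given iconv implementation knows the cork encoding is something I have not checked. Also note that some platforms declare the second argument of iconv as const char**, which is exactly the kind of portability trouble mentioned above.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert `input` from encoding `from` to encoding `to`.
// Throws std::runtime_error if the conversion is unsupported or the input
// contains a sequence that cannot be converted.
std::string convert_with_iconv(const std::string& input,
                               const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the argument order: (to, from)
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    std::string in = input;             // iconv wants a non-const buffer
    char* inptr = &in[0];
    std::size_t inleft = in.size();

    std::string out;
    char buf[256];
    while (inleft > 0) {
        char* outptr = buf;
        std::size_t outleft = sizeof buf;
        std::size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
        int err = errno;                // save before anything can clobber it
        out.append(buf, sizeof buf - outleft);
        // E2BIG only means the output buffer filled up, so loop again;
        // any other error (EILSEQ, EINVAL) is a real conversion failure.
        if (rc == (std::size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed");
        }
    }
    iconv_close(cd);
    return out;
}
```

With this sketch, convert_with_iconv("\xE4", "ISO-8859-1", "UTF-8") should yield the two bytes C3 A4 on a system whose iconv supports both names. Stateful target encodings would additionally need a final flushing call, which is omitted here.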

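Right now we use one big table mapping cork/tm->unicode to generate two
dictionaries: cork->utf8 and utf8->cork. Note that utf8 and unicode are
not the same thing: the table stores unicode values, while the
dictionaries need UTF-8 byte sequences. Generating both dictionaries from
the one table is made possible by the numerous arguments that
hashtree_from_dictionary takes.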
If we choose to add many different dictionaries it might make sense to
take a "one file <=> one dictionary" approach. I.e. we should generate
two files, one mapping cork->utf8 and one utf8->cork. These two files
would differ from the original one in the following way:

The line

("#E4"  "#E4")

would become

("#E4"  "#C3#A4")   or  ("#C3#A4"  "#E4")  respectively.

Thus we wouldn't need to tell the parser whether "#E4" is to be
interpreted literally or as a unicode value that has to be converted to
a UTF-8, UTF-7, UTF-16... byte sequence. We do this conversion once
while generating the dictionary, and hashtree_from_dictionary can then
take *every* escape string "literally". 

I am not keen on writing that generator though ;)
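That said, the per-entry conversion such a generator would perform is small. The following is only a sketch under my own assumptions: the helper names utf8_encode, escape_bytes and dictionary_line are hypothetical, the cork side is taken to be a single byte, and the output format simply mimics the example lines above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
std::string utf8_encode(std::uint32_t cp) {
    // Reject values outside Unicode and the UTF-16 surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: write every byte in the "#XX" escape notation.
std::string escape_bytes(const std::string& bytes) {
    std::string out;
    char buf[4];  // '#', two hex digits, terminating NUL
    for (unsigned char b : bytes) {
        std::snprintf(buf, sizeof buf, "#%02X", static_cast<unsigned>(b));
        out += buf;
    }
    return out;
}

// Hypothetical helper: emit one dictionary line for a (cork byte, unicode
// value) pair, in either direction. The format is an assumption modelled
// on the example lines in this mail.
std::string dictionary_line(unsigned char cork, std::uint32_t unicode,
                            bool cork_to_utf8) {
    std::string c = escape_bytes(std::string(1, static_cast<char>(cork)));
    std::string u = escape_bytes(utf8_encode(unicode));
    return cork_to_utf8 ? "(\"" + c + "\"  \"" + u + "\")"
                        : "(\"" + u + "\"  \"" + c + "\")";
}
```

For the entry discussed above, dictionary_line(0xE4, 0xE4, true) produces ("#E4"  "#C3#A4") and the reverse direction produces ("#C3#A4"  "#E4"). A UTF-16 or UTF-7 dictionary would need its own encoder in place of utf8_encode.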

> Once more many thanks to Felix,

You are welcome :)

Cheers,
Felix

-- 
Felix Breuer <address@hidden>




