lynx-dev

Re: lynx-dev Lynx character entity references fix


From: Leonid Pauzner
Subject: Re: lynx-dev Lynx character entity references fix
Date: Tue, 9 Mar 1999 23:45:42 +0300 (MSK)

9-Mar-99 12:45 Klaus Weide wrote:
> On Sun, 7 Mar 1999, Leonid Pauzner wrote:

>> > On Fri, 5 Mar 1999, Leonid Pauzner wrote:
>>
>> >> >      * From: Jacob Poon <address@hidden>
>> >> >         - Fixed some typos in the old references. (fixed: b.delta)
>> >> Thanks, I'm now working on the old-style entities code and will integrate
>> >> your fix.
>>
>> BTW, I found an interesting side effect:
>> if you look at /test/unicode.html with Lynx dev.19
>> and set "display charset" to x-transparent,
>> you get a nice picture.
>> I was not sure whether the chars < 128 would be converted properly (they are),
>> but occasionally Latin-1 chars got reverse-translated to character entities,
>> although the original source used numeric entities!!!
>> See around line 0x0100.
>> This is due to my recent changes; 2.8.1 does not do this.
>> Apparently x-transparent should fall back to 7-bit approximations for
>> Unicode characters, as CJK does, but some interesting internal behavior
>> became visible.

> Do you mean this is good, bad, or just interesting?

I mean it is just interesting: I thought the old-style entities were not
really used, but this example shows otherwise. I haven't looked closely at
why this happens, but we now have a definite example to start from.

> Do you want to leave it this way?  (I think it would be better to restore the
> use-SevenBitApproximations behavior.)  Can you explain why this is
> happening?

I'm fixing this with a "fallback" flag in UCDomap.h, so &#123 etc. get
translated through def7_uni.tbl, as we already do for CJK.
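A rough illustration of the fallback idea: when the output charset is flagged for fallback, a code point it cannot show directly is looked up in a 7-bit approximation table instead of being turned back into an entity. Everything here is hypothetical: struct out_charset, its fallback field, render_ucs() and the three-entry def7_sketch[] table are stand-ins chosen for illustration, not Lynx's UCDomap.h definitions or the real contents of def7_uni.tbl. For simplicity the sketch passes code points below 128 through unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical excerpt of 7-bit approximations, in the spirit of
 * def7_uni.tbl; these entries are illustrative only. */
struct approx {
    unsigned long ucs;
    const char *text;
};

static const struct approx def7_sketch[] = {
    { 0x00A9, "(c)" },
    { 0x00E9, "e"   },
    { 0x00FC, "u"   },
};

/* Hypothetical output-charset description; "fallback" stands in for the
 * flag mentioned in the message. */
struct out_charset {
    const char *name;
    int fallback;
};

/* Writes a displayable form of ucs into buf (size n).  Returns 1 if this
 * sketch produced text, 0 if the caller should use its normal path. */
static int render_ucs(const struct out_charset *cs, unsigned long ucs,
                      char *buf, size_t n)
{
    size_t i;

    if (ucs < 128) {                 /* 7-bit chars pass through as-is */
        if (n < 2)
            return 0;
        buf[0] = (char)ucs;
        buf[1] = '\0';
        return 1;
    }
    if (!cs->fallback)               /* no 7-bit fallback for this charset */
        return 0;
    for (i = 0; i < sizeof def7_sketch / sizeof def7_sketch[0]; i++) {
        if (def7_sketch[i].ucs == ucs) {
            if (strlen(def7_sketch[i].text) + 1 > n)
                return 0;
            strcpy(buf, def7_sketch[i].text);
            return 1;
        }
    }
    return 0;                        /* no approximation known */
}
```

With a flagged charset, &#233; would come out as "e" rather than as a named entity; an unflagged charset leaves the decision to its normal translation path.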

> Also, I haven't seen a patch for the ifdef'd entities.h tables - did I just
> miss it?
It's in my pending patch; I haven't sent it yet (will do).


>   ----

> Among the previous changes (that are in dev.18/dev.19), the following
> looks wrong.  In UC_con_set_trans():

>     for (i = 0; i < UCInfo[UC_charset_in_hndl].num_n256; i++) {
>         if ((j = UCInfo[UC_charset_in_hndl].unicount[i])) {
>             ptrans[i] = *p;
>             for (; j; j--) {
>                 p++;
>             }
>         } else {
>             ptrans[i] = 0xfffd;
>         }
>     }

> Here ptrans points to one of the four tables (slots) in translations[].
> Your change leaves the table unchanged when it should be re-initialized.
> So (to-Unicode translation for) one charset could effectively inherit
> the translations for a completely different charset that used the same slot
> before.
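To make the slot-inheritance problem concrete, here is a small model of the quoted loop. struct charset, set_trans_partial() and make_identity() are hypothetical simplifications, not Lynx's UCInfo or UC_con_set_trans(); the assumed layout is that unicount[i] counts how many Unicode values are listed for byte i, stored back to back in unitable[]. The loop body matches the quoted one, with the inner "p++" loop written as p += j.

```c
#include <assert.h>

/* Hypothetical, simplified charset description (not Lynx's real UCInfo). */
struct charset {
    int num_n256;                    /* bytes 0..num_n256-1 are described */
    const unsigned char *unicount;   /* values listed per byte */
    const unsigned short *unitable;  /* all values, back to back */
};

/* Same shape as the quoted loop: only ptrans[0 .. num_n256-1] is written,
 * so entries num_n256..255 keep whatever an earlier charset left there. */
static void set_trans_partial(unsigned short *ptrans, const struct charset *cs)
{
    const unsigned short *p = cs->unitable;
    int i, j;

    assert(cs->num_n256 >= 0 && cs->num_n256 <= 256);
    for (i = 0; i < cs->num_n256; i++) {
        if ((j = cs->unicount[i])) {
            ptrans[i] = *p;          /* first listed value for byte i */
            p += j;                  /* skip all j values for byte i */
        } else {
            ptrans[i] = 0xfffd;
        }
    }
}

/* Hypothetical helper: byte i maps to exactly one value, offset + i. */
static struct charset make_identity(unsigned char *count, unsigned short *table,
                                    int n, unsigned short offset)
{
    struct charset cs;
    int i;

    for (i = 0; i < n; i++) {
        count[i] = 1;
        table[i] = (unsigned short)(offset + i);
    }
    cs.num_n256 = n;
    cs.unicount = count;
    cs.unitable = table;
    return cs;
}
```

Loading a 256-entry charset into a slot and then a 128-entry charset into the same slot leaves positions 128..255 holding the first charset's values, which is the inheritance described above.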

Yes, I could not understand why we have four tables
(IMO only one is really used) or what UC_MapGN is for.
So I just added the num_n256 bound so things work without an index overrun
(and hopefully with a correct result), and postponed further UCDomap.c
changes until dev.Next. A patch from your side would be really welcome :-)

> The closer equivalent to previous behavior would be to initialize all 256
> elements to 0xfffd.
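A minimal sketch of that suggestion, again using a hypothetical simplified struct charset rather than Lynx's real UCInfo: the whole 256-entry slot is reset to 0xfffd (U+FFFD, the Unicode REPLACEMENT CHARACTER) first, and only then are the num_n256 described positions filled in. Bytes with a zero count simply keep 0xfffd.

```c
#include <assert.h>

/* Hypothetical, simplified charset description (not Lynx's real UCInfo). */
struct charset {
    int num_n256;                    /* bytes 0..num_n256-1 are described */
    const unsigned char *unicount;   /* values listed per byte */
    const unsigned short *unitable;  /* all values, back to back */
};

/* Re-initializes all 256 entries before filling in the ones this charset
 * defines, so nothing from a previous user of the slot survives. */
static void set_trans_reinit(unsigned short *ptrans, const struct charset *cs)
{
    const unsigned short *p = cs->unitable;
    int i, j;

    assert(cs->num_n256 >= 0 && cs->num_n256 <= 256);
    for (i = 0; i < 256; i++)
        ptrans[i] = 0xfffd;          /* default: replacement character */

    for (i = 0; i < cs->num_n256; i++) {
        if ((j = cs->unicount[i])) {
            ptrans[i] = *p;          /* first listed value for byte i */
            p += j;                  /* skip all j values for byte i */
        }
    }
}
```

Reusing a slot is then harmless: after a 128-entry charset replaces a 256-entry one, positions 128..255 read 0xfffd instead of stale values.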

> It *seems* that *currently* this code will never be called for any of the
> charsets with num_n256==0 -- as long as they also have num_uni==0.
> UC_con_set_trans() is only called from UC_MapGN(), and all calls to
> UC_MapGN() are "protected" by a preceding

>     if (!UCInfo[UChndl_in].num_uni)
>         return -11;
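The argument above can be modelled as a tiny call chain. All names here (struct uc_info, caller_model(), map_gn_model(), con_set_trans_model()) are hypothetical stand-ins for the real callers, UC_MapGN() and UC_con_set_trans(); only the two counts that matter are kept. It shows why the code is only safe "currently": the guard tests num_uni, not num_n256.

```c
#include <assert.h>

/* Hypothetical model; only the two counts from the discussion are kept. */
struct uc_info {
    int num_uni;   /* number of Unicode mappings for the charset */
    int num_n256;  /* number of 8-bit positions with mappings */
};

static int set_trans_calls = 0;      /* how often the slot-filling step ran */

/* Stands in for UC_con_set_trans(). */
static void con_set_trans_model(const struct uc_info *info)
{
    (void)info;
    set_trans_calls++;
}

/* Stands in for UC_MapGN(), the only caller of UC_con_set_trans(). */
static int map_gn_model(const struct uc_info *info)
{
    con_set_trans_model(info);
    return 0;
}

/* Stands in for a caller of UC_MapGN() with the guard quoted above. */
static int caller_model(const struct uc_info *info)
{
    if (!info->num_uni)
        return -11;
    return map_gn_model(info);
}
```

A charset with num_uni == 0 and num_n256 == 0 never reaches the slot-filling step, but one with num_uni != 0 and num_n256 == 0 would, since nothing on the path checks num_n256.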


>    Klaus


