lynx-dev

Re: lynx-dev cleanup chartrans [patch]


From: Leonid Pauzner
Subject: Re: lynx-dev cleanup chartrans [patch]
Date: Fri, 26 Feb 1999 00:16:34 +0300 (MSK)

25-Feb-99 07:36 Klaus Weide wrote:
> On Thu, 25 Feb 1999, Leonid Pauzner wrote:
>> 25-Feb-99 04:49 Klaus Weide wrote:

>> > What's special about 8859-15, to be the only one left intact here
>> > besides 8859-1?  "7 Bit Approximations" would much rather deserve that
>> > honor.
>> Well, yes, but this got hidden the another side...

> It would be more logical to put "7 Bit Approximations" in the 2nd place
> though.  That would change the order in the "Display character set" option
> list, but maybe that's not a bad idea anyway.

OK, no problem (next time).
I would probably suggest moving "7 bit approximations" to the first place
(current_char_set = 0) instead. I have replaced many such explicit checks
with the LATIN1 macro, but I cannot guarantee that I caught every place.
This is somewhat close to the logic of ISO_Latin1 usage...
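
A minimal sketch of what replacing explicit index checks with a symbolic
test could look like. The table, its ordering, and every sketch_* / SKETCH_*
name are hypothetical and are not taken from the Lynx sources; the real
LATIN1 macro may be defined differently.

```c
#include <string.h>

/* Hypothetical display character set table; the order is an assumption
 * that follows the proposal of putting 7 bit approximations first. */
static const char *const sketch_charset_names[] = {
    "7 bit approximations",   /* index 0 in the proposed ordering */
    "iso-8859-1",
    "iso-8859-15",
    "cp850",
};

#define SKETCH_NUM_CHARSETS \
    ((int)(sizeof(sketch_charset_names) / sizeof(sketch_charset_names[0])))

/* Look up a charset index by name; returns -1 if the name is unknown. */
static int sketch_charset_index(const char *name)
{
    int i;

    for (i = 0; i < SKETCH_NUM_CHARSETS; i++) {
        if (strcmp(sketch_charset_names[i], name) == 0)
            return i;
    }
    return -1;
}

/* Symbolic test: callers no longer depend on where iso-8859-1 sits
 * in the table, so reordering the list does not break them. */
#define SKETCH_IS_LATIN1(cs) ((cs) == sketch_charset_index("iso-8859-1"))
```

With a test like SKETCH_IS_LATIN1(current_char_set), moving an entry to a
different position only changes the table, not the code that checks it.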


>> >> @@ -394,41 +369,6 @@
>> >>     *  Placeholders for Unicode tables. - FM
>> >>     */
>> >>    {-1,"iso-8859-15",   UCT_ENC_8BIT,0,0,0,     UCT_R_8BIT,UCT_R_ASCII},
>> >> -  {-1,"cp850",         UCT_ENC_8BIT,0,
>> >> -                       UCT_REP_SUPERSETOF_LAT1,
>> >> -                       0,                      UCT_R_8BIT,UCT_R_ASCII},
>> > [ etc - including CJK, 7-bit approx., transparent ]
>>
>> > The various tables here served to provide some minimal information
>> > (without taking much space) about several charsets / Display character
>> > sets even in the case where chartrans table files for them were not
>> > included.  Yes it's redundant; however, sometimes redundancy *may* be
>> > good.
>> Yes, but in this case I think this redundancy may be misleading for other
>> changes. In fact, no fields from this struct are used except mime name and
>> encoding name, only UCT_REP_* _may_ be useful when we are very close to
>> old-style LATIN1 charset.

> Yes, most of those bits are underused...  probably even more so now.
> I always liked to keep the possibility open to one day do something more
> with that info, or put more detailed info in that struct (like *what kind*
> of CJK encoding, or which scripts of Unicode were present in a charset's
> repertoire).  But it hasn't happened, and leaving it open is not exactly
> compatible with your goal of cleaning up.

"*What kind* of CJK encoding" can be mapped to the 'enc' value as a region.
Checks for "160" and "173" can be done dynamically (they are rare), e.g.
  if (160 == UCTransChar(160, from_charset, to_unicode)) {}
Other info can be incorporated into the *_uni.h format when necessary.
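
The dynamic check can be sketched as follows. This is only an illustration:
sketch_to_unicode() is a hypothetical stand-in with a deliberately partial
table, not Lynx's UCTransChar(), and the enum values are made up for the
example. The idea is to ask the translation layer whether bytes 160 and 173
map to U+00A0 (no-break space) and U+00AD (soft hyphen), instead of storing
that fact as a static flag per charset.

```c
/* Hypothetical charset handles, for illustration only. */
enum sketch_charset { SKETCH_LATIN1, SKETCH_CP850 };

/* Stand-in for a charset-to-Unicode lookup (NOT the real UCTransChar).
 * Returns the Unicode code point, or -1 where this partial table has
 * no entry. */
static long sketch_to_unicode(unsigned char ch, enum sketch_charset cs)
{
    if (ch < 128)
        return ch;                        /* ASCII range is shared */

    switch (cs) {
    case SKETCH_LATIN1:
        return ch;                        /* ISO-8859-1 maps 1:1 to Unicode */
    case SKETCH_CP850:
        if (ch == 0xA0) return 0x00E1;    /* byte 160 is a-acute in cp850 */
        if (ch == 0xAD) return 0x00A1;    /* byte 173 is inverted '!' */
        return -1;                        /* rest omitted in this sketch */
    }
    return -1;
}

/* Returns 1 if bytes 160 and 173 mean NBSP and SHY as in Latin-1. */
static int sketch_has_latin1_nbsp_shy(enum sketch_charset cs)
{
    return sketch_to_unicode(160, cs) == 0x00A0
        && sketch_to_unicode(173, cs) == 0x00AD;
}
```

Because such checks are rare, computing them on demand costs little and
avoids keeping a second copy of the information in the charset struct.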

>> > still the case; maybe it's not wanted.  It probably hasn't been tested
>> > by anyone in a long time.  An example would be the case where someone
>> > wanted to not have the large 7-bit approximations file, but still have
>> > 7-bit approximations available as Display character set to at least
>> > deal with the "classical" ISO-8859-1 chars and entities.

Well, these two old-style tables in LYCharSets.c and the corresponding code
could probably be #ifdef'ed with OLD_STYLE_CLASSIC (iso-latin1 for any display).
I will look at this more closely (it is a little harder than just removing them).
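
A rough sketch of the #ifdef approach, assuming a build-time symbol named
OLD_STYLE_CLASSIC as suggested above. The function name and the three table
entries are hypothetical and only illustrate the shape; they are not the
actual tables from LYCharSets.c.

```c
#include <stddef.h>

/* Uncomment (or pass -DOLD_STYLE_CLASSIC) to compile the fallback in. */
/* #define OLD_STYLE_CLASSIC */

/* Returns a 7-bit replacement string for a Latin-1 byte, or NULL when
 * no old-style fallback is available for it. */
static const char *sketch_classic_replacement(unsigned char ch)
{
#ifdef OLD_STYLE_CLASSIC
    /* Old-style fallback: a tiny built-in table (illustrative entries). */
    switch (ch) {
    case 160: return " ";      /* no-break space */
    case 169: return "(c)";    /* copyright sign */
    case 173: return "-";      /* soft hyphen */
    default:  return NULL;
    }
#else
    /* Fallback compiled out: callers must rely on chartrans tables. */
    (void)ch;
    return NULL;
#endif
}
```

Keeping the old tables behind one symbol leaves the default build clean
while still allowing a small binary without the large approximations file.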

>> How about Euro/(TM)/Copyright/emdash/etc requests?

> It's not strictly 8859-1 but with some extensions - &trade, &copy, &emdash
> were "classically" covered,
Yes, and there is no &emdash in HTML 4.0, only &mdash :)

                            euro is much too new (and isn't listed in
> entities.h even now, as of dev.17).
The table dates from 1997 and is a superset of the HTML 4.0 entities.


