From: Klaus Weide
Subject: lynx-dev chartrans to CJK-like display (was: stopping when viewing a site)
Date: Sat, 21 Aug 1999 20:43:14 -0500 (CDT)

On Sat, 21 Aug 1999, Henry Nelson wrote:

> > My conclusion about "Transparent" vs 'trydefault' was based on that fact:
> > let we have html page with lots of entities like &yen;&and;&something;
> > then in transparent mode these entities will have no translation
> > so we saw them verbatim, instead old behaviour (before my changes)
> > was based on using 7-bit approximations. IMO CJK behave the same way.
> > 
> > I mean translation "TO charset" e.g. to x-transparent, to any_CJK yes?
> > (not from ...)
> 
> I'm not sure what you are trying to say, and I can only speak for myself
> (although I suspect most CJK users would have a similar preference).
> What I say is not a criticism nor an endorsement of what you have done
> or plan to do.  It is only my preference for how Lynx should work
> concerning the display character set euc-jp.
> 
> I do not think there is any problem with x-transparent showing entities
> verbatim ("&yen;" = "&yen;").  It is pretty nebulous just what "transparent"
> means, anyway, IMO.  

Yes...  I find both choices acceptable.  (We could even make '@' toggle
between the two...)

> CJK are different, however, in that the user expects
> a "normal" display, i.e., the user expects to see _characters_, not code,
> presented on the screen, preferably in the same way as any other program
> displays them.  Therefore, &yen; should produce a yen sign, and &uuml;
> should show up as a 'u' with an umlaut above it.  It makes sense to have
> 7-bit approximations be the default because then it is not machine specific.
> A person will not be overly surprised to see a "u:" where there is a &uuml;,
> whereas "&uuml;" would be much less desirable.

I think we all agree on that.
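
To make the three behaviours discussed above concrete (verbatim entity,
7-bit approximation, real character), here is a minimal Python sketch.
It is not Lynx code: the function name `render_entity`, the mode names,
and the two small dictionaries are assumptions made for illustration.
Only the "u:" and "YEN" approximations are taken from this thread.

```python
# Illustrative only: not Lynx code. All names here are assumptions.

# Entity name -> Unicode code point (tiny subset for the example).
ENTITIES = {"yen": 0x00A5, "uuml": 0x00FC}

# 7-bit approximations, in the spirit of def7_uni.tbl.
SEVEN_BIT = {0x00A5: "YEN", 0x00FC: "u:"}


def render_entity(name, mode, native=None):
    """Render &name; under one of three hypothetical display modes.

    mode == "verbatim": show the entity source text unchanged.
    mode == "approx":   use the 7-bit approximation if one exists.
    mode == "native":   use the display charset's own table first,
                        then fall back to the 7-bit approximation.
    """
    source = "&" + name + ";"
    code = ENTITIES.get(name)
    if mode == "verbatim" or code is None:
        return source
    if mode == "native" and native and code in native:
        return native[code]
    # No native mapping: approximate, or show the source if unknown.
    return SEVEN_BIT.get(code, source)


print(render_entity("uuml", "verbatim"))
print(render_entity("uuml", "approx"))
print(render_entity("uuml", "native", {0x00FC: "\u00fc"}))
```

The point of the sketch is only that the "native" mode degrades to the
7-bit string rather than to the raw entity when a mapping is missing.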

> I have relied on this behavior in the past to create my own default
> character set, by simply copying it over src/chrtrans/def7_uni.tbl.
> (An example entry is "0x5c U+00a5" to replace "U+00a5:YEN" because
> this gives me a true yen sign on a Japanese Windows machine).  It is
> a kind of trick to get a display character set that matches the
> machine I am running the terminal emulator on.  It is some hybrid
> cp???_uni.tbl.  I suppose I could even contact Microsoft to find
> out what the "official" one is, if I cared enough.

IMO, the more logical way to do this would be the other way around -
instead of modifying def7_uni (which after all stands for "us-ascii",
and should be 7-bit only), start with "Japanese (EUC-JP)" and modify
*that*.  Give this CJK character set a .tbl file, put the additional 
strings in etc. as for other .tbl files, but also add a line 'R 5'
(see comments in utf8_uni.tbl).  Then treat this like other tables
included in UCdomap.c, and remove the special definition in UCdomap.h.

I don't know how lynx would deal with a character set that is CJK *and*
has translation tables - so far this has just not come up.  Changes
will be needed in various places, at least so that lynx doesn't skip
table translation immediately when it sees CJK.  Still I think this
would be the more logical way to add this kind of limited unicode-to-
CJK-display translation.

> So to bring this long story to a close, what would be perfect is to
> have a configurable, non-generic default, or perhaps more simply, to
> "hard code" the default right in the description of the character set
> so that a simple edit of UCdomap.h to replace:
>    dfont_unicount,dfont_unitable,463,dfont_replacedesc,\
> (I am not trying to say this is "right" or "better", quite the opposite,
> it is the only way I can think to do what I'd like to do.) with:
>    dfont_unicount_cp???,dfont_unitable_cp???,224,dfont_replacedesc_cp???,\ 
> would switch from 7 bit approximations to the character set that is
> appropriate for a particular user.

You want your additional strings to apply in one (i.e. your specific)
display character set.  So there isn't really a good reason to make
_the default_ configurable or non-generic IMO - the strings should be
put in a table specific to that display character set.
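
As a rough illustration of that idea (extra strings live in a table
belonging to one display character set, and everything else falls
through to the generic default), here is a short Python sketch.  It does
not reflect how Lynx stores its tables: `DEFAULT_TABLE`,
`CHARSET_TABLES`, `translate`, and the "?" placeholder for unmapped
characters are all hypothetical.  Only the yen example (0x5c for U+00a5,
"YEN" as the generic string) comes from this thread.

```python
# Illustrative only: not Lynx code. Names and the b"?" fallback are assumptions.

# Generic 7-bit strings shared by all display character sets.
DEFAULT_TABLE = {0x00A5: b"YEN", 0x00FC: b"u:"}

# Extra entries that apply to one display character set only.
CHARSET_TABLES = {
    # Byte 0x5c is shown as a yen sign on the Japanese setup described above.
    "euc-jp": {0x00A5: b"\x5c"},
}


def translate(code_point, display_charset):
    """Return the bytes to emit for one Unicode code point."""
    specific = CHARSET_TABLES.get(display_charset, {})
    if code_point in specific:
        return specific[code_point]        # charset-specific string wins
    if code_point < 0x80:
        return bytes([code_point])         # plain ASCII passes through
    return DEFAULT_TABLE.get(code_point, b"?")  # generic approximation


print(translate(0x00A5, "euc-jp"))
print(translate(0x00A5, "iso-8859-1"))
```

With this layout the default stays generic, and a user who wants a true
yen sign only touches the table for their own display character set.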

I have never tried the CJK-with-.tbl-file approach, but would like to
encourage you to start down that path...  We can then begin to look for
the places that need to be modified, for the tables to actually be used.

A somewhat simpler idea: I think there is nothing hardwired that
_requires_ the 'default' character set to be the same as "us-ascii"/
SevenBitApproximations.  So you could try to move the

   # Shall this become the "default" translation table?  YES!
   # There has to be exactly one table marked as "default".
   D1

from def7_uni.tbl to another one.  That other one could be a copy of
current def7_uni with your additional strings added, but marked as CJK,
and with the MIMEname,OptionName either kept as "euc-jp","Japanese
(EUC-JP)" or changed, to either replace the existing display character
set or be added as an alternative version.  (There are several
combinations to try, keep/change MIMEname and keep/change OptionName,
combined with keeping-or-not the UCdomap.h #define...)
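
For anyone who wants to work through those combinations
systematically, the three keep/change choices give 2 x 2 x 2 = 8 cases.
The small Python sketch below just enumerates them; the labels are my
own wording for the choices named above, not identifiers from the Lynx
sources.

```python
from itertools import product

# The three independent choices named in the text; labels are illustrative.
choices = {
    "MIMEname": ("keep euc-jp", "change"),
    "OptionName": ("keep Japanese (EUC-JP)", "change"),
    "UCdomap.h #define": ("keep", "remove"),
}

# One dict per combination, e.g. {"MIMEname": "keep euc-jp", ...}.
combinations = [
    dict(zip(choices, combo)) for combo in product(*choices.values())
]

for number, combo in enumerate(combinations, start=1):
    print(number, combo)
```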

   Klaus

