Re: LYNX-DEV 2.7.1ac-0.87
Thu, 23 Oct 1997 14:10:53 -0500 (CDT)
On Thu, 23 Oct 1997, Leonid Pauzner wrote:
> > On Wed, 22 Oct 1997, Klaus Weide wrote:
> > > Note that the format for chartrans table files has changed slightly.
> > > The default is now to fall back to the "default" table for characters
> > > that cannot be translated otherwise to the display character set.
> Is there anywhere on WWW a stable convention of iso1->7bit approx?
> I need it mostly for simple reading german and scandinavian names
> without my local 8-bit resetting (not from Lynx but e-mail).
The most well-defined, comprehensive, and available convention I know is
RFC 1345, or its successor "mnemonic,ds" from <URL:ftp://dkuug.dk/i18n/>.
Try the last two entries in the list of display character sets.
Those may not make texts with many non-ASCII characters very readable,
but are nice, as one example, for understanding texts _about_ characters
by displaying them unambiguously - try
which is something like an All You Ever Wanted To Know About Denmark
(as far as computers are concerned).
The Lynx default translation table is also based on that, for many
> I just yesterday tryed to get iso-latin-1 to us-ascii approximation
> from src\chartrans\, and since Klaus upgrade the tables that time
> I found out the minor bugs. I think any unicode number should have
> definitely one projection to certain charset: that should be a test.
> ***** DEF7_UNI.OLD
> ***** DEF7_UNI.TBL
> # My -> u
> ^^ are you sure in number (b5) ?
My mistake, thanks for noticing.
> It is found .tbl sources very dirty for check,
> therefore mistakes are more than possible.
> People claim:
> " Note that the first 128 character codes of any of the ISO 8859
> character sets is always identical to the ASCII character set. "
> Why not to set 0x00-0xff idem for all of them
> and disable (if necessary) x00-x1f in other place?
The Lynx code doesn't normally check the tables for char values in that
range anyway. So omitting them just means slightly less waste of
memory. OTOH if there is a "0x41 U+NNNN" later on in the file, then
there should have been a "0x41 U+0041" before that. I haven't checked
whether that really is still necessary now.
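
To make the entry format concrete, here is a minimal sketch of a reader for lines of the "0xNN U+XXXX" kind quoted in this thread, including the "U+XXXX-U+YYYY" range form. It is not the code Lynx uses to build its tables; the function name parse_table and the regular expressions are assumptions made for illustration, and directives and any other line types are simply skipped.

```python
import re

# Assumed line shape, taken from the entries quoted in this thread:
#   0xNN U+XXXX [U+YYYY-U+ZZZZ ...]   # optional comment
_ENTRY = re.compile(r"^0x([0-9a-fA-F]{1,2})\s+(.*)$")
_UNI = re.compile(r"^U\+([0-9a-fA-F]{4})(?:-U\+([0-9a-fA-F]{4}))?$")


def parse_table(text):
    """Return {Unicode code point: byte value}; a later entry wins."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        m = _ENTRY.match(line)
        if not m:
            continue  # blank line, directive, or a form this sketch ignores
        byte = int(m.group(1), 16)
        for token in m.group(2).split():
            u = _UNI.match(token)
            if not u:
                continue
            first = int(u.group(1), 16)
            last = int(u.group(2), 16) if u.group(2) else first
            for cp in range(first, last + 1):
                mapping[cp] = byte
    return mapping


# The identity line for 0x41 comes first, as suggested above; the other
# three lines are the quotation-mark entries quoted in this thread.
sample = """\
0x41 U+0041
0x60 U+2018 # left single quotation mark
0x27 U+2019-U+201b # various single quotation marks
0x22 U+201c-U+201f # various double quotation marks
"""
table = parse_table(sample)
```

With the sample above, table maps U+0041 to 0x41 and each of U+2019, U+201A and U+201B to 0x27.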
> Why not to remove all those
> # TRADE MARK SIGN:
> 0x60 U+2018 # left single quotation mark
> 0x27 U+2019-U+201b # various single quotation marks
> 0x22 U+201c-U+201f # various double quotation marks
Yes, they can go away now. Having some characters that occur often in the
separate files could make lookup slightly faster, but that is probably not
> # some mapppings of greek letters to latin letters added,
> # just for fun.. -kw
> scince they are already set in def7_uni.tbl as default?
_Some_ of those are mappings that are not found in def7_uni, especially
the ones to 0xnn where 0xnn >= 0x80 should not be in def7_uni. So it's
actually an example how a specific table can override the default
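
The lookup order this implies can be sketched as follows: consult the specific table first and fall back to the default table only when the specific one has no entry. This is an illustration of the idea, not the Lynx implementation; the function name translate, the "?" placeholder for unmapped characters, and the two tiny tables are assumptions (the table contents are only loosely modelled on entries mentioned in this thread).

```python
def translate(cp, specific, default, missing=b"?"):
    """Translate one Unicode code point for display.

    The charset-specific table is consulted first, so it can override
    the default table; the default table is only a fallback.
    """
    if cp in specific:
        return specific[cp]
    if cp in default:
        return default[cp]
    return missing


# Illustrative data only (assumed, not copied from the real .tbl files).
default_7bit = {
    0x00B5: b"u",   # MICRO SIGN approximated in 7 bits
    0x2018: b"`",   # LEFT SINGLE QUOTATION MARK
}
latin1_like = {
    0x00B5: b"\xb5",  # an 8-bit display charset has the real character
}

overridden = translate(0x00B5, latin1_like, default_7bit)
fallback = translate(0x2018, latin1_like, default_7bit)
unknown = translate(0x4E00, latin1_like, default_7bit)
```

Here the specific table wins for U+00B5, U+2018 comes from the default table, and U+4E00 is in neither table, so it gets the placeholder.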
> More serious: sometimes you use
> U+xxx: 8-bit value
> so people who got files not as .zip package but separately via http
> or compile in other environment may have a wrong mapping,
> look at the very end of iso01_uni, cp437_uni and some others.
> (although it may be a format limitation if you seek two-letter equivalent).
Yes, currently it is a format limitation.
> from README.format:
> b) directives:
> start with a keyword which may be abbreviated to one letter (first
> letter must be capitalized), followed by space and a value.
> Currently recognized:
> The name under which this should appear on the O)ptions screen
> In fact, there is no space after one-letter-abbreviation found.
You could and can put SPACE there, it will be skipped.
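
A small sketch of that rule, under stated assumptions: a directive starts with either the full keyword or its capitalized first letter, and any spaces before the value are skipped. The function name split_directive is made up for illustration, and the keyword "Example" is hypothetical; it is not one of the keywords listed in README.format.

```python
def split_directive(line, keywords):
    """Split a directive line into (keyword, value), or return None.

    Accepts the full keyword or its one-letter abbreviation. Spaces
    between the keyword and the value are optional and are skipped.
    """
    if not line[:1].isupper():
        return None  # the first letter must be capitalized
    # Try full keywords first so "Example x" is not read as "E" + "xample x".
    for kw in keywords:
        if line.startswith(kw):
            return kw, line[len(kw):].strip()
    for kw in keywords:
        if line.startswith(kw[0]):
            return kw, line[1:].strip()
    return None


KEYWORDS = ["Example"]  # hypothetical keyword, for illustration only

full = split_directive("Example Some Name", KEYWORDS)
short_no_space = split_directive("ESome Name", KEYWORDS)
short_with_space = split_directive("E Some Name", KEYWORDS)
not_a_directive = split_directive("0x41 U+0041", KEYWORDS)
```

All three directive spellings yield the same (keyword, value) pair, while the table entry line is not treated as a directive.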