lynx-dev About that one character... (was: Superscripts)

From: Klaus Weide
Subject: lynx-dev About that one character... (was: Superscripts)
Date: Wed, 7 Jun 2000 12:14:12 -0500 (CDT)

On Tue, 6 Jun 2000, Philip Webb wrote:

> 000606 Klaus Weide wrote:
> PW> presumably `\305' is the code for `u^', as i vulgarly transliterate it

Actually, the '\305' is not a character by itself; or at least it's not
meant to be one.  UTF-8 is a multi-byte encoding; in it, \305 (the byte
with value 0xC5) is only the first half of the encoding of the Unicode


> KW> To see lynx's best effort at rendering such characters, you should have
> KW> saved his text to a file,
> which, of course, i did ... (wry smile)
> KW> and then should have told lynx about the character encoding;
> ah, that bit's new: is UTF-8 Esperanto?

UTF-8 is a character encoding ("charset"), Esperanto is a language.
They are different kinds of things.

You can write (or encode) Esperanto in UTF-8, but then the same is
true for most languages (or, more exactly, scripts): that they can be
encoded in UTF-8.  I believe all the accented characters used for
Esperanto are also available in ISO-8859-3 (ISO Latin 3).
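
A minimal Python sketch of the byte-level picture, assuming Python 3
and its standard "utf-8" and "iso-8859-3" codecs.  The variable names
are only illustrative.  It shows that the character takes two bytes in
UTF-8 (the first being \305), one byte in ISO Latin 3, and that the
first UTF-8 byte alone does not decode to anything:

```python
# U+016D LATIN SMALL LETTER U WITH BREVE, the character under discussion.
ch = "\u016d"

utf8 = ch.encode("utf-8")          # two bytes in UTF-8
latin3 = ch.encode("iso-8859-3")   # one byte in ISO Latin 3

print([oct(b) for b in utf8])      # octal, to compare with the '\305' notation
print(latin3.hex())

# The first UTF-8 byte on its own is an incomplete sequence, not a character.
try:
    utf8[:1].decode("utf-8")
    lead_byte_alone_ok = True
except UnicodeDecodeError:
    lead_byte_alone_ok = False

# The accented Esperanto letters (c, g, h, j, s with circumflex and u with
# breve, lower and upper case), written as escapes to keep this file ASCII.
esperanto = ("\u0109\u011d\u0125\u0135\u015d\u016d"
             "\u0108\u011c\u0124\u0134\u015c\u016c")
# encode() would raise UnicodeEncodeError if any letter were missing.
esperanto_latin3 = esperanto.encode("iso-8859-3")
```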

> KW>  lynx -assume_charset=utf-8 /whereever/saved-snippet.html
> that reveals the character as a simple `u', ie it omits the caret OWIC.

As I said, "lynx's best effort", where "best" means best in relation to
the display character set.  If I view this with lynx in either a UTF-8
or an ISO-8859-3 environment (with "display character set" set
accordingly), I can see the u-with-the-thing-above-it as one character.

If you want to get a good idea whether lynx recognizes document
characters correctly, without having an environment where you can
display the characters themselves, try one of the "RFC 1345" Display
character sets (near the end of the 'O'ptions menu popup list).
You should see something like "u(" or "&u(" for that character,
which isn't very readable but much less ambiguous than just "u".

> KW> or just view the archived message at
> KW>  <>.
> no, that doesn't work, as it shows SP's rendered version.

??? There is one occurrence of the word with the character in question
in that archived message.  The message had snippets of HTML source
and rendering result, both encoded in UTF-8.  Strangely, the word
appears only in the source snippet.

(Again as a reminder - you have to tell lynx that it should assume UTF-8,
for example with -assume_charset or in the 'O'ptions menu.)

> the correct UTF-8 character won't appear here for me,
> as i don't have it in my XT accessed via Kermit 315 .
> i have long wondered exactly how you get alternative charsets with DOS,
> the explanations on offer seeming to omit vital bits of info.
> French/German accents are ok, as they're in normal ASCII.

I think that in your situation the only way to see that
LATIN SMALL LETTER U WITH BREVE as one character would be to
use an ISO-8859-3 (Latin 3) code page.  The .bat file etc. that
Doug sent seems to have support for it.  But I don't think you
want to use this code page permanently on your PC, unless you
are really getting into Esperanto (or perhaps Maltese, the only
other language afaik for which the Latin-3 repertoire is required -
   Linkname: Latin3

To find out in which Display character sets lynx can show this
character, grep for its Unicode value in the source table files,
like this:
    fgrep -i U+016D src/chrtrans/*.tbl

You'll find src/chrtrans/iso03_uni.tbl as the only file where it
is mapped to a non-ASCII code position (0xFD):

   src/chrtrans/iso03_uni.tbl:232:0xFD     U+016D  #       LATIN SMALL LETTER U 

It also appears in src/chrtrans/def7_uni.tbl (from which other
display character sets inherit by default), but mapped to 0x75
which is just an ASCII small 'u', and in some other files for
the RFC 1345 replacements mentioned above.
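
The same check can be sketched in Python.  This is only a rough
illustration: the line shape is inferred from the single fgrep hit
quoted above, the function names are hypothetical, and real table
files may use other line shapes (for instance several Unicode values
on one line), which this sketch ignores.

```python
import re

# Assumed line shape, taken from the one fgrep hit quoted above:
#   0xFD     U+016D  #       LATIN SMALL LETTER U
LINE_RE = re.compile(r"^(0x[0-9A-Fa-f]+)\s+U\+([0-9A-Fa-f]{4,6})\b")

def find_mappings(lines, codepoint):
    """Return the code positions that `lines` map to the given Unicode value."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and int(m.group(2), 16) == codepoint:
            hits.append(int(m.group(1), 16))
    return hits

def is_ascii_fallback(position):
    """True if the code position is plain 7-bit ASCII (such as 0x75, 'u')."""
    return position < 0x80

# The table line exactly as it appears in the fgrep output above.
sample = ["0xFD     U+016D  #       LATIN SMALL LETTER U "]

for pos in find_mappings(sample, 0x016D):
    kind = "ASCII fallback" if is_ascii_fallback(pos) else "non-ASCII position"
    print(hex(pos), kind)
```

A position of 0x80 or above means the display character set has a
real slot for the character; a position below 0x80 means lynx falls
back to an ASCII approximation.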
