lynx-dev

Re: LYNX-DEV did something happen to <img alt="xxx"> ??


From: Klaus Weide
Subject: Re: LYNX-DEV did something happen to <img alt="xxx"> ??
Date: Mon, 6 Oct 1997 04:58:18 -0500 (CDT)

On Mon, 6 Oct 1997, Nelson Henry Eric wrote:

> > output when you access the file through that URL).  Put a
> > <META HTTP-EQUIV="content-type" CONTENT="text/html;charset=XXXX">
> > in there with the right XXXX, unless it is necessary to omit this in
> 
> The problem I find with using a META for "charset" is that many
> (most?) of my documents are a mixture of SJIS, us-ascii, and iso-
> 2022-jp.

Strictly speaking, as I interpret the HTML Internationalization RFC 2070
and other texts about character sets, it isn't possible to mix character
encodings in one document.  (US-ASCII shouldn't be a problem since it
should be a subset of the others.)  Practically that doesn't matter since
people will do it anyway as you say, but it probably means there isn't one
"right" charset parameter for such mixed documents.

It doesn't matter for Lynx anyway, since with one of the "Japanese"
display character sets selected, the '@'/Raw/CJK toggle is what turns
interpretation of the document as (possibly mixed) CJK encoding ON
or OFF.  What the charset parameter (if present) says doesn't really
matter if CJK is ON.  (Except that when a charset parameter is present
and specifies a non-CJK character encoding, CJK interpretation is turned
off for that document.)
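
To spell out the rule just described, here is a minimal sketch in C.  It is
not taken from the Lynx source: the function names, the boolean standing in
for the '@'/Raw/CJK toggle, and the short list of charset names treated as
CJK are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Case-insensitive comparison of two charset names. */
static bool same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Hypothetical helper: an illustrative, incomplete list of CJK names. */
static bool is_cjk_charset(const char *name)
{
    static const char *const known[] = { "shift_jis", "euc-jp", "iso-2022-jp" };
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (same_name(name, known[i]))
            return true;
    }
    return false;
}

/*
 * raw_toggle_on: assumed state of the '@'/Raw/CJK toggle.
 * charset_param: explicit charset parameter, or NULL if none was given.
 */
bool cjk_interpretation_enabled(bool raw_toggle_on, const char *charset_param)
{
    if (!raw_toggle_on)
        return false;               /* the toggle is OFF */
    if (charset_param == NULL)
        return true;                /* no explicit charset: the toggle decides */
    /* An explicit non-CJK charset turns CJK off for this document. */
    return is_cjk_charset(charset_param);
}
```

Note that the particular CJK name never changes the result while the toggle
is ON; only a non-CJK name does.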

[ All with the caveat "unless I have broken it." ]

Any hint that Lynx might derive from an explicit charset parameter, for
guessing the encoding of Japanese text, is effectively overridden by the
following lines in GridText.c (if I understand correctly what they do):

    if (ch == ' ') {
        text->permissible_split = (int)line->size;      /* Can split here */
        /*
         *  There are some pages written in
         *  different kanji codes. - TA
         */
        if (HTCJK == JAPANESE)
            text->kcode = NOKANJI;
    }


> In addition, I don't know what the correct mime designation
> is for "SJIS", if there is one.

"charset=shift_jis" is the official and preferred designation, according
to <URL: ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets>.
A look into a recent userdefs.h or lynx.cfg could have told you that this
is the MIME name recognized by Lynx :)

> So it's hard to decide what the
> "right XXXX" is.  Actually, the mixing of character sets is not uncommon
> in Japan.  Another problem is that using a META tag can also make the
> page unreadable by Japanized Netscape (no real loss I guess).

That's your call :)

But are you sure you are really dealing with Shift-JIS?  Your page with
mixed encoding seems to mix EUC-JP and ISO-2022-JP, not Shift_JIS and
ISO-2022-JP.  (The bytes Lynx writes to the screen when "Japanese (EUC)"
is selected are identical with the 8-bit encoded string in question.
That is not the case if "Japanese (SJIS)" is selected.)
Why would you use Shift-JIS anyway?  It's a Microsoft invention, I thought
you were using Unix...
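
The byte comparison above can be approximated without Lynx.  The following
is a rough sketch, assuming the usual byte ranges for EUC-JP and Shift_JIS;
the function names are made up for this example, and none of it is code
from Lynx.  It only checks whether a byte string is well-formed in each
encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool in_range(unsigned char c, unsigned char lo, unsigned char hi)
{
    return c >= lo && c <= hi;
}

/* True if buf[0..len) is well-formed EUC-JP (ASCII bytes are accepted). */
bool looks_like_euc_jp(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];

        if (c < 0x80) {
            i += 1;                             /* ASCII */
        } else if (c == 0x8E) {                 /* SS2 + half-width katakana */
            if (i + 1 >= len || !in_range(buf[i + 1], 0xA1, 0xDF))
                return false;
            i += 2;
        } else if (c == 0x8F) {                 /* SS3 + two more bytes */
            if (i + 2 >= len || !in_range(buf[i + 1], 0xA1, 0xFE)
                || !in_range(buf[i + 2], 0xA1, 0xFE))
                return false;
            i += 3;
        } else if (in_range(c, 0xA1, 0xFE)) {   /* two-byte character */
            if (i + 1 >= len || !in_range(buf[i + 1], 0xA1, 0xFE))
                return false;
            i += 2;
        } else {
            return false;
        }
    }
    return true;
}

/* True if buf[0..len) is well-formed Shift_JIS, vendor lead bytes included. */
bool looks_like_shift_jis(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];

        if (c < 0x80 || in_range(c, 0xA1, 0xDF)) {
            i += 1;                             /* single byte or half-width katakana */
        } else if (in_range(c, 0x81, 0x9F) || in_range(c, 0xE0, 0xFC)) {
            if (i + 1 >= len)
                return false;
            if (!in_range(buf[i + 1], 0x40, 0x7E)
                && !in_range(buf[i + 1], 0x80, 0xFC))
                return false;
            i += 2;                             /* lead byte + trail byte */
        } else {
            return false;                       /* 0x80, 0xA0, 0xFD-0xFF */
        }
    }
    return true;
}
```

The two byte ranges overlap, so some strings pass both checks and this stays
a heuristic.  A string that fails looks_like_euc_jp() is certainly not
EUC-JP, but passing one check does not prove the encoding.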

    Klaus

