Re: lynx-dev HTML4.0 and default charset


From: Klaus Weide
Subject: Re: lynx-dev HTML4.0 and default charset
Date: Thu, 4 Mar 1999 06:14:29 -0600 (CST)

Yes, this doesn't have that much to do with the real world...
but then that's the spirit in which this thread started.

On Thu, 4 Mar 1999, David Woolley wrote:

> > 
> >    Unfortunately, some older HTTP/1.0 clients did not deal properly with
> >    an explicit charset parameter. HTTP/1.1 recipients MUST respect the
> >    charset label provided by the sender; and those user agents that have
> >    a provision to "guess" a charset MUST use the charset from the
>                     ^^^^^^^
> 
> I think guess is really a euphemism for assuming one (probably a compile
> time choice) of:
> 
> - Windows character set;
> 
> - the national character set of the user.

Since Microsoft clients have the capability of guessing even the MIME type
(and make bad use of it), perhaps they can also guess charsets by scanning
the byte stream.

As far as I understand, encodings used for Japanese often involve dynamic
guessing (even de facto in Lynx).
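
For illustration only, such dynamic guessing can be sketched roughly as
below.  This is not Lynx's actual code: the function name and the order
in which candidates are tried are assumptions made for the example.  A
real detector needs more careful heuristics, since short EUC-JP and
Shift_JIS byte sequences can be ambiguous.

```python
from typing import Optional


def guess_japanese_charset(data: bytes) -> Optional[str]:
    """Return a Python codec name that plausibly fits `data`, or None."""
    # ISO-2022-JP is a 7-bit encoding that switches character sets with
    # ESC sequences, so the presence of ESC is a strong hint.
    if b"\x1b$" in data or b"\x1b(" in data:
        candidates = ["iso2022_jp"]
    elif all(byte < 0x80 for byte in data):
        # Nothing but 7-bit bytes and no escape sequences: plain ASCII.
        return "ascii"
    else:
        # Assumed order: try EUC-JP first, then Shift_JIS.
        candidates = ["euc_jp", "shift_jis"]

    for name in candidates:
        try:
            data.decode(name)  # strict decoding; raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return name
    return None
```

The result is only a guess: a byte stream that happens to be valid in the
first candidate is accepted even if the author meant another encoding.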

> >    content-type field if they support that charset, rather than the
> >    recipient's preference, when initially displaying a document.
> > 
> > The client requirement is clear for the case where there is an explicit
> > charset value in the Content-Type header.  (One could quibble about
> > the exact meaning of "initial[ly] displaying" though.)  There is no
> 
> I think it is fairly clear - the browser must obey the content type, and
> render the page accordingly; if the user then decides that the result
> is the sort of rubbish that results from a particular wrongly declared
> character set, the browser may permit them to select an alternative
> character set in which to re-render the page.

"Fairly" clear, but not completely - if the user overrides, how long will
that be in effect?  Possible answers could be: while the same document is
displayed, while the document is in the "history" (whatever that means for
a specific browser), while the document is cached, for the duration of the
session.

Btw., your formulation "character set in which to render" shows signs of
infection by popular-browser-think. :)  One doesn't render "in" a charset,
I'd prefer to say one renders (possibly "in" a _font_) "assuming" a charset.

> > clear prescription for the case of a missing charset value.  But the last
> > sentence implies that user agents (at least some class of them) are
> > allowed to override the default value "ISO-8859-1" defined above.
> > 
> > So it remains unclear just what the default value of "ISO-8859-1" means,
> > and under which circumstances it applies.  One could speculate that, by
> 
> I think the mess in the wording is a result of a total failure in
> the real world to obey the standards; I think they are asking for a
> best effort to use ISO 8859/1 but giving some licence to violate this
> if dealing with known broken pages.  However HTTP 1.1 servers can be
> assumed never to violate and are supposed to only invoke these clauses
> if they are likely to be being accessed by broken HTTP 1.0 clients which
> mis-parse the Content-Type when correctly told the charset.

There is not much freedom given to HTTP 1.1 servers by the formulation.
The only choice given to them applies only in the case when the charset is
ISO-8859-1.  The freedom given to clients isn't that closely coupled to
what servers are allowed to do.

> If the status line says HTTP/1.1, and there is no charset, a HTTP 1.1 browser
> cannot legitimately assume that it is dealing with, say, Ukrainian,

You seem to assume that the response version string is end-to-end, but
it is supposed to be hop-by-hop information only.

> and
> in any case must not make such an assumption for HTTP/1.0 material until
> the user has had a chance to look at the 8859/1 rendering.

In this you are reading more requirements into RFC 2068 than I could find.
In my interpretation, the "until the user has a chance" would only apply
if the HTTP response header had an explicit charset.
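
To make that reading concrete, here is a minimal sketch of how a client
might choose the charset for the initial display.  It reflects only my
interpretation, not text from the RFC.  The function name and the
`user_preference` parameter are hypothetical, and "supported" is
approximated by whether Python knows a codec of that name.

```python
import codecs
from email.message import Message
from typing import Optional


def initial_charset(content_type: Optional[str],
                    user_preference: Optional[str] = None) -> str:
    """Pick the charset to assume when first displaying a document."""
    label = None
    if content_type is not None:
        msg = Message()
        msg["Content-Type"] = content_type
        # Returns the lower-cased charset parameter, or None if absent.
        label = msg.get_content_charset()

    if label is not None:
        try:
            codecs.lookup(label)  # do we support this charset at all?
        except LookupError:
            pass                  # unsupported: fall through
        else:
            return label          # explicit and supported: must be used

    # No usable explicit label: the recipient's preference may apply,
    # otherwise fall back to the nominal default.
    return user_preference or "iso-8859-1"
```

So initial_charset("text/html; charset=KOI8-R", "windows-1251") gives
"koi8-r", while initial_charset("text/html", "koi8-u") gives "koi8-u".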

Anyway, a client (of whatever version) cannot reliably tell the HTTP version
of the origin server (afaik, last I checked).   And the HTTP 1.1 spec isn't
*that* unrealistic - there are very few requirements where the range of
allowed behaviour depends on the peer's protocol version (and those should
all be hop-by-hop only aspects).

   Klaus
