
Re: lynx-dev lynx: have bug (fwd)


From: Leonid Pauzner
Subject: Re: lynx-dev lynx: have bug (fwd)
Date: Sun, 21 Mar 1999 18:34:27 +0300 (MSK)

21-Mar-99 06:56 Klaus Weide wrote:
> On Sun, 21 Mar 1999 address@hidden wrote:

>> Forwarded message:
>> > From: address@hidden
>> > Date: Sun, 21 Mar 1999 11:07:58 +0200 (EET)
>> > Message-Id: <address@hidden>
>> > To: address@hidden
>> > X-URL: http://www.slcc.edu/lynx/release2-8-1/
>> >
>> > With lynx 2.8.1 on slackware 3.6, we've seen that the postings to CGI's work
>> > wrong. It posts Turkish letters wrong, however older versions can post them
>> > correctly.

Please add more information:
compare the -trace logs of the old version and of version 2.8.1
near the point where the problem presumably occurs (sending the HTTP
request, and just before it). Because trace logs are long, please send us
only the relevant fragment.

>> >
>> > In the bug fixes page, there were no such fix (there were fixes only for build
>> > errors, load errors, core dumps etc.)
>> >
>> > Is this bug been recognized ?
>> > Is there a patch for it ?
>> >
>> > -Turan Yuksel (address@hidden)

> Please give some more information: an example (of a page with the FORM)
> with URL; and the settings from Options (.lynxrc) and lynx.cfg that relate to
> charsets; and which characters are wrong.

One definite "problem" I have personally run into is UTF-8 URL encoding:
when HREF= contains *raw 8-bit text*, the remote server (script)
may (1) expect those bytes to be %xx-encoded as they are,
but lynx now (2) translates URLs from the document charset to UTF-8
and then sends each resulting byte %xx-encoded (an obvious check:
the number of %xx-encoded bytes increases).

UTF-8 URL encoding was proposed in several recent drafts
(I do not have them at hand, but I remember a note that certain protocols
or servers may expect blind %xx encoding rather than UTF-8,
so we may need a configurable option to choose between (1) and (2)
for compatibility. Also, I doubt lynx does (2) in all cases; I have seen it
only for HTML documents. A proper solution here may be for page authors
not to put raw 8-bit bytes in HREF=url at all, but only %xx-encoded ones).
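
To make the difference between (1) and (2) concrete, here is a minimal
sketch in Python. It is not lynx code; the sample letter and the variable
names are assumptions chosen for illustration, and iso-8859-9 is used only
because it is the usual Turkish charset.

```python
from urllib.parse import quote

# Hypothetical link text: the Turkish letter "ş" (U+015F).
href_text = "ş"

# (1) Blind %xx encoding of the byte as it appears in an ISO-8859-9 document.
blind = quote(href_text, encoding="iso-8859-9")

# (2) Translate to UTF-8 first, then %xx-encode every resulting byte.
via_utf8 = quote(href_text, encoding="utf-8")

print(blind)     # %FE
print(via_utf8)  # %C5%9F
```

One character becomes one escaped byte in case (1) and two in case (2),
which is the increase in %xx-encoded bytes mentioned above.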

At least the HTML I18N specification (RFC 2070) describes the problem:


RFC 2070               HTML Internationalization            January 1997


5.2. Form submission

   The HTML 2.0 form submission mechanism, based on the "application/x-
   www-form-urlencoded" media type, is ill-equipped with regard to
   internationalization.  In fact, since URLs are restricted to ASCII
   characters, the mechanism is akward even for ISO-8859-1 text.
   Section 2.2 of [RFC1738] specifies that octets may be encoded using
   the "%HH" notation, but text submitted from a form is composed of
   characters, not octets.  Lacking a specification of a character
   encoding scheme, the "%HH" notation has no well-defined meaning.
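
The ambiguity described in that paragraph can be shown with a short Python
sketch. The received string and the three candidate charsets are
assumptions for illustration; a real script has to guess, because the
request itself does not say.

```python
from urllib.parse import unquote

# What a CGI script might receive; the character encoding is not stated.
received = "%FE"

# The same octet read under three different assumptions.
as_latin1 = unquote(received, encoding="iso-8859-1")  # LATIN SMALL LETTER THORN
as_latin5 = unquote(received, encoding="iso-8859-9")  # LATIN SMALL LETTER S WITH CEDILLA
as_utf8 = unquote(received, encoding="utf-8", errors="replace")  # invalid in UTF-8

print(repr(as_latin1), repr(as_latin5), repr(as_utf8))
```

The octet 0xFE is a different letter in each 8-bit charset and is not valid
UTF-8 at all, so the decoder substitutes U+FFFD in the last case.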



> It may not be a bug, but you have to set up lynx correctly.
> Try it with -raw (or the equivalent '@' key toggle), or with
> -assume_charset=iso-8859-9 (you possibly also want
> -assume_local_charset=iso-8859-9).

>    Klaus


More from RFC 2070:

   The best solution is to use the "multipart/form-data" media type
   described in [RFC1867] with the POST method of form submission.  This
   mechanism encapsulates the value part of each name-value pair in a
   body-part of a multipart MIME body that is sent as the HTTP entity;
   each body part can be labeled with an appropriate Content-Type,
   including if necessary a charset parameter that specifies the
   character encoding scheme.  The changes to the DTD necessary to
   support this method of form submission have been incorporated in the
   DTD included in this specification.
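
A rough sketch of what such a labeled submission could look like, written
in Python. The helper name build_multipart, the fixed boundary string and
the sample field are all hypothetical; a real client must pick a boundary
that does not occur in the data, and this is not how lynx builds its
requests.

```python
def build_multipart(fields, charset, boundary="sketch-boundary"):
    """Return (content_type, body) for a multipart/form-data submission.

    Each value travels in its own body part, labeled with the charset
    that was used to turn its characters into octets.
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(b'Content-Disposition: form-data; name="'
                     + name.encode("ascii") + b'"')
        lines.append(b"Content-Type: text/plain; charset="
                     + charset.encode("ascii"))
        lines.append(b"")                    # blank line ends the part headers
        lines.append(value.encode(charset))  # raw octets, no %xx escaping
    lines.append(delimiter + b"--")          # closing delimiter
    lines.append(b"")
    return "multipart/form-data; boundary=" + boundary, b"\r\n".join(lines)


content_type, body = build_multipart({"city": "Muş"}, "iso-8859-9")
print(content_type)
print(body)
```

The point is only that the receiving script can read the charset from each
part instead of guessing it.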

   A less satisfactory solution is to add a MIME charset parameter to
   the "application/x-www-form-urlencoded" media type specifier sent
   along with a POST method form submission, with the understanding that
   the URL encoding of [RFC1738] is applied on top of the specified
   character encoding, as a kind of implicit Content-Transfer-Encoding.
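
The second approach might look like this in Python. The field name and
value are made up, and whether a given server honors the charset parameter
is an assumption that would have to be checked.

```python
from urllib.parse import urlencode

charset = "iso-8859-9"       # encoding chosen for the submission
fields = {"city": "Muş"}     # hypothetical form data

# URL encoding is applied on top of the chosen character encoding.
body = urlencode(fields, encoding=charset)

# The charset parameter tells the script how to read the %HH octets.
headers = {
    "Content-Type": "application/x-www-form-urlencoded; charset=" + charset,
}

print(headers["Content-Type"])
print(body)  # city=Mu%FE
```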

   One problem with both solutions above is that current browsers do not
   generally allow for bookmarks to specify the POST method; this should
   be improved.  Conversely, the GET method could be used with the form
   data transmitted in the body instead of in the URL.  Nothing in the
   protocol seems to prevent it, but no implementations appear to exist
   at present.

   How the user agent determines the encoding of the text entered by the
   user is outside the scope of this specification.

      NOTE -- Designers of forms and their handling scripts should be
      aware of an important caveat: when the default value of a field
      (the VALUE attribute) is returned upon form submission (i.e. the
      user did not modify this value), it cannot be guaranteed to be
      transmitted as a sequence of octets identical to that in the
      source document -- only as a possibly different but valid encoding
      of the same sequence of text elements.  This may be true even if
      the encoding of the document containing the form and that used for
      submission are the same.

      Differences can occur when a sequence of characters can be
      represented by various sequences of octets, and also when a
      composite sequence (a base character plus one or more combining
      diacritics) can be represented by either a different but
      equivalent composite sequence or by a fully precomposed character.
      For instance, the UCS-2 sequence 00EA+0323 (LATIN SMALL LETTER E
      WITH CIRCUMFLEX ACCENT + COMBINING DOT BELOW) may be transformed
      into 1EC7 (LATIN SMALL LETTER E WITH CIRCUMFLEX ACCENT AND DOT
      BELOW), into 0065+0302+0323 (LATIN SMALL LETTER E + COMBINING
      CIRCUMFLEX ACCENT + COMBINING DOT BELOW), as well as into other
      equivalent composite sequences.
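
The equivalence in that example can be checked with Python's unicodedata
module. Normalization forms such as NFC are not part of RFC 2070; they are
used here only as one assumed way for a script to compare such values.

```python
import unicodedata

composite = "\u00ea\u0323"          # e with circumflex + combining dot below
precomposed = "\u1ec7"              # e with circumflex and dot below
decomposed = "\u0065\u0302\u0323"   # e + combining circumflex + combining dot below

# The three strings differ as code point (and octet) sequences ...
print(composite == precomposed)     # False
print(composite.encode("utf-8") == precomposed.encode("utf-8"))  # False

# ... but all normalize to the same precomposed character under NFC.
forms = {unicodedata.normalize("NFC", s)
         for s in (composite, precomposed, decomposed)}
print(forms == {precomposed})       # True
```

So a form handler that compares submitted values byte for byte against the
source document can fail even though the text is the same.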



