bug-gnu-emacs

From: José A . Romero L .
Subject: bug#6252: Fwd: bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.
Date: Tue, 25 May 2010 10:56:36 +0200

(sorry, forgot to fwd this to the bugtrack)
---------- Forwarded message ----------
From: José A. Romero L. <escherdragon@gmail.com>
Date: 2010/5/24
Subject: Re: bug#6252: Emacs does not implement URL (aka "percent")
decoding correctly.
To: YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp>


2010/5/24 YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp>:
>>>>>> On Sun, 23 May 2010 01:46:54 +0200, José A. Romero L. 
>>>>>> <escherdragon@gmail.com> said:
(...)
> If you are referring to the following part of RFC 3986, it doesn't say
> anything about existing URI schemes (as opposed to "a new URI
> scheme"), those defining a component that does NOT represent textual
> data, or even for textual data, those NOT consisting of characters
> from the Universal Character Sets.

You are right. The standard *doesn't say anything* about existing URI
schemes on that matter. Thus the question is rather whether to
make the language more or less useful, especially in light of the
fragment you've just quoted:

     >  When a new URI scheme defines a component that represents textual
     >  data consisting of characters from the Universal Character Set
     >  [UCS], the data should first be encoded as octets according to the
     >  UTF-8 character encoding [STD63]; then only those octets that do not
     >  correspond to characters in the unreserved set should be percent-
     >  encoded.

and the example that immediately follows:

   (...) For example, the character A would be represented as "A",
   the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
   as "%C3%80", and the character KATAKANA LETTER A would be represented
   as "%E3%82%A2".
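
To make that recipe concrete, here is a rough sketch of the two steps
(encode as UTF-8, then percent-encode every octet outside the unreserved
set). `my-percent-encode' is a hypothetical name used only for
illustration; it is not an existing Emacs function.

```elisp
;; Hypothetical helper, not part of Emacs: a sketch of the RFC 3986
;; recipe quoted above (UTF-8 first, then percent-encode).
(defun my-percent-encode (string)
  "Encode STRING as UTF-8, then percent-encode each non-unreserved octet."
  (mapconcat
   (lambda (octet)
     ;; Unreserved set per RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
     (if (or (and (>= octet ?A) (<= octet ?Z))
             (and (>= octet ?a) (<= octet ?z))
             (and (>= octet ?0) (<= octet ?9))
             (memq octet '(?- ?. ?_ ?~)))
         (char-to-string octet)
       (format "%%%02X" octet)))
   ;; `encode-coding-string' returns a unibyte string, so the lambda
   ;; above sees octets (0..255), not characters.
   (encode-coding-string string 'utf-8)
   ""))

(my-percent-encode "A")              ; => "A"
(my-percent-encode (string #xC0))    ; LATIN CAPITAL LETTER A WITH GRAVE => "%C3%80"
(my-percent-encode (string #x30A2))  ; KATAKANA LETTER A => "%E3%82%A2"
```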

>
> (See also http://lists.gnu.org/archive/html/emacs-devel/2006-08/msg00065.html)
>
> Though returning a multibyte string decoded as UTF-8 would be useful
> for many cases, I think some "unhex"ing function should also provide a
> functionality to return a unibyte string.
(...)

That's perfectly valid. OTOH some other "unhex"-ing function (or even
the same one) could also provide the functionality to return a multibyte
string, and even allow choosing the character encoding (UCS or not)
for the resulting string. After all, don't you think there should be
a better way to decode a Katakana A than using a kludge like this?:

 (decode-coding-string
    (apply 'unibyte-string
           (string-to-list
            (url-unhex-string "%E3%82%A2")))
    'utf-8)
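
Just as an illustration of what I have in mind, a single function could
cover both needs: raw octets by default, or a multibyte string when a
coding system is given. This is only a sketch; `my-percent-decode' and
its optional CODING-SYSTEM argument are hypothetical, not an existing or
proposed Emacs API, and it assumes the input is plain ASCII, as a
well-formed URI would be.

```elisp
;; Hypothetical helper, not part of Emacs.  Assumes STRING contains only
;; ASCII characters; malformed escapes such as "%zz" are copied through.
(defun my-percent-decode (string &optional coding-system)
  "Decode percent-escapes in STRING.
With CODING-SYSTEM, decode the resulting octets and return a multibyte
string; otherwise return the octets as a unibyte string."
  (let ((octets '())
        (i 0)
        (len (length string)))
    (while (< i len)
      (let ((ch (aref string i)))
        (if (and (eq ch ?%)
                 (<= (+ i 3) len)
                 (string-match-p "\\`[0-9A-Fa-f][0-9A-Fa-f]\\'"
                                 (substring string (1+ i) (+ i 3))))
            ;; A valid "%XX" escape: collect the octet it denotes.
            (progn
              (push (string-to-number (substring string (1+ i) (+ i 3)) 16)
                    octets)
              (setq i (+ i 3)))
          ;; Any other character is kept as-is.
          (push ch octets)
          (setq i (1+ i)))))
    (let ((bytes (apply #'unibyte-string (nreverse octets))))
      (if coding-system
          (decode-coding-string bytes coding-system)
        bytes))))

(my-percent-decode "%E3%82%A2" 'utf-8)  ; => KATAKANA LETTER A, multibyte
(my-percent-decode "%E3%82%A2")         ; => "\343\202\242", unibyte
```

That would keep the unibyte result available for schemes whose data is
not text, while sparing callers the detour through `string-to-list'.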

Cheers,
--
José A. Romero L.
escherdragon@gmail.com
"We who cut mere stones must always be envisioning cathedrals."
(Quarry worker's creed)




