bug-gnu-emacs

bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request


From: Dmitry Gutov
Subject: bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
Date: Thu, 11 Aug 2016 05:52:42 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:47.0) Gecko/20100101 Thunderbird/47.0

On 08/10/2016 05:35 PM, Eli Zaretskii wrote:

> Are you saying that url-generic-parse-url performs this encoding, and
> that using a unibyte buffer causes that to fail?

No, url-generic-parse-url contains logic that distinguishes between the domain and the path parts of a URL. So apparently it might have to work on multibyte URLs.

That's not strictly necessary, however, given how url-encode-url uses it currently (it performs encode-coding-string and decode-coding-string on the URL string).

That approach seems flawed to me, but either way, someone will have to choose how url-encode-url should use url-generic-parse-url. If we intend to leave it as-is, then the proposed patch using set-buffer-multibyte actually works fine, even on master, with multibyte URLs.
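To make the host/path split concrete, here is a minimal sketch of what url-generic-parse-url returns for a URL with a non-ASCII path. The URL is a made-up example, and whether the filename comes back multibyte depends on the Emacs version and on which of the patches discussed here is applied, so the last value is only reported, not assumed.

```elisp
(require 'url-parse)

;; Hypothetical example URL; "\u00e9" is a non-ASCII character in the path.
(defvar my-demo-parsed
  (url-generic-parse-url "http://example.com/caf\u00e9?q=1"))

;; The parsed object keeps the host and the path (plus query) apart,
;; which is what lets a caller encode each part differently.
(list :host      (url-host my-demo-parsed)
      :filename  (url-filename my-demo-parsed)
      ;; Reports whether the path string is still multibyte at this point.
      :multibyte (multibyte-string-p (url-filename my-demo-parsed)))
```

Because the two parts are separate slots, a caller such as url-http-create-request can in principle treat the host and the filename independently before building the request.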

>> So I think the encoding of the URL parts should be performed inside
>> url-http-create-request.
>
> Fine with me, but when I suggested that, you didn't like the
> suggestion.  If you changed your mind, let's do that.

See below. But yes, I'm more inclined toward this approach now, after Lars's objection, and after looking at the code in master.

>> On the master branch, host is passed through IDNA encoding, but
>> real-fname is untouched. On emacs-25, I think we should convert both
>> to unibyte.
>
> Not sure I understand why there should be a difference between the two
> branches.  Encoding an ASCII string doesn't do any harm.

Since it's ASCII, using utf-8 there seems misleading to me. It's a question of readability. As a bonus, using us-ascii will validate that the strings indeed do not contain any unexpected characters.

>> (Why doesn't (encode-coding-string "aaaa" 'ascii) work?)
>
> It's 'us-ascii, not 'ascii.

Thanks. Attaching a patch, it seems to work well enough.
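The attachment itself is not reproduced in this message. As a rough sketch only, a helper in the spirit of its file name could look like the following; the function name my-url-http--encode-string and its body are assumptions for illustration, not the contents of the patch.

```elisp
;; Hypothetical helper: the name echoes the attachment's file name,
;; but the body is an assumption, not the actual patch.
(defun my-url-http--encode-string (s)
  "Return S as a unibyte string, encoding it as US-ASCII if it is multibyte."
  (if (multibyte-string-p s)
      (encode-coding-string s 'us-ascii)
    s))

;; An ASCII-only string can still carry the multibyte flag;
;; `string-to-multibyte' is used here to create one on purpose.
(defvar my-demo-host (string-to-multibyte "example.com"))

(list (multibyte-string-p my-demo-host)
      (multibyte-string-p (my-url-http--encode-string my-demo-host)))
;; => (t nil)
```

The coding system has to be named 'us-ascii, as noted above; (coding-system-p 'us-ascii) returns non-nil and can be used to check a name before passing it to encode-coding-string.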

I'd like to wait for Lars's response now, but someone will have to make an executive decision. Both patches (this one and the set-buffer-multibyte one) work in the cases I've tested.

This one seems more conservative, but it'll require a manual merge to master. The other one is trivial and will merge automatically, but might cause problems for potential less-careful uses of url-generic-parse-url.

Attachment: url-http--encode-string.diff
Description: Text Data

