URL library problem
From: Paul Pogonyshev
Subject: URL library problem
Date: Sun, 2 Oct 2005 21:48:06 +0300
User-agent: KMail/1.4.3
Hello,
I believe I have found a serious problem in the URL library. If you
look at the very end of function `url-http', you can see that the
result of `url-http-create-request' is sent to the connection as-is.
But the encoding of the connection is binary! This means that multibyte
strings are sent in the Emacs internal coding, which nothing but Emacs
understands.
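To make the distinction concrete, here is a minimal sketch of how a
multibyte string differs from its encoded (unibyte) form. The helper
name `my-string-summary' is hypothetical and only serves the
illustration; it is not part of the URL library.

```elisp
;; Hypothetical helper for illustration; not part of the URL library.
(defun my-string-summary (string)
  "Return a list (MULTIBYTE-P LENGTH) describing STRING."
  (list (multibyte-string-p string)
        (length string)))

;; Text as Emacs holds it internally: multibyte, 8 characters.
(my-string-summary "проверка")

;; The same text encoded for the wire: unibyte, 16 bytes of UTF-8.
(my-string-summary (encode-coding-string "проверка" 'utf-8))
```

Only the second form is something a server can be expected to
understand.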
Form data sent as `multipart/form-data' is usually sent in the
encoding of the page, e.g. UTF-8. With the current state of URL, it
seems to be impossible to send non-ASCII `multipart/form-data'.
Here is a test:
  (let ((url-request-method "POST")
        (url-request-extra-headers
         '(("Content-Type" . "multipart/form-data; boundary=---")))
        (url-request-data
         (concat "-----\r\nContent-Disposition: form-data; name=\"wpTextbox1\"\r\n\r\n"
                 "проверка\r\n"
                 "-------\r\n")))
    (url-retrieve
     "http://en.wikipedia.org/w/index.php?title=Test_page&action=submit"
     (lambda () (pop-to-buffer (current-buffer)))))
Save the buffer it pops up as an HTML file and open it in a browser. It
should be a Wikipedia preview page with the Russian word ``проверка''
(`test'), but it isn't. Instead of UTF-8, the word was sent in the Emacs
internal coding.
Note that explicit UTF-8 encoding does not help, because
`url-request-data' is later concatenated with other strings, which
turns the result multibyte again:
  (let ((url-request-method "POST")
        (url-request-extra-headers
         '(("Content-Type" . "multipart/form-data; boundary=---")))
        (url-request-data
         (encode-coding-string
          (concat "-----\r\nContent-Disposition: form-data; name=\"wpTextbox1\"\r\n\r\n"
                  "проверка\r\n"
                  "-------\r\n")
          'utf-8)))
    (url-retrieve
     "http://en.wikipedia.org/w/index.php?title=Test_page&action=submit"
     (lambda () (pop-to-buffer (current-buffer)))))
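The way an encoded string becomes multibyte again can be seen in
isolation. This is only a sketch: `my-concat-demo' is a hypothetical
name, and the multibyte header below is a stand-in, since which piece
of the real request is multibyte inside url-http.el is an assumption
here.

```elisp
;; Hypothetical demo function; not part of the URL library.
(defun my-concat-demo ()
  "Return (ENCODED-MULTIBYTE-P REQUEST-MULTIBYTE-P) for a sample request."
  (let* ((encoded (encode-coding-string "проверка" 'utf-8))
         ;; Stand-in for any multibyte piece of the request.
         (header (string-to-multibyte "Content-Length: 16\r\n\r\n"))
         ;; `concat' returns a multibyte string as soon as one of its
         ;; arguments is multibyte.
         (request (concat header encoded)))
    (list (multibyte-string-p encoded)
          (multibyte-string-p request))))

(my-concat-demo)   ; => (nil t)
```

So the data is unibyte when the caller hands it over, but the request
that contains it is not.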
However, this trivial (and not-for-production) patch makes the first
test work, because it encodes the complete request, which is then sent
to the Wikipedia server unmodified:
--- /home/paul/emacs/lisp/url/url-http.el 2005-09-28 16:56:02.000000000 +0300
+++ /tmp/buffer-content-2240ocC 2005-10-02 21:30:00.000000000 +0300
@@ -268,7 +268,7 @@ request.
 ;; Any data
 url-request-data))
 (url-http-debug "Request is: \n%s" request)
- request))
+ (encode-coding-string request 'utf-8)))
 
 ;; Parsing routines
 (defun url-http-clean-headers ()
Of course, unconditional encoding in UTF-8 is not the right thing to do.
In fact, encoding the complete request at all is not right. A proper
patch would simply avoid concatenating `url-request-data' with
anything and send it to the connection verbatim, assuming that the
user of the library has already encoded it properly. The reason for
this is that `multipart/form-data' can have different parts in
different encodings (even if this is hardly ever used).
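As a rough sketch of the caller's side of that arrangement, the body
could be built with each part encoded separately. The function
`my-multipart-body' and its argument layout are hypothetical, and the
sketch assumes that the boundary and the field names are plain ASCII
and that the library would send the result verbatim, which it
currently does not.

```elisp
;; Hypothetical helper; not part of the URL library.
(defun my-multipart-body (boundary parts)
  "Return a unibyte multipart/form-data body delimited by BOUNDARY.
PARTS is a list of (NAME CODING TEXT) entries.  Each TEXT is encoded
with its own CODING, so parts may use different encodings.  BOUNDARY
and every NAME are assumed to be ASCII."
  (let ((delimiter (concat "--" boundary)))
    (concat
     (mapconcat
      (lambda (part)
        (concat delimiter "\r\n"
                "Content-Disposition: form-data; name=\""
                (nth 0 part) "\"\r\n\r\n"
                ;; Encoding yields unibyte data for this part only.
                (encode-coding-string (nth 2 part) (nth 1 part))
                "\r\n"))
      parts "")
     ;; The closing delimiter carries two extra dashes.
     delimiter "--\r\n")))

;; Two parts with the same text in different encodings.
(my-multipart-body "---"
                   '(("wpTextbox1" utf-8 "проверка")
                     ("wpSummary" koi8-r "проверка")))
```

Since every piece is either ASCII or already encoded, the result stays
unibyte, so its length is the byte count of the body.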
Are you interested in a patch?
Paul