emacs-devel
URL library problem


From: Paul Pogonyshev
Subject: URL library problem
Date: Sun, 2 Oct 2005 21:48:06 +0300
User-agent: KMail/1.4.3

Hello,

I believe I have found a serious problem in the URL library.  If you
look at the very end of function `url-http', you can see that the
result of `url-http-create-request' is sent to the connection as-is.
But the coding system of the connection is binary!  That means that
multibyte strings are sent in Emacs' internal encoding, which nothing
but Emacs understands.
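
To make the multibyte/unibyte distinction concrete, here is a minimal
sketch.  `my-string-summary' is a hypothetical helper, not part of URL;
it only reports whether a string is multibyte and how long it is.

```elisp
;; Hypothetical helper for illustration; not part of the URL library.
(defun my-string-summary (string)
  "Return a list (MULTIBYTE-P LENGTH) describing STRING."
  (list (multibyte-string-p string) (length string)))

;; A string of characters, as Emacs holds it internally.
(my-string-summary "проверка")
;; => (t 8)

;; The same text explicitly encoded: a unibyte string of UTF-8 bytes.
(my-string-summary (encode-coding-string "проверка" 'utf-8))
;; => (nil 16)
```

Only the second form is something a server can interpret as UTF-8; the
first has to pass through a coding system before it reaches the wire.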

Form data sent as `multipart/form-data' is usually sent in the
encoding of the page, e.g. UTF-8.  With the current state of URL, it
seems to be impossible to send non-ASCII `multipart/form-data'.

Here is a test:


(let ((url-request-method "POST")
      (url-request-extra-headers
       '(("Content-Type" . "multipart/form-data; boundary=---")))
      (url-request-data
       (concat "-----\r\nContent-Disposition: form-data; name=\"wpTextbox1\"\r\n\r\n"
               "проверка\r\n"
               "-------\r\n")))
  (url-retrieve
   "http://en.wikipedia.org/w/index.php?title=Test_page&action=submit"
   (lambda () (pop-to-buffer (current-buffer)))))


Save the buffer it pops up as an HTML file and open it in a browser.
It should be a Wikipedia preview page with the Russian word
``проверка'' (`test'), but it isn't.  Instead of UTF-8, the word was
sent in Emacs' internal encoding.

Note that encoding the data explicitly in UTF-8 does not help, because
`url-request-data' is later concatenated with other strings, which
turns the result multibyte again:


(let ((url-request-method "POST")
      (url-request-extra-headers
       '(("Content-Type" . "multipart/form-data; boundary=---")))
      (url-request-data
       (encode-coding-string
        (concat "-----\r\nContent-Disposition: form-data; name=\"wpTextbox1\"\r\n\r\n"
                "проверка\r\n"
                "-------\r\n")
        'utf-8)))
  (url-retrieve
   "http://en.wikipedia.org/w/index.php?title=Test_page&action=submit"
   (lambda () (pop-to-buffer (current-buffer)))))
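
The concatenation effect can be seen in isolation.  This is only a
sketch: `my-concat-multibyte-p' is a hypothetical helper, and the
non-ASCII header is an assumption standing in for whichever piece of
the request happens to be multibyte.

```elisp
;; Hypothetical helper for illustration; not part of the URL library.
(defun my-concat-multibyte-p (prefix body)
  "Return non-nil if concatenating PREFIX and BODY gives a multibyte string."
  (multibyte-string-p (concat prefix body)))

(let ((body (encode-coding-string "проверка\r\n" 'utf-8))) ; unibyte
  (list
   ;; Two unibyte strings: the result stays unibyte.
   (my-concat-multibyte-p "Content-Length: 18\r\n\r\n" body)
   ;; One multibyte piece is enough to make the whole result multibyte.
   (my-concat-multibyte-p "X-Comment: проверка\r\n\r\n" body)))
;; => (nil t)
```

Once the result is multibyte, the carefully encoded bytes of the body
are no longer a plain byte sequence as far as `concat' is concerned.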


However, this trivial (and not-for-production) patch makes the first
test work, because it encodes the complete request, which is then sent
to the Wikipedia server unmodified:


--- /home/paul/emacs/lisp/url/url-http.el       2005-09-28 16:56:02.000000000 +0300
+++ /tmp/buffer-content-2240ocC 2005-10-02 21:30:00.000000000 +0300
@@ -268,7 +268,7 @@ request.
           ;; Any data
           url-request-data))
     (url-http-debug "Request is: \n%s" request)
-    request))
+    (encode-coding-string request 'utf-8)))
 
 ;; Parsing routines
 (defun url-http-clean-headers ()


Of course, unconditional encoding in UTF-8 is not the right thing to
do.  Actually, encoding the complete request is not right at all.  A
proper patch would simply avoid concatenating `url-request-data' with
anything and send it to the connection verbatim, assuming that the
user of the library has already encoded it properly.  The reason for
this is that `multipart/form-data' can have different parts in
different encodings (even if this is hardly ever used).
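
As a rough illustration of keeping the caller's bytes intact, here is a
sketch of a related approach.  It is not the patch offered here and not
code from url-http.el: `my-build-request' is a hypothetical helper that
encodes only the header block and appends an already encoded body, so
that the result stays unibyte.

```elisp
;; Hypothetical helper for illustration; not part of url-http.el.
(defun my-build-request (header-lines body)
  "Return a unibyte request made of HEADER-LINES and pre-encoded BODY.
HEADER-LINES is a list of strings without line terminators.  BODY must
already be encoded by the caller, i.e. it must be a unibyte string."
  (when (multibyte-string-p body)
    (error "BODY must be encoded by the caller"))
  (concat
   ;; Encode the header block on its own, so it is unibyte too.
   (encode-coding-string
    (concat (mapconcat #'identity header-lines "\r\n") "\r\n\r\n")
    'utf-8)
   ;; The body is appended byte for byte.
   body))

(multibyte-string-p
 (my-build-request '("POST /w/index.php HTTP/1.1" "Host: en.wikipedia.org")
                   (encode-coding-string "проверка\r\n" 'utf-8)))
;; => nil
```

Because the body is never re-encoded, each part of a multipart body can
use whatever encoding the caller chose for it.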

Are you interested in a patch?

Paul
