Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.


From: Peter Bex
Subject: Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.
Date: Wed, 16 Jan 2013 20:51:48 +0100
User-agent: Mutt/1.4.2.3i

On Tue, Jan 15, 2013 at 02:44:08PM +0900, Alex Shinn wrote:
> This result looks broken.  As I noted in my previous mail, the URI
> representation already handles non-ASCII characters and escapes on output:
> 
> $ csi -R uri-common
> #;1> (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕"))
> #<URI-common: scheme="http" port=#f host="127.0.0.1" path=(/ "삼계탕")
> query=#f fragment=#f>
> #;2> (uri->string (make-uri scheme: "http" host: "127.0.0.1" path: '(/
> "삼계탕")))
> "http://127.0.0.1/82%BCB3%8483%95"
> 
> Unrelated, the actual escaped output looks buggy - it looks like
> some characters like the leading "%EC%" are getting dropped.

OK, I took some time to investigate and I pinpointed this problem.
This appears to happen due to the use of core srfi-14 and srfi-13 in
uri-generic; its char-set operations simply don't deal with anything
beyond ASCII.  It only works after switching to the UTF-8 versions
(utf8-srfi-14, utf8-srfi-13 and unicode-char-sets):

Without patch:
$ csi -R uri-generic -P '(uri-encode-string "삼계탕")'
"�%82%BC�%B3%84�%83%95"

With patch:
$ csi -R uri-generic -P '(uri-encode-string "삼계탕")'
"%EC%82%BC%EA%B3%84%ED%83%95"
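
For reference, the patched output is simply the UTF-8 bytes of the string,
each one percent-encoded.  Below is a minimal sketch of that step in plain
Scheme.  It is not the uri-generic code, and the names hex-digit,
unreserved-byte? and percent-encode-bytes are made up for illustration.  It
works on a list of byte values so that it does not depend on how a given
Scheme represents non-ASCII strings:

```scheme
;; Upper-case hex digit for a value in the range 0..15.
(define (hex-digit n)
  (string-ref "0123456789ABCDEF" n))

;; RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
(define (unreserved-byte? b)
  (or (<= 48 b 57)                    ; 0-9
      (<= 65 b 90)                    ; A-Z
      (<= 97 b 122)                   ; a-z
      (memv b '(45 46 95 126))))      ; - . _ ~

;; Percent-encode a list of byte values (0..255).  Every byte outside
;; the unreserved set, including all bytes >= 128, becomes %XX.
(define (percent-encode-bytes bytes)
  (apply string-append
         (map (lambda (b)
                (if (unreserved-byte? b)
                    (string (integer->char b))
                    (string #\%
                            (hex-digit (quotient b 16))
                            (hex-digit (remainder b 16)))))
              bytes)))

;; The UTF-8 encoding of "삼계탕" is EC 82 BC, EA B3 84, ED 83 95.
(percent-encode-bytes '(#xEC #x82 #xBC #xEA #xB3 #x84 #xED #x83 #x95))
;; => "%EC%82%BC%EA%B3%84%ED%83%95"
```

In the unpatched output above, only the continuation bytes (82 BC, B3 84,
83 95) were escaped, while the lead bytes EC, EA and ED came through raw.
Treating every byte >= 128 as "not unreserved", as in this sketch, avoids that.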

Ivan, what do you think about adding the UTF8 dependency, as per the
attached patch (against trunk)?

Cheers,
Peter
-- 
http://sjamaan.ath.cx

Attachment: uri-generic-utf8.patch
Description: Text document
