From: Alex Shinn
Subject: Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.
Date: Wed, 23 Jan 2013 17:09:06 +0900
Yes, I ran into this when I was adding UTF-8 support to mbox... If you were to add wide char support in srfi-14, is there a way to quantify the performance penalty?
On Thu, Jan 17, 2013 at 4:51 AM, Peter Bex <address@hidden> wrote:
On Tue, Jan 15, 2013 at 02:44:08PM +0900, Alex Shinn wrote:
> This result looks broken. As I noted in my previous mail, the URI
> representation already handles non-ASCII characters and escapes on output:
>
> $ csi -R uri-common
> #;1> (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕"))
> #<URI-common: scheme="http" port=#f host="127.0.0.1" path=(/ "삼계탕")
> query=#f fragment=#f>
> #;2> (uri->string (make-uri scheme: "http" host: "127.0.0.1" path: '(/
> "삼계탕")))
> "http://127.0.0.1/82%BCB3%8483%95"
>
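For comparison with the transcript above, the expected escaping of the path segment can be cross-checked outside CHICKEN. The following is a minimal sketch in Python, chosen only as an independent reference; it is not how uri-common or uri-generic work. The helper name `percent_encode_utf8` is hypothetical, and it escapes every byte, which is only appropriate here because the segment contains no unreserved ASCII characters.

```python
from urllib.parse import quote


def percent_encode_utf8(segment: str) -> str:
    """Percent-encode every UTF-8 byte of `segment` (illustrative helper).

    Unlike a real URI encoder, this escapes all bytes, including ASCII
    letters and digits, so it is only a fair comparison for segments made
    up entirely of non-ASCII characters.
    """
    return "".join(f"%{byte:02X}" for byte in segment.encode("utf-8"))


segment = "삼계탕"

# Byte-by-byte escaping of the UTF-8 encoding.
manual = percent_encode_utf8(segment)

# The standard library encoder, with no characters exempted from escaping.
reference = quote(segment, safe="")

print(manual)
print(reference)
```

Both calls yield "%EC%82%BC%EA%B3%84%ED%83%95": three bytes per Hangul syllable. Set against the string returned by uri->string in the transcript, the bytes that do survive there (82 BC, B3 84, 83 95) are the trailing two bytes of each syllable.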
> Unrelated, the actual escaped output looks buggy - it looks like
> some characters like the leading "%EC%" are getting dropped.

OK, I took some time to investigate and I pinpointed this problem.
This appears to happen due to the use of core srfi-14 and srfi-13 in
uri-generic; its char-set operations simply don't deal with anything
beyond ASCII.

As an aside from the uri discussion, we really need to fix srfi-14.
The reference implementation is terrible. Not only does it not
handle Unicode, but it doesn't not-handle it gracefully:

#;1> (char-set-contains? char-set:full #\x100)
Error: (string-ref) out of range [...]

At a minimum we should avoid these errors, but really we
should be using a Unicode-aware implementation - there's no
barrier to doing so like there is for Unicode strings. We could
just move utf8-srfi-14 into the core, or I could patch up the
srfi-14 implementation to handle wide chars properly (but maybe
slowly) without bringing in the iset dependency.

--
Alex
_______________________________________________
Chicken-users mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/chicken-users