bug-guix

bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’


From: Timothy Sample
Subject: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Sun, 02 Jun 2019 20:39:16 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)

Hi,

Ludovic Courtès <address@hidden> writes:

> Hi Timothy,
>
> Timothy Sample <address@hidden> skribis:
>
>> A quick reading of RFC 3986 suggests that the host part of a URI can be
>> an IP address (version 4 or 6) or a registered name.  It gives the
>> following rules for registered names:
>>
>> reg-name      = *( unreserved / pct-encoded / sub-delims )
>> unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
>> pct-encoded   = "%" HEXDIG HEXDIG
>> sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
>>               / "*" / "+" / "," / ";" / "="
>>
>> Here, “ALPHA”, “DIGIT”, and “HEXDIG” are specified in RFC 2234, and are
>> just the ASCII ranges you might expect (except for that “HEXDIG” only
>> allows uppercase letters).
>
> Do you think you could turn that into a patch for Guile?  I’d happily
> apply it.  :-)
>
> It looks like both [[:alnum:]] & co. and ranges would be
> locale-dependent, so my understanding is that we’ll have to list all the
> characters explicitly, right?

Here’s a patch for Guile that uses explicit lists of characters in the
‘(web uri)’ module instead of character ranges.  It includes two tests
that are pretty verbose, but seem to do the trick.
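
To illustrate the explicit-list idea (this is not the attached patch, which
targets Guile's ‘(web uri)’ module), here is a minimal Python sketch of a
matcher for the ‘reg-name’ rule quoted above.  The names ‘char_class’ and
‘is_reg_name’ are hypothetical, and accepting lowercase hex digits is an
assumption based on RFC 3986 section 2.1, which treats them as equivalent to
the uppercase ones.

```python
import re

# Every character is spelled out; no "a-z" style ranges are used, so the
# pattern cannot depend on how a locale orders its letters.
ALPHA = "abcdefghijklmnopqrstuvwxyz" "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGIT = "0123456789"
# Assumption: lowercase hex digits are accepted as well as uppercase ones.
HEXDIG = DIGIT + "ABCDEF" + "abcdef"
UNRESERVED = ALPHA + DIGIT + "-._~"
SUB_DELIMS = "!$&'()*+,;="


def char_class(chars):
    """Build a bracket expression that lists each character explicitly."""
    return "[" + "".join(re.escape(c) for c in chars) + "]"


# reg-name = *( unreserved / pct-encoded / sub-delims )
REG_NAME = re.compile(
    "(?:"
    + char_class(UNRESERVED + SUB_DELIMS)
    + "|%" + char_class(HEXDIG) + "{2}"
    + ")*"
)


def is_reg_name(host):
    """Return True if the whole string matches the reg-name rule."""
    return REG_NAME.fullmatch(host) is not None
```

With this sketch, ‘is_reg_name("example.com")’ holds, while a host containing
a non-ASCII letter or a truncated percent escape is rejected.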

I have a bit more background on the problem, mostly coming from a Glibc
bug report: <https://sourceware.org/bugzilla/show_bug.cgi?id=23393>.

It turns out that the problem is well known upstream, and avoiding character
ranges is the recommended approach for now.  Some other GNU tools have
adopted what is being called the “Rational Range Interpretation”
<https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html>.
AIUI, this means they use the underlying encoding numbers for ranges (I
checked the source, but I’m only mostly sure I read it right).  It looks
like the Glibc folks are unsure how to proceed on this (but are maybe
slightly leaning towards the “rational” approach).
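
As a rough illustration of that reading (an assumption about the general
idea, not a description of any particular tool's source), a range test based
on encoding numbers could look like the following Python sketch.  The
function name ‘in_rational_range’ is hypothetical.

```python
def in_rational_range(ch, lo, hi):
    """Return True if ch lies in lo..hi by code point.

    Only the numeric values of the characters are compared, so the
    result does not change with the locale's collation order.
    """
    return ord(lo) <= ord(ch) <= ord(hi)
```

Under this interpretation, ‘in_rational_range("m", "a", "z")’ is true, while
"ä" and "B" both fall outside the range "a" to "z" whatever the locale.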

It’s all a pretty big mess, really.  I was hoping there would be some
obvious thing that would fix the problem more generally.  Short of
pulling in the Gnulib regex code or writing something in Scheme, it
looks like Guile is stuck where it is now.

I’m unsure if the changes are considered “trivial” from a copyright
perspective.  It’s pretty close, but I think programmers tend to
underestimate here.  I’ve started the FSF copyright assignment process
either way, since this is likely not my last Guile patch.  :)


-- Tim

Attachment: 0001-Make-URI-handling-locale-independent.patch
Description: patch

