help-libidn

Re: treatment of U+002E that is produced by NFKC


From: Simon Josefsson
Subject: Re: treatment of U+002E that is produced by NFKC
Date: Sun, 13 Jan 2008 10:23:01 +0100
User-agent: Gnus/5.110007 (No Gnus v0.7) Emacs/23.0.50 (gnu/linux)

"Erik van der Poel" <address@hidden> writes:

> GNU libidn handles the case below in the same way as Opera 9 and ICU,
> but MSIE 7 and Firefox 2 handle it differently.
>
> I tried the demo page at http://josefsson.org/idn.php/
...
>
> Speaking of U+2024 and where in the protocol stack to handle things, I
> just discovered that MSIE 7 and Firefox 2 both perform NFKC on this
> character, to yield U+002E (.). After that, they divide the host name
> into labels *again*, so the new U+002E becomes a new label separator.

I don't understand what the problem is.  Are you claiming there is a
problem in libidn?

If I invoke idn on "foo․bar", where the separator is U+2024 (ONE DOT
LEADER), I get:

address@hidden:~$ idn --debug --quiet foo․bar
Charset `UTF-8'.
input[0] = U+0066
input[1] = U+006f
input[2] = U+006f
input[3] = U+2024
input[4] = U+0062
input[5] = U+0061
input[6] = U+0072
tld[0] = U+0066
tld[1] = U+006f
tld[2] = U+006f
tld[3] = U+002e
tld[4] = U+0062
tld[5] = U+0061
tld[6] = U+0072
output[0] = U+0066
output[1] = U+006f
output[2] = U+006f
output[3] = U+002e
output[4] = U+0062
output[5] = U+0061
output[6] = U+0072
foo.bar
address@hidden:~$ 

The web page for the same input is:

http://josefsson.org/idn.php/?data=foo%E2%80%A4bar&profile=Nameprep&mode=toascii&debug=on&charset=UTF-8&lastcharset=UTF-8

This looks correct to me.  What is wrong?
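
The U+2024 to U+002E mapping visible in the debug output above is what
NFKC does to that code point.  A minimal sketch that checks this with
Python's standard unicodedata module (this is not libidn, and the
variable names are only for illustration):

```python
import unicodedata

# U+2024 ONE DOT LEADER has a compatibility decomposition to U+002E,
# so NFKC turns it into an ordinary full stop.
raw = "foo\u2024bar"
normalized = unicodedata.normalize("NFKC", raw)

# Print code points before and after, similar in spirit to `idn --debug`.
print([f"U+{ord(c):04X}" for c in raw])
print([f"U+{ord(c):04X}" for c in normalized])
print(normalized)
```

Only the fourth code point should differ between the two lists.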

> If we ever get around to writing a document about IDNA in HTML, we may
> want to make a note of this. I.e. the steps are:
>
> (1) Divide the domain name into labels by looking for IDNA2003 dots.
> (2) Perform Nameprep2003 on each non-ASCII label.
> (3) Divide each label into multiple labels, by looking for regular dots.
> (4) Perform Punycode2003 on each non-ASCII label.
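
The four quoted steps could be sketched roughly as follows.  This is an
illustration only, not what libidn or any browser actually runs: the
function names are hypothetical, and step (2) uses NFKC plus lowercasing
as a stand-in for full Nameprep (no prohibited-character or bidi checks).

```python
import re
import unicodedata

# Label separators recognised by IDNA2003 (RFC 3490 section 3.1).
IDNA2003_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def is_ascii(label):
    return all(ord(c) < 0x80 for c in label)


def to_ascii_with_resplit(name):
    """Hypothetical helper following steps (1)-(4) from the quoted text."""
    out = []
    # (1) Divide the name into labels on the IDNA2003 dots.
    for label in IDNA2003_DOTS.split(name):
        if not is_ascii(label):
            # (2) Assumption: NFKC + lowercase stands in for Nameprep.
            label = unicodedata.normalize("NFKC", label).lower()
        # (3) Split again on regular dots that step (2) may have produced.
        for sub in label.split("."):
            if is_ascii(sub):
                out.append(sub)
            else:
                # (4) Punycode-encode labels that are still non-ASCII.
                out.append("xn--" + sub.encode("punycode").decode("ascii"))
    return ".".join(out)


print(to_ascii_with_resplit("foo\u2024bar"))
print(to_ascii_with_resplit("b\u00fccher\u2024example"))
```

Without step (3), the second input would be Punycode-encoded as a single
label that contains a U+002E.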

Why not add U+2024 to the list of dot-like code points in RFC 3490
section 3.1 instead?
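
As a rough sketch of that alternative, the separator set can be made a
parameter so that U+2024 is handled in the first split.  The extended
set here is hypothetical; RFC 3490 section 3.1 lists only the first four
code points.

```python
import re

# The four label separators from RFC 3490 section 3.1.
IDNA2003_DOTS = "\u002E\u3002\uFF0E\uFF61"
# Hypothetical extension: also treat U+2024 ONE DOT LEADER as a dot.
EXTENDED_DOTS = IDNA2003_DOTS + "\u2024"


def split_labels(name, dots=IDNA2003_DOTS):
    """Split a domain name into labels on any code point in `dots`."""
    return re.split("[" + re.escape(dots) + "]", name)


print(split_labels("foo\u2024bar"))                 # one label
print(split_labels("foo\u2024bar", EXTENDED_DOTS))  # two labels
```

With the extended set, the split happens before Nameprep, so no second
pass over the labels is needed for this code point.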

/Simon



