help-libidn

Re: AW: treatment of U+002E that is produced by NFKC


From: Erik van der Poel
Subject: Re: AW: treatment of U+002E that is produced by NFKC
Date: Tue, 15 Jan 2008 07:42:27 -0800

Looks good to me.

Other than your interpretation of RFC 3490 leading to the insertion of
0x2E into a DNS label, but I guess you and I will simply have to agree
that we disagree on this point. RFC 3490 should have been clearer. By
the way, I did a Web search for "2024 nfkc" and found that this issue
was raised, but I guess it was not resolved adequately:

http://www.ops.ietf.org/lists/idn/idn.2001/msg02450.html

Erik

On Jan 15, 2008 7:15 AM, Simon Josefsson <address@hidden> wrote:
> "Erik van der Poel" <address@hidden> writes:
>
> > Yes, that's right.
> >
> > By the way, there may be a different way to address this issue. If
> > libidn has a separate API for NFKC or Nameprep, the caller could pass
> > the entire domain name (including all of the dots and dot-like
> > characters) through NFKC (or Nameprep) first, and then call the normal
> > IDNA routine. This is quite likely to behave the same way as MSIE 7
> > and Firefox 2. If you chose this approach, you could simply document
> > this somewhere, and callers could then decide whether or not to go
> > this way.
>
> Libidn has a simple NFKC interface, and I'm documenting that approach
> now.  Below is the current text in the manual.  I'll forward this to the
> Firefox IDN guys to see if they are interested in documenting their
> practice further, possibly in an I-D.  If ToASCII(NFKC(i)) turns out to
> actually work and behave better than RFC 3490, documenting that now
> seems useful.
>
> Thanks,
> /Simon
>
> Appendix B On Label Separators
> ******************************
>
> Some strings contain characters whose NFKC normalized form contains the
> ASCII dot (0x2E, ".").  Examples of these characters are U+2024 (ONE
> DOT LEADER) and U+248C (DIGIT FIVE FULL STOP).  The strings have the
> interesting property that their IDNA ToASCII output will contain
> embedded dots.  For example:
>
>      ToASCII (hi U+248C com) = hi5.com
>      ToASCII (räksmörgås U+2024 com) = xn--rksmrgs.com-l8as9u
>
>    These demonstrate the two general cases: the first, where the ASCII dot
> is part of an output that does not begin with the IDN prefix "xn--"; the
> second, where the dot is part of an output prefixed with "xn--".
>
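
A minimal sketch of the property described above, using Python's standard
`unicodedata` module rather than libidn. The helper name `nfkc_adds_dot` is
made up for illustration and is not part of any library.

```python
import unicodedata


def nfkc_adds_dot(text: str) -> bool:
    """Return True if NFKC normalization introduces an ASCII dot (U+002E)."""
    normalized = unicodedata.normalize("NFKC", text)
    return normalized.count(".") > text.count(".")


# The two characters named in the appendix text.
for ch in ("\u2024", "\u248c"):
    print(f"U+{ord(ch):04X} -> {unicodedata.normalize('NFKC', ch)!r}")
```

A caller could run such a check on each label before ToASCII to detect
inputs that fall into this corner case.
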
>    The input strings are, from the DNS point of view, a single label.
> The IDNA algorithm translates one label at a time.  Thus, the output is
> expected to be only one label.  What is important here is to make sure
> the DNS resolver receives the correct query.  The DNS protocol does not
> use the dot to delimit labels on the wire; rather, it uses length-value
> pairs.  Thus the correct query would be for `{7}hi5.com' and
> `{22}xn--rksmrgs.com-l8as9u' respectively.
>
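
To make the `{7}hi5.com' notation concrete, here is a rough sketch of how a
name is laid out on the wire: each label is preceded by a one-octet length,
and the name ends with a zero-length root label. The function names
`encode_label` and `encode_name` are hypothetical, and the sketch ignores
message compression and the overall 255-octet name limit.

```python
from typing import Iterable


def encode_label(label: bytes) -> bytes:
    """Encode one DNS label as a length octet followed by the label bytes."""
    if not 1 <= len(label) <= 63:
        raise ValueError("a DNS label must be 1 to 63 octets long")
    return bytes([len(label)]) + label


def encode_name(labels: Iterable[bytes]) -> bytes:
    """Encode a sequence of labels, terminated by the zero-length root label."""
    return b"".join(encode_label(label) for label in labels) + b"\x00"


# One label that happens to contain a dot, versus two separate labels.
single = encode_name([b"hi5.com"])
split = encode_name([b"hi5", b"com"])
```

The two results differ, which is why it matters whether the dot produced by
NFKC is treated as part of a label or as a label separator.
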
>    Some implementations (1) have decided that these input strings are
> potentially confusing for the user.  The string "hi U+248C com" looks
> like "hi5.com" on systems that support Unicode properly.  These
> implementations do not follow RFC 3490.  They yield:
>
>      ToASCII (hi U+248C com) = hi5.com
>      ToASCII (räksmörgås U+2024 com) = xn--rksmrgs-5wao1o.com
>
>    The DNS queries they perform are `{3}hi5{3}com' and
> `{18}xn--rksmrgs-5wao1o{3}com' respectively.  Arguably, this leads to a
> better user experience, and suggests that the IDNA specification is
> sub-optimal in this area.
>
> B.1 Recommended Workaround
> ==========================
>
> It has been suggested to normalize the entire input string using NFKC
> before passing it to IDNA ToASCII.  You may use
> `stringprep_utf8_nfkc_normalize' or `stringprep_ucs4_nfkc_normalize'.
> This will avoid the problem, and appears to lead to similar behaviour
> as IE/Firefox.
>
>    Alternative workarounds are being considered.  Eventually Libidn may
> implement a new flag to the `idna_*' functions that implements a
> recommended way to work around this problem.
>
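
The ToASCII(NFKC(i)) idea can be sketched as follows. This is not libidn
code: it assumes Python's built-in `idna` codec (an RFC 3490 implementation)
as a stand-in for the `idna_*' functions and `unicodedata.normalize` as a
stand-in for `stringprep_utf8_nfkc_normalize'. The function name
`to_ascii_nfkc_first` is made up for illustration.

```python
import unicodedata


def to_ascii_nfkc_first(name: str) -> str:
    """Normalize the whole domain name with NFKC, then apply IDNA ToASCII.

    Dots produced by NFKC are present before the name is split into
    labels, so they end up acting as label separators.
    """
    normalized = unicodedata.normalize("NFKC", name)
    # The "idna" codec splits on dots and converts each label separately.
    return normalized.encode("idna").decode("ascii")


print(to_ascii_nfkc_first("hi\u248ccom"))
print(to_ascii_nfkc_first("r\u00e4ksm\u00f6rg\u00e5s\u2024com"))
```

Under these assumptions the two inputs from the appendix come out as
ordinary two-label names, matching the IE/Firefox results quoted above.
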
>    ---------- Footnotes ----------
>
>    (1) Notably Microsoft's Internet Explorer and Mozilla's Firefox, but
> not Apple's Safari.
>



