help-libidn

Re: AW: treatment of U+002E that is produced by NFKC


From: Erik van der Poel
Subject: Re: AW: treatment of U+002E that is produced by NFKC
Date: Tue, 15 Jan 2008 06:30:07 -0800

Yes, that's right.

By the way, there may be a different way to address this issue. If
libidn has a separate API for NFKC or Nameprep, the caller could pass
the entire domain name (including all of the dots and dot-like
characters) through NFKC (or Nameprep) first, and then call the normal
IDNA routine. This is quite likely to behave the same way as MSIE 7
and Firefox 2. If you choose this approach, you could simply document
this somewhere, and callers could then decide whether or not to go
this way.
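
To make the difference in ordering concrete, here is a minimal sketch in Python rather than C. It uses only the standard `unicodedata` module and plain NFKC, not full Nameprep or ToASCII, and the two helper names are made up for illustration; they are not libidn functions. It only shows how the label boundaries change depending on whether normalization runs before or after splitting on U+002E.

```python
import unicodedata


def labels_split_first(domain):
    """Split on ASCII dots first, then NFKC-normalize each label.

    Hypothetical helper: models the per-label order, where a dot
    produced by NFKC ends up inside a label.
    """
    return [unicodedata.normalize("NFKC", label) for label in domain.split(".")]


def labels_nfkc_first(domain):
    """NFKC-normalize the whole name first, then split on ASCII dots.

    Hypothetical helper: models the suggestion above, where a dot
    produced by NFKC becomes a label separator.
    """
    return unicodedata.normalize("NFKC", domain).split(".")


if __name__ == "__main__":
    name = "hi\u248ccom"  # U+248C DIGIT FIVE FULL STOP, NFKC gives "5."
    print(labels_split_first(name))  # one label containing a dot
    print(labels_nfkc_first(name))   # two labels
```

With the input `hi U+248C com`, the first function yields a single label `hi5.com`, while the second yields the two labels `hi5` and `com`. The second is what a caller would hand to the normal IDNA routine under this approach.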

Erik

> >> I'm not yet sure whether actually providing a mechanism (like the
> >> one I proposed in the patch) to work around the problem is a good thing.
> >> The mechanism could just as well cause other problems.
> >
> > Yes, it is possible that that approach would cause other
> > incompatibility problems that I cannot think of at the moment, since
> > it is different from MSIE 7 and Firefox 2.
>
> Indeed.  I've thought a bit about this, and there are some problems with
> my patch:
>
> 1) It only treats U+2024 as a dot.  There are other code points as well,
> but none are as simple as U+2024.  The others include:
>
> 2024;ONE DOT LEADER;Po;0;ON;<compat> 002E;;;;N;;;;;
> 2025;TWO DOT LEADER;Po;0;ON;<compat> 002E 002E;;;;N;;;;;
> 2026;HORIZONTAL ELLIPSIS;Po;0;ON;<compat> 002E 002E 002E;;;;N;;;;;
> 2488;DIGIT ONE FULL STOP;No;0;EN;<compat> 0031 002E;;1;1;N;DIGIT ONE PERIOD;;;;
> 2489;DIGIT TWO FULL STOP;No;0;EN;<compat> 0032 002E;;2;2;N;DIGIT TWO PERIOD;;;;
> ...
> 2498;NUMBER SEVENTEEN FULL STOP;No;0;EN;<compat> 0031 0037 002E;;;17;N;NUMBER SEVENTEEN PERIOD;;;;
> ...
> 249B;NUMBER TWENTY FULL STOP;No;0;EN;<compat> 0032 0030 002E;;;20;N;NUMBER TWENTY PERIOD;;;;
> 33C2;SQUARE AM;So;0;L;<square> 0061 002E 006D 002E;;;;N;SQUARED AM;;;;
> 33C7;SQUARE CO;So;0;L;<square> 0043 006F 002E;;;;N;SQUARED CO;;;;
> 33D8;SQUARE PM;So;0;L;<square> 0070 002E 006D 002E;;;;N;SQUARED PM;;;;
> FE52;SMALL FULL STOP;Po;0;CS;<small> 002E;;;;N;SMALL PERIOD;;;;
>
> It would be incorrect to treat all of these as dots as well.  For
> example:
>
> ToASCII(hi U+248C com) = hi5.com
>
> If we extended my patch to treat U+248C as a dot too, libidn would
> generate 'hi.com' instead of 'hi5.com'.
>
> Right now, both Firefox and libidn translate the input into the ASCII
> string hi5.com.  Arguably Firefox is incorrect (wrt the RFC) in that it
> treats the string as two labels rather than one.
>
> 2) As you say, the patch is different from what MSIE/Firefox actually
> implement.  The only advantage of a new flag in libidn (that I see)
> would be if it did exactly the same as MSIE/Firefox.  But it doesn't.
>
> Thus, my patch seems to be the wrong thing, and I'm not going to install
> it now.
>
> If someone wants to work on a patch against libidn that makes it
> implement the MSIE/Firefox algorithm, when a new IDNA flag is given,
> that would be something we could seriously consider applying.  I'm
> currently too busy to do this on a pro-bono basis though.
>
> Thanks,
> /Simon
>



