help-libidn

Re: AW: treatment of U+002E that is produced by NFKC


From: Simon Josefsson
Subject: Re: AW: treatment of U+002E that is produced by NFKC
Date: Tue, 15 Jan 2008 15:09:15 +0100
User-agent: Gnus/5.110007 (No Gnus v0.7) Emacs/22.1 (gnu/linux)

"Erik van der Poel" <address@hidden> writes:

>> I'm not yet sure whether actually providing a mechanism (like the
>> one I proposed in the patch) to work around the problem is a good thing.
>> The mechanism could just as well cause other problems.
>
> Yes, it is possible that that approach would cause other
> incompatibility problems that I cannot think of at the moment, since
> it is different from MSIE 7 and Firefox 2.

Indeed.  I've thought a bit about this, and there are some problems with
my patch:

1) It only treats U+2024 as a dot.  There are other code points whose
compatibility decomposition contains U+002E as well, but none are as
simple as U+2024.  The candidates include:

2024;ONE DOT LEADER;Po;0;ON;<compat> 002E;;;;N;;;;;
2025;TWO DOT LEADER;Po;0;ON;<compat> 002E 002E;;;;N;;;;;
2026;HORIZONTAL ELLIPSIS;Po;0;ON;<compat> 002E 002E 002E;;;;N;;;;;
2488;DIGIT ONE FULL STOP;No;0;EN;<compat> 0031 002E;;1;1;N;DIGIT ONE PERIOD;;;;
2489;DIGIT TWO FULL STOP;No;0;EN;<compat> 0032 002E;;2;2;N;DIGIT TWO PERIOD;;;;
...
2498;NUMBER SEVENTEEN FULL STOP;No;0;EN;<compat> 0031 0037 002E;;;17;N;NUMBER SEVENTEEN PERIOD;;;;
...
249B;NUMBER TWENTY FULL STOP;No;0;EN;<compat> 0032 0030 002E;;;20;N;NUMBER TWENTY PERIOD;;;;
33C2;SQUARE AM;So;0;L;<square> 0061 002E 006D 002E;;;;N;SQUARED AM;;;;
33C7;SQUARE CO;So;0;L;<square> 0043 006F 002E;;;;N;SQUARED CO;;;;
33D8;SQUARE PM;So;0;L;<square> 0070 002E 006D 002E;;;;N;SQUARED PM;;;;
FE52;SMALL FULL STOP;Po;0;CS;<small> 002E;;;;N;SMALL PERIOD;;;;
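
A list like the one above can be regenerated mechanically.  The
following is a minimal Python sketch, not part of libidn or of the
patch.  It assumes the Unicode tables bundled with the local Python
interpreter (the standard unicodedata module), which may be newer than
the ones used above, so the output may contain additional code points.
The helper name nfkc_yields_dot is made up for illustration.

```python
import sys
import unicodedata


def nfkc_yields_dot(cp):
    """Return True if NFKC of the code point contains U+002E (hypothetical helper)."""
    if cp == 0x002E or 0xD800 <= cp <= 0xDFFF:
        # Skip FULL STOP itself and the surrogate range.
        return False
    return "." in unicodedata.normalize("NFKC", chr(cp))


# Scan every code point known to this Python build.
hits = [cp for cp in range(sys.maxunicode + 1) if nfkc_yields_dot(cp)]

for cp in hits:
    # unicodedata.name() raises ValueError for unnamed code points,
    # so a default is passed.
    print("%04X;%s" % (cp, unicodedata.name(chr(cp), "<unnamed>")))
```

The scan also reports U+FF0E FULLWIDTH FULL STOP, which RFC 3490
already lists as a label separator, so it is not part of the problem
discussed here.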

It would be incorrect to treat all of these as dots as well.  For
example:

ToASCII(hi U+248C com) = hi5.com

If we extended my patch to treat U+248C as a dot too, libidn would
generate 'hi.com' instead of 'hi5.com'.

Right now, both Firefox and libidn translate the input into the ASCII
string hi5.com.  Arguably Firefox is incorrect (wrt the RFC) in that it
treats the string as two labels rather than one.
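
The hi5.com case can be reproduced with a small Python sketch.  It only
exercises the NFKC step (via the standard unicodedata module), not the
full ToASCII operation, and the "patched" line is a hypothetical
stand-in for an extended patch that maps U+248C to a dot; it is not
what libidn does.

```python
import unicodedata

# One label as typed: "hi", U+248C DIGIT FIVE FULL STOP, "com".
label = "hi\u248ccom"

# NFKC expands U+248C to "5" followed by U+002E.
normalized = unicodedata.normalize("NFKC", label)
print(normalized)

# Splitting on dots before normalization sees one label,
# splitting afterwards sees two.
labels_before = label.split(".")
labels_after = normalized.split(".")
print(len(labels_before), len(labels_after))

# Hypothetical extended patch: treat U+248C itself as a dot.
patched = label.replace("\u248c", ".")
print(patched)
```

The first print gives hi5.com, while the hypothetical mapping gives
hi.com, i.e. the digit is lost.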

2) As you say, the patch is different from what MSIE/Firefox really
implement.  The only advantage of a new flag in libidn (that I can see)
would be if it did exactly the same as MSIE/Firefox.  But it doesn't.

Thus, my patch seems to be the wrong thing, and I'm not going to install
it now.

If someone wants to work on a patch against libidn that makes it
implement the MSIE/Firefox algorithm, when a new IDNA flag is given,
that would be something we could seriously consider applying.  I'm
currently too busy to do this on a pro-bono basis though.

Thanks,
/Simon



