From: Simon Josefsson
Subject: Unicode 15 support - using UTC instead of IANA as table source? On U+19DA
Date: Tue, 18 Oct 2022 21:13:27 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

Hi

I am considering switching to UTC (the Unicode Technical Committee) as
the source of our derived IDNA2008 tables, as a simple way to support
Unicode > 12.  For Unicode <= 12 this makes no difference except for
U+19DA, which UTC has as PVALID and IANA as DISALLOWED.  This means idn2
behaviour changes from:

jas@latte:~$ echo ᧚|idn2
idn2: toAscii: string contains a disallowed character

into

jas@latte:~/src/libidn2/src$ echo ᧚|./idn2
xn--pkf

This actually goes back to libidn2 0.11 behaviour, which also resulted
in xn--pkf since it used Unicode < 6.0.0:

jas@latte:~/src/libidn2-0.11/src$ ./idn2 --version|head -1
idn2 (idn2) 0.11
jas@latte:~/src/libidn2-0.11/src$ echo ᧚|./idn2
xn--pkf
jas@latte:~/src/libidn2-0.11/src$
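
As an independent cross-check of the label itself, here is a minimal
sketch using Python's built-in punycode codec.  It only performs the
Punycode step on a single non-ASCII label and adds the ACE prefix; it
does no IDNA2008 mapping or validity checking, so it says nothing about
PVALID versus DISALLOWED.  The helper name to_a_label is made up for
this example and is not part of libidn2.

```python
def to_a_label(u_label: str) -> str:
    """Punycode-encode one non-ASCII label and prepend the ACE prefix.

    Illustration only: no IDNA2008 mapping or validation is done, and
    all-ASCII labels are not handled.
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Inverse of to_a_label for labels that carry the xn-- prefix."""
    assert a_label.startswith("xn--")
    return a_label[len("xn--"):].encode("ascii").decode("punycode")


if __name__ == "__main__":
    print(to_a_label("\u19da"))  # prints xn--pkf
```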

The xn--pkf output is consistent with some other IDNA2008
implementations:

https://icu4c-demos.unicode.org/icu-bin/idnbrowser?t=xn--th5h
https://idnaconv.net/try-it.html?encoded=xn--th5h&decode=%3C%3C+Decode

There may be other differences between UTC derived values and IANA
derived values for Unicode > 12 and <= 15 once IANA gets around to
publishing tables, but we can't tell until that happens and I'm not
holding my breath since they haven't published anything for 12.1.0
(2019-05), 13.0.0 (2020-03), 14.0.0 (2021-09) nor 15.0.0 (2022-09).

I don't have a strong opinion on this, but some of the factors involved
are:

1) consistency with other implementations

2) importance of U+19DA (which is rare) and practical problems resulting
from this change (apparently few)

3) support Unicode > 12 now (most important of these factors IMO)

4) domain name stability: once derived for a code point, the property
shouldn't change in the future.  Thus, the change in 0.12 could be
considered the bug here.  I believe I agreed with the approach used by
RFC 6452 at the time it was published, but revisiting this issue today I
find myself in the opposite camp.  It is a subjective judgement call,
and there are good arguments for both sides.
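
For background on factor 4, as I read RFC 6452 the U+19DA difference
comes from its General_Category moving from Nd to No in Unicode 6.0,
which takes it out of the LetterDigits class of RFC 5892.  Here is a
rough sketch of just that one rule using Python's unicodedata module.
It is not the full RFC 5892 derivation (no exceptions, backward
compatibility or context rules), the helper name in_letter_digits is
invented for the example, and the result reflects whatever Unicode
version your Python ships.

```python
import unicodedata

# General categories forming the LetterDigits class in RFC 5892.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def in_letter_digits(ch: str) -> bool:
    """Return True if ch falls in the LetterDigits class (one rule only)."""
    return unicodedata.category(ch) in LETTER_DIGITS


if __name__ == "__main__":
    ch = "\u19da"
    print(unicodedata.unidata_version,
          unicodedata.category(ch),
          in_letter_digits(ch))
```

A code point outside LetterDigits that no other rule picks up ends up
DISALLOWED, which matches the IANA side of the difference; keeping it
PVALID means preserving the pre-6.0 derived value.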

If you want to provide feedback on this, please respond here or to this
issue:

https://gitlab.com/libidn/libidn2/-/issues/112

/Simon