help-libidn

Re: Fwd: Disagreement between libidn2 and Python idna


From: Tim Rühsen
Subject: Re: Fwd: Disagreement between libidn2 and Python idna
Date: Sun, 8 Nov 2020 19:03:27 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

Hi Ian,

thanks for reaching out and reporting those issues.

A differential fuzzer is a nice thing to have - I agree that different implementations should produce the same (correct) results.

Issue #3 indeed seems to be a matter of upgrading to Unicode 12 as we currently use tables from Unicode 11.0.

I'll look into this and the other issues, likely within the next 5-7 days.

Cheers, Tim

On 07.11.20 00:22, Ian Eldred Pudney wrote:
Hello,

I'm from security at Google. I'm working on a differential fuzzer between libidn2 and the Python idna package. (Essentially, I've written a program that rapidly tries inputs for libidn2 and Python idna and makes sure that the same input produces the same result.) I was writing this to find bugs in the Python idna package, but I think I've found 3 bugs in libidn2 instead. I'm reaching out to report these 3 bugs.

In all of these cases, libidn2 rejects encoding the specified domain name with an error, but Python idna encodes it fine. Also, in all of these cases, libidn2 will happily /decode/ the punycode generated by Python idna, into the same input that it refuses to encode.
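
The comparison described above can be sketched roughly as follows. This is only an illustration, not the fuzzer from the report: the helper names (`run`, `disagreement`, `decodes_but_not_encodes`) are hypothetical, and the two stand-in encoders use Python's standard-library codecs rather than libidn2 or the Python idna package.

```python
from typing import Callable, Optional, Tuple

# ("ok", encoded bytes) or ("error", None)
Outcome = Tuple[str, Optional[bytes]]


def run(encoder: Callable[[str], bytes], name: str) -> Outcome:
    """Call one encoder, folding any exception into an 'error' outcome."""
    try:
        return ("ok", encoder(name))
    except Exception:
        return ("error", None)


def disagreement(encode_a, encode_b, name: str):
    """Return both outcomes if the two encoders disagree, else None."""
    a, b = run(encode_a, name), run(encode_b, name)
    return None if a == b else (a, b)


def decodes_but_not_encodes(encode, decode, name: str, ace: bytes) -> bool:
    """True if encode(name) fails although decode(ace) gives back name."""
    status, _ = run(encode, name)
    try:
        return status == "error" and decode(ace) == name
    except Exception:
        return False


# Stand-in implementations (assumptions for the demo only).
def ascii_only(name: str) -> bytes:
    return name.encode("ascii")


def stdlib_idna(name: str) -> bytes:
    return name.encode("idna")  # Python's built-in IDNA 2003 codec


if __name__ == "__main__":
    for candidate in ["example.com", "b\u00fccher.example"]:
        print(candidate, disagreement(ascii_only, stdlib_idna, candidate))
```

In a real harness the two callables would wrap the libraries under test, and a fuzzing engine would supply the candidate names.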

This input causes libidn2 to report an error of "domain name longer than 255 characters." However, the punycode domain name is only 146 characters.

  * Domain name:

    髦暩晦晦晦獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳筳獳
    싂.퐀쓄쓄쓄쓄쓄쓄쓄쓄쓄쓄쓄쓼쓄쓄쓄쓄쓄쓄쓄쓄쓄㻄쓄쓄럄䄀싂.뼀猀獳獳
    獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳ⱁ㩁

  * Domain name hex codepoints:

    ['9ae6', '66a9', '6666', '6666', '6666', '7373', '7373', '7373',
    '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
    '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
    '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
    '7b73', '7373', 'c2c2', '2e', 'd400', 'c4c4', 'c4c4', 'c4c4',
    'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4',
    'c4fc', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4',
    'c4c4', 'c4c4', '3ec4', 'c4c4', 'c4c4', 'b7c4', '4100', 'c2c2',
    '2e', 'bf00', '7300', '7373', '7373', '7373', '7373', '7373',
    '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
    '7373', '7373', '7373', '7373', '7373', '2c41', '3a41']

  * Punycode:

    
    xn--lkvaa9xr87caaaaaaaaaaaaaaaaaaaaaaaaaaa7968dcp2n7tvk.xn--p9mx3db62rwgjlncaaaaaaaaaaaaaaaaaaaba41m468u.xn--bfj606ben8bfnaaaaaaaaaaaaaaaaaa79563b
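
One way to check the length claim is to measure the encoded name directly. The sketch below assumes the usual DNS limits (63 octets per label, and the 255 figure from the error message for the whole name); `length_report` is a hypothetical helper, not part of either library.

```python
MAX_LABEL = 63   # per-label limit from RFC 1035
MAX_NAME = 255   # whole-name figure quoted in the libidn2 error message

ACE_NAME = (
    "xn--lkvaa9xr87caaaaaaaaaaaaaaaaaaaaaaaaaaa7968dcp2n7tvk."
    "xn--p9mx3db62rwgjlncaaaaaaaaaaaaaaaaaaaba41m468u."
    "xn--bfj606ben8bfnaaaaaaaaaaaaaaaaaa79563b"
)


def length_report(ace_name: str) -> dict:
    """Summarise total and per-label lengths of an ASCII (ACE) domain name."""
    labels = ace_name.split(".")
    within_limits = len(ace_name) <= MAX_NAME and all(
        1 <= len(label) <= MAX_LABEL for label in labels
    )
    return {
        "total": len(ace_name),
        "labels": [len(label) for label in labels],
        "within_limits": within_limits,
    }


if __name__ == "__main__":
    print(length_report(ACE_NAME))
```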


This input causes libidn2 encoding to report an error of "string has forbidden bi-directional properties". To determine which library was wrong, I implemented the bidi rule myself, and I believe this should be valid.

  * Domain name:

    ਗ਼.ÿ߽̃̃̃

  * Domain name hex codepoints:

    ['a17', 'a3c', '2e', 'ff', '7fd', '303', '303', '303']

  * Punycode:

    
    xn--lkvaa9xr87caaaaaaaaaaaaaaaaaaaaaaaaaaa7968dcp2n7tvk.xn--p9mx3db62rwgjlncaaaaaaaaaaaaaaaaaaaba41m468u.xn--bfj606ben8bfnaaaaaaaaaaaaaaaaaa79563b
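
For reference, a minimal sketch of the Bidi rule from RFC 5893, section 2, is given below. It is not the implementation mentioned in the report; the function names are hypothetical, and the bidi classes come from Python's `unicodedata` module, so results depend on the Unicode version bundled with the interpreter (unassigned code points yield an empty class and are rejected inside Bidi domain names).

```python
import unicodedata

RTL_TRIGGER = {"R", "AL", "AN"}
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_classes(label: str) -> list:
    return [unicodedata.bidirectional(ch) for ch in label]


def is_bidi_domain(labels) -> bool:
    """A Bidi domain name has at least one label with an R, AL or AN character."""
    return any(c in RTL_TRIGGER for label in labels for c in bidi_classes(label))


def label_satisfies_bidi_rule(label: str) -> bool:
    classes = bidi_classes(label)
    # Rule 1: the first character must be L, R or AL.
    if not classes or classes[0] not in {"L", "R", "AL"}:
        return False
    # Find the last character that is not a trailing NSM.
    end = len(classes)
    while classes[end - 1] == "NSM":
        end -= 1
    last = classes[end - 1]
    if classes[0] in {"R", "AL"}:
        # Rules 2-4 for RTL labels.
        if any(c not in RTL_ALLOWED for c in classes):
            return False
        if last not in {"R", "AL", "EN", "AN"}:
            return False
        return not ("EN" in classes and "AN" in classes)
    # Rules 5-6 for LTR labels.
    if any(c not in LTR_ALLOWED for c in classes):
        return False
    return last in {"L", "EN"}


def domain_passes_bidi(name: str) -> bool:
    """Apply the rule to every label, but only if the name is a Bidi domain name."""
    labels = name.split(".")
    if not is_bidi_domain(labels):
        return True
    return all(label_satisfies_bidi_rule(label) for label in labels)


if __name__ == "__main__":
    # Code points taken from the report above.
    reported = "\u0a17\u0a3c.\u00ff\u07fd\u0303\u0303\u0303"
    print(domain_passes_bidi(reported))
```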


This input causes libidn2 to report a disallowed character. This appears to be not a "bug" but rather out-of-date tables in libidn2. The offending character <https://www.fileformat.info/info/unicode/char/0e90/index.htm> was only added to Unicode in 2019.

  * Domain name:

    ຐ.xyz

  * Domain name hex codepoints:

    ['e90', '2e', '78', '79', '7a']

  * Punycode:

    xn--46c.xyz
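
The dependence on table versions can be illustrated with Python's standard library alone. The sketch below assumes nothing about libidn2 or the Python idna package: `to_ace_label` and `known_to_local_tables` are hypothetical helpers, the first applying only raw Punycode (no IDNA validity checks), the second asking whether the interpreter's bundled Unicode database has the character assigned.

```python
import unicodedata


def to_ace_label(label: str) -> str:
    """Punycode-encode one label and add the ACE prefix; no IDNA checks."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def known_to_local_tables(ch: str) -> bool:
    """True if the bundled Unicode database has this code point assigned."""
    return unicodedata.category(ch) != "Cn"


if __name__ == "__main__":
    ch = "\u0e90"
    print("Unicode data version:", unicodedata.unidata_version)
    print("assigned in local tables:", known_to_local_tables(ch))
    print(".".join(to_ace_label(part) for part in (ch, "xyz")))
```

An interpreter whose Unicode data predates the character would report it as unassigned, which is the same kind of mismatch the report attributes to libidn2's tables.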



