[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than
From: |
Markus Armbruster |
Subject: |
Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings |
Date: |
Fri, 22 Mar 2013 15:37:04 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.1 (gnu/linux) |
Laszlo Ersek <address@hidden> writes:
> On 03/14/13 18:49, Markus Armbruster wrote:
>> These are all broken, too.
>
> What are "these"? And how are they broken? And how does the patch fix them?
"These" refers to the subject: noncharacters other than U+FFFE, U+FFFF.
I agree that I should better explain how they're broken, and what the
patch does to fix them. Will fix on respin.
>>
>> A few test cases use noncharacters U+FFFF and U+10FFFF. Risks testing
>> noncharacters some more instead of what they're supposed to test. Use
>> U+FFFD and U+10FFFD instead.
>>
>> Signed-off-by: Markus Armbruster <address@hidden>
>> ---
>> tests/check-qjson.c | 85
>> +++++++++++++++++++++++++++++++++++++++++++++--------
>> 1 file changed, 72 insertions(+), 13 deletions(-)
>
> I'm confused about the commit message. There are three paragraphs in it
> (the title, the first paragraph, and the 2nd paragraph). This patch
> modifies different tests:
>
>> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
>> index 852124a..efec1b2 100644
>> --- a/tests/check-qjson.c
>> +++ b/tests/check-qjson.c
>> @@ -158,7 +158,7 @@ static void utf8_string(void)
>> * consider using overlong encoding \xC0\x80 for U+0000 ("modified
>> * UTF-8").
>> *
>> - * Test cases are scraped from Markus Kuhn's UTF-8 decoder
>> + * Most test cases are scraped from Markus Kuhn's UTF-8 decoder
>> * capability and stress test at
>> * http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
>> */
>> @@ -256,11 +256,11 @@ static void utf8_string(void)
>> "\xDF\xBF",
>> "\"\\u07FF\"",
>> },
>> - /* 2.2.3 3 bytes U+FFFF */
>> + /* 2.2.3 3 bytes U+FFFD */
>> {
>> - "\"\xEF\xBF\xBF\"",
>> - "\xEF\xBF\xBF",
>> - "\"\\uFFFF\"",
>> + "\"\xEF\xBF\xBD\"",
>> + "\xEF\xBF\xBD",
>> + "\"\\uFFFD\"",
>> },
>
> This is under "2.2 Last possible sequence of a certain length". I guess
Which is in turn under "2 Boundary condition test cases".
> this is where you say "last possible sequence of a certain length,
> encoding a character (= non-noncharacter)". OK, p#2.
Yes.
The test's purpose is testing the upper bound of 3-byte sequences is
decoded correctly.
The upper bound is U+FFFF. Since that's a noncharacter, the parser
should reject it (or maybe replace), the formatter should replace it.
Trouble is it could be misdecoded and then rejected / replaced.
Besides, U+FFFF already gets tested along with the other noncharacters
under "5.3 Other illegal code positions".
Next in line is U+FFFE, also a noncharacter, also under 5.3.
Next in line is U+FFFD, which I picked.
But that gets tested under "2.3 Other boundary conditions"! I guess I
either drop it there, or make this one U+FFFC.
I think testing U+FFFC here makes sense, because U+FFFD could be
misdecoded, then replaced by U+FFFD.
What do you think?
>> /* 2.2.4 4 bytes U+1FFFFF */
>> {
>> @@ -303,10 +303,10 @@ static void utf8_string(void)
>> "\"\\uFFFD\"",
>> },
>> {
>> - /* U+10FFFF */
>> - "\"\xF4\x8F\xBF\xBF\"",
>> - "\xF4\x8F\xBF\xBF",
>> - "\"\\u43FF\\uFFFF\"", /* bug: want "\"\\uDBFF\\uDFFF\"" */
>> + /* U+10FFFD */
>> + "\"\xF4\x8F\xBF\xBD\"",
>> + "\xF4\x8F\xBF\xBD",
>> + "\"\\u43FF\\uFFFF\"", /* bug: want "\"\\uDBFF\\uDFFD\"" */
>> },
>> {
>> /* U+110000 */
>
> Under "2.3 Other boundary conditions". Not a non-character any longer,
> but also not a boundary condition. At least not the original one. Still
> covered by the ...FFFD part of the commit message, p#2.
The test's purpose is testing the upper bound of the Unicode range gets
decoded correctly.
The upper bound is U+10FFFF. Since that's a noncharacter... same
argument as above.
Next in line is U+10FFFE, also a noncharacter.
Next in line is U+10FFFD, which I picked.
>> @@ -584,9 +584,9 @@ static void utf8_string(void)
>> "\"\\u07FF\"",
>> },
>> {
>> - /* \U+FFFF */
>> - "\"\xF0\x8F\xBF\xBF\"",
>> - "\xF0\x8F\xBF\xBF", /* bug: not corrected */
>> + /* \U+FFFD */
>> + "\"\xF0\x8F\xBF\xBD\"",
>> + "\xF0\x8F\xBF\xBD", /* bug: not corrected */
>> "\"\\u03FF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> },
>> {
>
> Under "4.2 Maximum overlong sequences". What does that even mean? "In
> some sense maximum codepoints, all represented as overlong sequences"? P#2.
The headings are all stolen from the same source as the test cases.
Perhaps I should steal more of the explanatory text there as well.
The one for 4.2 is:
Below you see the highest Unicode value that is still resulting in
an overlong sequence if represented with the given number of bytes.
This is a boundary test for safe UTF-8 decoders. All five
characters should be rejected like malformed UTF-8 sequences.
>> @@ -731,6 +731,7 @@ static void utf8_string(void)
>> "\"\\uDBFF\\uDFFF\"", /* bug: want "\"\\uFFFF\\uFFFF\"" */
>> },
>> /* 5.3 Other illegal code positions */
>> + /* BMP noncharacters */
>> {
>> /* \U+FFFE */
>> "\"\xEF\xBF\xBE\"",
>> @@ -741,7 +742,65 @@ static void utf8_string(void)
>> /* \U+FFFF */
>> "\"\xEF\xBF\xBF\"",
>> "\xEF\xBF\xBF", /* bug: not corrected */
>> - "\"\\uFFFF\"", /* bug: not corrected */
>> + "\"\\uFFFF\"",
>> + },
>> + {
>> + /* U+FDD0 */
>> + "\"\xEF\xB7\x90\"",
>> + "\xEF\xB7\x90", /* bug: not corrected */
>> + "\"\\uFDD0\"", /* bug: not corrected */
>> + },
>> + {
>> + /* U+FDEF */
>> + "\"\xEF\xB7\xAF\"",
>> + "\xEF\xB7\xAF", /* bug: not corrected */
>> + "\"\\uFDEF\"", /* bug: not corrected */
>> + },
>> + /* Plane 1 .. 16 noncharacters */
>> + {
>> + /* U+1FFFE U+1FFFF U+2FFFE U+2FFFF ... U+10FFFE U+10FFFF */
>> + "\"\xF0\x9F\xBF\xBE\xF0\x9F\xBF\xBF"
>> + "\xF0\xAF\xBF\xBE\xF0\xAF\xBF\xBF"
>> + "\xF0\xBF\xBF\xBE\xF0\xBF\xBF\xBF"
>> + "\xF1\x8F\xBF\xBE\xF1\x8F\xBF\xBF"
>> + "\xF1\x9F\xBF\xBE\xF1\x9F\xBF\xBF"
>> + "\xF1\xAF\xBF\xBE\xF1\xAF\xBF\xBF"
>> + "\xF1\xBF\xBF\xBE\xF1\xBF\xBF\xBF"
>> + "\xF2\x8F\xBF\xBE\xF2\x8F\xBF\xBF"
>> + "\xF2\x9F\xBF\xBE\xF2\x9F\xBF\xBF"
>> + "\xF2\xAF\xBF\xBE\xF2\xAF\xBF\xBF"
>> + "\xF2\xBF\xBF\xBE\xF2\xBF\xBF\xBF"
>> + "\xF3\x8F\xBF\xBE\xF3\x8F\xBF\xBF"
>> + "\xF3\x9F\xBF\xBE\xF3\x9F\xBF\xBF"
>> + "\xF3\xAF\xBF\xBE\xF3\xAF\xBF\xBF"
>> + "\xF3\xBF\xBF\xBE\xF3\xBF\xBF\xBF"
>> + "\xF4\x8F\xBF\xBE\xF4\x8F\xBF\xBF\"",
>> + /* bug: not corrected */
>> + "\xF0\x9F\xBF\xBE\xF0\x9F\xBF\xBF"
>> + "\xF0\xAF\xBF\xBE\xF0\xAF\xBF\xBF"
>> + "\xF0\xBF\xBF\xBE\xF0\xBF\xBF\xBF"
>> + "\xF1\x8F\xBF\xBE\xF1\x8F\xBF\xBF"
>> + "\xF1\x9F\xBF\xBE\xF1\x9F\xBF\xBF"
>> + "\xF1\xAF\xBF\xBE\xF1\xAF\xBF\xBF"
>> + "\xF1\xBF\xBF\xBE\xF1\xBF\xBF\xBF"
>> + "\xF2\x8F\xBF\xBE\xF2\x8F\xBF\xBF"
>> + "\xF2\x9F\xBF\xBE\xF2\x9F\xBF\xBF"
>> + "\xF2\xAF\xBF\xBE\xF2\xAF\xBF\xBF"
>> + "\xF2\xBF\xBF\xBE\xF2\xBF\xBF\xBF"
>> + "\xF3\x8F\xBF\xBE\xF3\x8F\xBF\xBF"
>> + "\xF3\x9F\xBF\xBE\xF3\x9F\xBF\xBF"
>> + "\xF3\xAF\xBF\xBE\xF3\xAF\xBF\xBF"
>> + "\xF3\xBF\xBF\xBE\xF3\xBF\xBF\xBF"
>> + "\xF4\x8F\xBF\xBE\xF4\x8F\xBF\xBF",
>> + /* bug: not corrected */
>> + "\"\\u07FF\\uFFFF\\u07FF\\uFFFF\\u0BFF\\uFFFF\\u0BFF\\uFFFF"
>> + "\\u0FFF\\uFFFF\\u0FFF\\uFFFF\\u13FF\\uFFFF\\u13FF\\uFFFF"
>> + "\\u17FF\\uFFFF\\u17FF\\uFFFF\\u1BFF\\uFFFF\\u1BFF\\uFFFF"
>> + "\\u1FFF\\uFFFF\\u1FFF\\uFFFF\\u23FF\\uFFFF\\u23FF\\uFFFF"
>> + "\\u27FF\\uFFFF\\u27FF\\uFFFF\\u2BFF\\uFFFF\\u2BFF\\uFFFF"
>> + "\\u2FFF\\uFFFF\\u2FFF\\uFFFF\\u33FF\\uFFFF\\u33FF\\uFFFF"
>> + "\\u37FF\\uFFFF\\u37FF\\uFFFF\\u3BFF\\uFFFF\\u3BFF\\uFFFF"
>> + "\\u3FFF\\uFFFF\\u3FFF\\uFFFF\\u43FF\\uFFFF\\u43FF\\uFFFF\"",
>> },
>> {}
>> };
>>
>
> This is probably p#0 (the title).
>
> Ah. Have you removed the noncharacters from the other tests, but made up
> for them at the end with new noncharacter tests?
I added tests to cover all 66 noncharacters. Then I noticed some
duplicate test cases elsewhere, and realized that these don't really
fully test what I want to test there.
> Reviewed-by: Laszlo Ersek <address@hidden>
Thanks!
[Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is buggy, rewrite, Markus Armbruster, 2013/03/14
Re: [Qemu-devel] [PATCH 0/4] Fix JSON string formatter, Blue Swirl, 2013/03/17
Re: [Qemu-devel] [PATCH 0/4] Fix JSON string formatter, Blue Swirl, 2013/03/23