Re: [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings
Date: Fri, 10 Aug 2018 16:18:59 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Eric Blake <address@hidden> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> utf8_string() tests only double quoted strings.  Cover single quoted
>> strings, too: store the strings to test without quotes, then wrap them
>> in either kind of quote.
>>
>> Signed-off-by: Markus Armbruster <address@hidden>
>> ---
>>   tests/check-qjson.c | 427 ++++++++++++++++++++++----------------------
>>   1 file changed, 214 insertions(+), 213 deletions(-)
>>
>
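
The approach in the commit message (store each test string without
quotes, then wrap it in either kind of quote) can be sketched roughly as
below.  wrap_in_quotes() is a hypothetical helper made up for this
illustration; it is not taken from tests/check-qjson.c, and the patch
itself may well build the quoted string differently.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not QEMU code: write QUOTE + BODY + QUOTE into
 * BUF.  Returns 0 on success, -1 if BUF is too small.
 * BODY is copied byte for byte, so it may hold arbitrary UTF-8 (or
 * invalid UTF-8) as long as it contains no NUL byte.
 */
static int wrap_in_quotes(char *buf, size_t size, char quote,
                          const char *body)
{
    int n = snprintf(buf, size, "%c%s%c", quote, body, quote);

    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

A test loop could then call it once with '"' and once with '\'' for the
same unquoted body, and feed each result to the JSON parser.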
> Pre-existing, but:
>
>>           /* 2.2.4  4 bytes U+1FFFFF */
>
> Technically, Unicode ends at U+10FFFF (21 bits). Anything beyond that
> is not valid Unicode, even if it IS a valid interpretation of UTF-8
> encoding.

Correct.  Testing how we handle such sequences makes sense all the same.
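
To make the distinction concrete, here is a minimal sketch of a decoder
for the original 1-6 byte UTF-8 scheme, paired with a separate range
check against U+10FFFF.  Both function names are hypothetical and
invented for this illustration; this is not how QEMU's JSON parser
decodes strings.  The decoder deliberately accepts overlong forms and
surrogates, since the only point is to show that a well-formed byte
sequence can still decode to a value outside Unicode.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not QEMU code: decode one sequence using the
 * original 1-6 byte UTF-8 scheme.  Returns the decoded value, or -1
 * if the bytes are malformed or truncated.  It does not reject
 * overlong forms, surrogates, or values above U+10FFFF.
 */
static int64_t decode_extended_utf8(const unsigned char *s, size_t len)
{
    /* Payload bits carried by the lead byte, indexed by length - 1 */
    static const unsigned char lead_mask[] = {
        0x7F, 0x1F, 0x0F, 0x07, 0x03, 0x01
    };
    size_t n, i;
    int64_t cp;

    if (len == 0) {
        return -1;
    }
    if (s[0] < 0x80) {
        n = 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        n = 2;
    } else if ((s[0] & 0xF0) == 0xE0) {
        n = 3;
    } else if ((s[0] & 0xF8) == 0xF0) {
        n = 4;
    } else if ((s[0] & 0xFC) == 0xF8) {
        n = 5;
    } else if ((s[0] & 0xFE) == 0xFC) {
        n = 6;
    } else {
        return -1;              /* stray continuation byte, 0xFE, 0xFF */
    }
    if (len < n) {
        return -1;              /* truncated sequence */
    }
    cp = s[0] & lead_mask[n - 1];
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return -1;          /* not a continuation byte */
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    return cp;
}

/* Unicode proper ends at U+10FFFF, whatever the encoding can express */
static int is_in_unicode_range(int64_t cp)
{
    return cp >= 0 && cp <= 0x10FFFF;
}
```

With this sketch, "\xF7\xBF\xBF\xBF" decodes to 0x1FFFFF and fails the
range check, while "\xF4\x8F\xBF\xBD" decodes to 0x10FFFD and passes.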

>>           {
>> -            "\"\xF7\xBF\xBF\xBF\"",
>> +            "\xF7\xBF\xBF\xBF",
>>               NULL,               /* bug: rejected */
>> -            "\"\\uFFFD\"",
>> +            "\\uFFFD",
>>               "\xF7\xBF\xBF\xBF",
>>           },
>>           /* 2.2.5  5 bytes U+3FFFFFF */
>
> Which makes this one also questionable,
>
>>           {
>> -            "\"\xFB\xBF\xBF\xBF\xBF\"",
>> +            "\xFB\xBF\xBF\xBF\xBF",
>>               NULL,               /* bug: rejected */
>> -            "\"\\uFFFD\"",
>> +            "\\uFFFD",
>>               "\xFB\xBF\xBF\xBF\xBF",
>>           },
>>           /* 2.2.6  6 bytes U+7FFFFFFF */
>
> and this one.
>
>>           {
>>               /* last one in last plane: U+10FFFD */
>> -            "\"\xF4\x8F\xBF\xBD\"",
>>               "\xF4\x8F\xBF\xBD",
>> -            "\"\\uDBFF\\uDFFD\""
>> +            "\xF4\x8F\xBF\xBD",
>> +            "\\uDBFF\\uDFFD"
>>           },
>>           {
>>               /* first one beyond Unicode range: U+110000 */
>
> while these are reasonable.
>
> The conversion of the initializer looks sane (well, mechanical).  Ergo:
>
> Reviewed-by: Eric Blake <address@hidden>

Thanks!


