Re: [Qemu-devel] [PATCH v2 28/60] json: Fix \uXXXX for surrogate pairs


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v2 28/60] json: Fix \uXXXX for surrogate pairs
Date: Mon, 20 Aug 2018 10:40:04 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Eric Blake <address@hidden> writes:

> On 08/17/2018 10:05 AM, Markus Armbruster wrote:
>> The JSON parser treats each half of a surrogate pair as unpaired
>> surrogate.  Fix it to recognize surrogate pairs.
>>
>> Signed-off-by: Markus Armbruster <address@hidden>
>> Reviewed-by: Eric Blake <address@hidden>
>
> I might have dropped the R-b, to ensure the changes since v1 get
> re-reviewed.

I intended to, but screwed up.  My apologies.

>> ---
>>   qobject/json-parser.c | 60 ++++++++++++++++++++++++++++---------------
>>   tests/check-qjson.c   |  3 +--
>>   2 files changed, 40 insertions(+), 23 deletions(-)
>>
>
>> @@ -157,22 +169,28 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
>>                   qstring_append_chr(str, '\t');
>>                   break;
>>               case 'u':
>> -                cp = 0;
>> -                for (i = 0; i < 4; i++) {
>> -                    if (!qemu_isxdigit(*ptr)) {
>> -                        parse_error(ctxt, token,
>> -                                    "invalid hex escape sequence in string");
>> -                        goto out;
>> +                cp = cvt4hex(ptr);
>> +                ptr += 4;
>> +
>> +                /* handle surrogate pairs */
>> +                if (cp >= 0xD800 && cp <= 0xDBFF
>> +                    && ptr[0] == '\\' && ptr[1] == 'u') {
>> +                    /* leading surrogate followed by \u */
>> +                    cp = 0x10000 + ((cp & 0x3FF) << 10);
>> +                    trailing = cvt4hex(ptr + 2);
>> +                    if (trailing >= 0xDC00 && trailing <= 0xDFFF) {
>> +                        /* followed by trailing surrogate */
>> +                        cp |= trailing & 0x3FF;
>> +                        ptr += 6;
>> +                    } else {
>> +                        cp = -1; /* invalid */
>>                       }
>> -                    cp <<= 4;
>> -                    cp |= hex2decimal(*ptr);
>> -                    ptr++;
>>                   }
>>                   if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
>>                       parse_error(ctxt, token,
>> -                                "\\u%.4s is not a valid Unicode character",
>> -                                ptr - 3);
>> +                                "%.*s is not a valid Unicode character",
>> +                                (int)(ptr - beg), beg);
>
> The error reporting here has indeed been improved over v1.
>
> Reviewed-by: Eric Blake <address@hidden>

Thanks!
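
For anyone reading the quoted hunk in isolation, here is a rough standalone sketch of the surrogate-pair arithmetic it performs. It is not the code from the patch: hex4() and decode_u_escape() are hypothetical names made up for illustration (the body of the patch's cvt4hex() is not shown above), and rejecting a lone surrogate inside the decoder is an assumption, whereas the patch leaves that case to mod_utf8_encode().

```c
#include <stddef.h>

/* Hypothetical stand-in for a 4-hex-digit converter: returns the value
 * of exactly four hex digits at s, or -1 if any of them is not a hex
 * digit (an early NUL terminator is caught the same way). */
static int hex4(const char *s)
{
    int cp = 0;

    for (int i = 0; i < 4; i++) {
        char c = s[i];
        int d;

        if (c >= '0' && c <= '9') {
            d = c - '0';
        } else if (c >= 'a' && c <= 'f') {
            d = c - 'a' + 10;
        } else if (c >= 'A' && c <= 'F') {
            d = c - 'A' + 10;
        } else {
            return -1;
        }
        cp = (cp << 4) | d;
    }
    return cp;
}

/* Hypothetical decoder: s points at the backslash of a "\uXXXX" escape.
 * On success, returns the code point and stores the number of bytes
 * consumed (6, or 12 for a surrogate pair) in *len.  On failure,
 * returns -1 and stores 0 in *len. */
static int decode_u_escape(const char *s, size_t *len)
{
    int cp, trailing;

    *len = 0;
    if (s[0] != '\\' || s[1] != 'u') {
        return -1;
    }
    cp = hex4(s + 2);
    if (cp < 0) {
        return -1;
    }

    if (cp >= 0xD800 && cp <= 0xDBFF && s[6] == '\\' && s[7] == 'u') {
        /* leading surrogate followed by \u: expect a trailing surrogate */
        trailing = hex4(s + 8);
        if (trailing < 0xDC00 || trailing > 0xDFFF) {
            return -1;
        }
        *len = 12;
        return 0x10000 + ((cp & 0x3FF) << 10) + (trailing & 0x3FF);
    }

    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return -1;              /* unpaired surrogate (assumption: reject) */
    }
    *len = 6;
    return cp;
}
```

With this sketch, "\uD83D\uDE00" combines as 0x10000 + (0x03D << 10) + 0x200 = 0x1F600 and consumes 12 bytes, while "\uD83D\u0041" is rejected because the second escape is not a trailing surrogate.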


