From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser
Date: Mon, 13 Aug 2018 09:05:38 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Eric Blake <address@hidden> writes:
> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> Both lexer and parser reject invalid escape sequences in strings. The
>> parser's check is useless.
>>
>
>>
>> Drop the lexer's escape sequence checking, and make it accept the same
>> characters after '\' it accepts elsewhere in strings. It now produces
>>
>> JSON_LCURLY {
>> JSON_STRING "address@hidden"
>> JSON_COLON :
>> JSON_INTEGER 1
>> JSON_RCURLY }
>>
>> and the parser reports just
>>
>> JSON parse error, invalid escape sequence in string
>>
>> While there, fix parse_string()'s inaccurate function comment.
>
> Worthwhile improvement.
>
>>
>> Signed-off-by: Markus Armbruster <address@hidden>
>> ---
>> qobject/json-lexer.c | 72 +++----------------------------------------
>> qobject/json-parser.c | 56 +++++++++++++++++++--------------
>> 2 files changed, 37 insertions(+), 91 deletions(-)
>
> and shorter!
>
>> [IN_DQ_STRING_ESCAPE] = {
>> - ['b'] = IN_DQ_STRING,
>> - ['f'] = IN_DQ_STRING,
>> - ['n'] = IN_DQ_STRING,
>> - ['r'] = IN_DQ_STRING,
>> - ['t'] = IN_DQ_STRING,
>> - ['/'] = IN_DQ_STRING,
>> - ['\\'] = IN_DQ_STRING,
>> - ['\''] = IN_DQ_STRING,
>> - ['\"'] = IN_DQ_STRING,
>> - ['u'] = IN_DQ_UCODE0,
>> + [0x20 ... 0xFD] = IN_DQ_STRING,
>
> Among other things, this means the parser now has to flag "\u" as an
> incomplete escape - but your added testsuite coverage earlier in the
> series ensures that we do.
Yes.
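
To illustrate the division of labor discussed above: once the lexer accepts any character after '\', the parser alone has to validate each escape, including an incomplete "\u". The following is a minimal sketch of such a check, not the actual parse_string() code; the helper name escape_len() and its return convention are assumptions made for illustration only.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, NOT QEMU's parse_string().
 *
 * Given s pointing at a backslash inside a NUL-terminated string,
 * return the length of the escape sequence starting there, or 0 if
 * the sequence is invalid or incomplete.
 */
static size_t escape_len(const char *s)
{
    if (s[0] != '\\') {
        return 0;
    }

    switch (s[1]) {
    case '"': case '\\': case '/':
    case 'b': case 'f': case 'n': case 'r': case 't':
        return 2;               /* escapes listed in RFC 7159 */
    case '\'':
        return 2;               /* extension: apostrophe after escape */
    case 'u':
        /* Need exactly four hex digits; stops safely at the NUL. */
        for (int i = 2; i < 6; i++) {
            if (!isxdigit((unsigned char)s[i])) {
                return 0;       /* incomplete, e.g. "\u" or "\u12" */
            }
        }
        return 6;
    default:
        return 0;               /* anything else, including end of string */
    }
}
```

A caller would report "invalid escape sequence in string" whenever this returns 0, and otherwise skip the returned number of bytes. The sketch only checks the shape of "\uXXXX"; it does not decode the code point or handle surrogate pairs.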
>> +++ b/qobject/json-parser.c
>> @@ -106,30 +106,40 @@ static int hex2decimal(char ch)
>> }
>> /**
>> - * parse_string(): Parse a json string and return a QObject
>> + * parse_string(): Parse a JSON string
>> *
>> - * string
>
>> + * From RFC 7159 "The JavaScript Object Notation (JSON) Data
>> + * Interchange Format":
>> + *
>> + * char = unescaped /
>> + * escape (
>> + * %x22 / ; " quotation mark U+0022
>> + * %x5C / ; \ reverse solidus U+005C
>> + * %x2F / ; / solidus U+002F
>> + * %x62 / ; b backspace U+0008
>> + * %x66 / ; f form feed U+000C
>> + * %x6E / ; n line feed U+000A
>> + * %x72 / ; r carriage return U+000D
>> + * %x74 / ; t tab U+0009
>> + * %x75 4HEXDIG ) ; uXXXX U+XXXX
>> + * escape = %x5C ; \
>> + * quotation-mark = %x22 ; "
>> + * unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
>> + *
>> + * Extensions over RFC 7159:
>> + * - Extra escape sequence in strings:
>> + * 0x27 (apostrophe) is recognized after escape, too
>> + * - Single-quoted strings:
>> + * Like double-quoted strings, except they're delimited by %x27
>> + * (apostrophe) instead of %x22 (quotation mark), and can't contain
>> + * unescaped apostrophe, but can contain unescaped quotation mark.
>> + *
>> + * Note:
>> + * - Encoding is modified UTF-8.
>
> That is an extension over RFC 7159. But I'm okay with leaving it in
> the Notes section.
>
>> + * - Invalid Unicode characters are rejected.
>> + * - Control characters are rejected by the lexer.
>
> Worth being explicit that this is 00-1f, fe, and ff?
\xFE and \xFF are invalid, not control.
What about:
* - Invalid Unicode characters are rejected.
* - Control characters \x00..\x1F are rejected by the lexer.
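
For reference, the distinction drawn above (control bytes versus bytes that are simply invalid) can be sketched as a small classifier. This is an illustration only, not code from the series; the enum and the function name classify_byte() are hypothetical.

```c
/*
 * Hypothetical classification of a single input byte, for illustration.
 * BYTE_OTHER does not mean "valid": it only means the byte is neither a
 * control character in the range 0x00..0x1F nor one of the two byte
 * values that can never occur in UTF-8.
 */
enum byte_class {
    BYTE_CONTROL,       /* 0x00..0x1F */
    BYTE_NEVER_UTF8,    /* 0xFE, 0xFF */
    BYTE_OTHER          /* everything else, 0x20..0xFD */
};

static enum byte_class classify_byte(unsigned char b)
{
    if (b <= 0x1F) {
        return BYTE_CONTROL;
    }
    if (b == 0xFE || b == 0xFF) {
        return BYTE_NEVER_UTF8;
    }
    return BYTE_OTHER;
}
```

The BYTE_OTHER range matches the [0x20 ... 0xFD] initializer in the quoted lexer hunk. Whether a multi-byte sequence built from such bytes is well-formed is a separate check that this sketch does not attempt.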