[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 1/6] json: Fix lexer for lookahead character bey
From: |
Markus Armbruster |
Subject: |
Re: [Qemu-devel] [PATCH 1/6] json: Fix lexer for lookahead character beyond '\x7F' |
Date: |
Tue, 28 Aug 2018 06:28:30 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) |
Eric Blake <address@hidden> writes:
> On 08/27/2018 02:00 AM, Markus Armbruster wrote:
>> The lexer fails to end a valid token when the lookahead character is
>> beyond '\x7F'. For instance, input
>>
>> true\xC2\xA2
>>
>> produces the tokens
>>
>> JSON_ERROR true\xC2
>> JSON_ERROR \xA2
>>
>> The first token should be
>>
>> JSON_KEYWORD true
>>
>> instead.
>
> As long as we still get a JSON_ERROR in the end.
We do: one for \xC2, and one for \xA2. PATCH 4 will lose the second one.
>> The culprit is
>>
>> #define TERMINAL(state) [0 ... 0x7F] = (state)
>>
>> It leaves [0x80..0xFF] zero, i.e. IN_ERROR. Has always been broken.
>
> I wonder if that was done because it was assuming that valid input is
> only ASCII, and that any byte larger than 0x7f is invalid except
> within the context of a string.
Plausible thinko.
> But whatever the reason for the
> original bug, your fix makes sense.
>
>> Fix it to initialize the complete array.
>
> Worth testsuite coverage?
Since lookahead bytes > 0x7F are always a parse error, all the bug can
do is swallow a TERMINAL() token right before a parse error. The
TERMINAL() tokens are JSON_INTEGER, JSON_FLOAT, JSON_KEYWORD, JSON_SKIP,
JSON_INTERP. Fairly harmless. In particular, JSON objects get through
even when followed by a byte > 0x7F.
Of course, test coverage wouldn't hurt regardless.
>> Signed-off-by: Markus Armbruster <address@hidden>
>> ---
>> qobject/json-lexer.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>
> Reviewed-by: Eric Blake <address@hidden>
Thanks!
- [Qemu-devel] [PATCH 0/6] json: More fixes, error reporting improvements, cleanups, Markus Armbruster, 2018/08/27
- [Qemu-devel] [PATCH 1/6] json: Fix lexer for lookahead character beyond '\x7F', Markus Armbruster, 2018/08/27
- [Qemu-devel] [PATCH 2/6] json: Clean up how lexer consumes "end of input", Markus Armbruster, 2018/08/27
- [Qemu-devel] [PATCH 3/6] json: Make lexer's "character consumed" logic less confusing, Markus Armbruster, 2018/08/27
- [Qemu-devel] [PATCH 4/6] json: Nicer recovery from lexical errors, Markus Armbruster, 2018/08/27
- [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR, Markus Armbruster, 2018/08/27