Re: [Qemu-devel] [PATCH 4/6] json: Nicer recovery from lexical errors

From:	Eric Blake
Subject:	Re: [Qemu-devel] [PATCH 4/6] json: Nicer recovery from lexical errors
Date:	Mon, 27 Aug 2018 12:18:42 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 08/27/2018 02:00 AM, Markus Armbruster wrote:

When the lexer chokes on an input character, it consumes the
character, emits a JSON error token, and enters its start state.  This
can lead to suboptimal error recovery.  For instance, input

     0123 ,

produces the tokens

     JSON_ERROR    01
     JSON_INTEGER  23
     JSON_COMMA    ,

Make the lexer skip characters after a lexical error until a
structural character ('[', ']', '{', '}', ':', ','), an ASCII control
character, or '\xFE', or '\xFF'.

Note that we must not skip ASCII control characters, '\xFE', '\xFF',
because those are documented to force the JSON parser into known-good
state, by docs/interop/qmp-spec.txt.

The lexer now produces

     JSON_ERROR    01
     JSON_COMMA    ,

So the lexer has now completely skipped the intermediate input, but the
resulting error message need only point at the start of where input went
wrong, and skipping to a sane point results in fewer error tokens to be
reported. Makes sense.
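
To make the before/after token streams above concrete, here is a minimal
sketch of the skip-until-resynchronization idea. It is not the table-driven
lexer from qobject/json-lexer.c: the names lex_sketch, struct tok and
is_resync_point are hypothetical, and the sketch models only integers,
structural characters and the leading-zero error from the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum tok_type { TOK_ERROR, TOK_INTEGER, TOK_STRUCTURAL };

struct tok {
    enum tok_type type;
    size_t start;   /* offset of the token's first byte */
    size_t len;     /* number of bytes reported in the token */
};

static bool is_digit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

static bool is_structural(unsigned char c)
{
    return c == '[' || c == ']' || c == '{' || c == '}' ||
           c == ':' || c == ',';
}

/* Recovery stops in front of these bytes and never swallows them. */
static bool is_resync_point(unsigned char c)
{
    return is_structural(c) || (c < 0x20 && c != '\t') || c >= 0xFE;
}

/* Hypothetical helper: lex in[0..n) into out[], return the token count. */
size_t lex_sketch(const char *in, size_t n, bool recover,
                  struct tok *out, size_t max)
{
    size_t i = 0, count = 0;

    assert(in && out);
    while (i < n && count < max) {
        unsigned char c = (unsigned char)in[i];
        size_t start = i;
        enum tok_type type;

        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            i++;                    /* whitespace between tokens */
            continue;
        }
        i++;                        /* every token consumes its first byte */
        if (is_structural(c)) {
            type = TOK_STRUCTURAL;
        } else if (c >= '1' && c <= '9') {
            while (i < n && is_digit((unsigned char)in[i])) {
                i++;
            }
            type = TOK_INTEGER;
        } else if (c == '0' && !(i < n && is_digit((unsigned char)in[i]))) {
            type = TOK_INTEGER;     /* a lone zero */
        } else {
            if (c == '0') {
                i++;                /* the digit after a leading zero */
            }
            type = TOK_ERROR;
        }
        out[count].type = type;
        out[count].start = start;
        out[count].len = i - start;
        count++;

        if (type == TOK_ERROR && recover) {
            /* Skip junk; it is not reported as part of any token. */
            while (i < n && !is_resync_point((unsigned char)in[i])) {
                i++;
            }
        }
    }
    return count;
}
```

Calling lex_sketch() on "0123 ," with recover set to false should give
three tokens (error, integer, comma), and with recover set to true only
the error and the comma, matching the two listings above.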


Update qmp-test for the nicer error recovery: QMP now report just one


s/report/reports/

error for input %p instead of two.  Also drop the newline after %p; it
was needed to tease out the second error.

That's because pre-patch, 'p' is one of the input characters that
requires lookahead to determine if it forms a complete token (and the
newline provides the transition needed to consume it); now post-patch,
the 'p' is consumed as part of the junk after the error is first
detected at the '%'.

And to my earlier complaint about 0x1a resulting in JSON_ERROR then
JSON_INTEGER then JSON_KEYWORD, that sequence is likewise now identified
as a single JSON_ERROR at the 'x', with the rest of the attempted hex
number (invalid in JSON) silently skipped. Nice.


Signed-off-by: Markus Armbruster <address@hidden>
---
  qobject/json-lexer.c | 43 +++++++++++++++++++++++++++++--------------
  tests/qmp-test.c     |  6 +-----
  2 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 28582e17d9..39c7ce7adc 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -101,6 +101,7 @@

enum json_lexer_state {
      IN_ERROR = 0,               /* must really be 0, see json_lexer[] */
+    IN_RECOVERY,
      IN_DQ_STRING_ESCAPE,
      IN_DQ_STRING,
      IN_SQ_STRING_ESCAPE,
@@ -130,6 +131,28 @@ QEMU_BUILD_BUG_ON(IN_START_INTERP != IN_START + 1);
  static const uint8_t json_lexer[][256] =  {
      /* Relies on default initialization to IN_ERROR! */

+    /* error recovery */
+    [IN_RECOVERY] = {
+        /*
+         * Skip characters until a structural character, an ASCII
+         * control character other than '\t', or impossible UTF-8
+         * bytes '\xFE', '\xFF'.  Structural characters and line
+         * endings are promising resynchronization points.  Clients
+         * may use the others to force the JSON parser into known-good
+         * state; see docs/interop/qmp-spec.txt.
+         */
+        [0 ... 0x1F] = IN_START | LOOKAHEAD,

And here's where the LOOKAHEAD bit has to be separate, because you are
now assigning semantics to the transition on '\0' that are distinct from
either of the two roles previously enumerated as possible.

+        [0x20 ... 0xFD] = IN_RECOVERY,
+        [0xFE ... 0xFF] = IN_START | LOOKAHEAD,
+        ['\t'] = IN_RECOVERY,
+        ['['] = IN_START | LOOKAHEAD,
+        [']'] = IN_START | LOOKAHEAD,
+        ['{'] = IN_START | LOOKAHEAD,
+        ['}'] = IN_START | LOOKAHEAD,
+        [':'] = IN_START | LOOKAHEAD,
+        [','] = IN_START | LOOKAHEAD,
+    },

It took me a couple of reads before I was satisfied that everything is
initialized as desired (range assignments followed by more-specific
re-assignment works, but isn't common), but this looks right.
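
For readers who want to check the initialization order themselves, the
following standalone sketch reproduces the pattern outside of QEMU. It
relies on the GNU range-designator extension ([a ... b]), accepted by
GCC and Clang, and on the C rule that a later initializer for the same
element overrides an earlier one (some compilers warn about the
override). The names ST_*, recovery_row and recovery_skip_len are
hypothetical, and the LOOKAHEAD value 0x80 is an assumption made for
the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOOKAHEAD 0x80  /* assumed flag value: "do not consume this byte" */

enum sketch_state { ST_ERROR = 0, ST_RECOVERY, ST_START };

/* One row of a transition table, indexed by input byte. */
static const uint8_t recovery_row[256] = {
    [0 ... 0x1F]    = ST_START | LOOKAHEAD,  /* ASCII control characters */
    [0x20 ... 0xFD] = ST_RECOVERY,           /* keep skipping */
    [0xFE ... 0xFF] = ST_START | LOOKAHEAD,  /* impossible UTF-8 bytes */
    ['\t'] = ST_RECOVERY,                    /* overrides [0 ... 0x1F] */
    ['['] = ST_START | LOOKAHEAD,            /* these override */
    [']'] = ST_START | LOOKAHEAD,            /* [0x20 ... 0xFD] */
    ['{'] = ST_START | LOOKAHEAD,
    ['}'] = ST_START | LOOKAHEAD,
    [':'] = ST_START | LOOKAHEAD,
    [','] = ST_START | LOOKAHEAD,
};

/* Return how many leading bytes of in[0..n) recovery would swallow. */
size_t recovery_skip_len(const unsigned char *in, size_t n)
{
    uint8_t state = ST_RECOVERY;
    size_t i = 0;

    assert(in);
    while (i < n && state == ST_RECOVERY) {
        uint8_t entry = recovery_row[in[i]];

        /* The low bits name the next state ... */
        state = (uint8_t)(entry & ~LOOKAHEAD);
        /* ... and the flag says whether the byte stays unconsumed. */
        if (!(entry & LOOKAHEAD)) {
            i++;
        }
    }
    return i;
}
```

With this row, recovery_skip_len() on the bytes "23 ," stops in front of
the comma, so the comma is left for the start state to lex as a normal
token. Keeping the flag separate from the state number is what lets a
single entry say both "go back to the start state" and "leave this byte
for the next round".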


Reviewed-by: Eric Blake <address@hidden>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

