qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 37/56] json: Treat unwanted interpolation as lex


From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH 37/56] json: Treat unwanted interpolation as lexical error
Date: Mon, 13 Aug 2018 10:48:14 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
The JSON parser optionally supports interpolation.  The lexer
recognizes interpolation tokens unconditionally.  The parser rejects
them when interpolation is disabled, in parse_interpolation().
However, it neglects to set an error then, which can make
json_parser_parse() fail without setting an error.

Move the check for unwanted interpolation from the parser's
parse_interpolation() into the lexer's finite state machine.  When
interpolation is disabled, '%' is now handled like any other
unexpected character.

The next commit will improve how such lexical errors are handled.

Signed-off-by: Markus Armbruster <address@hidden>
---
  include/qapi/qmp/json-lexer.h |  4 ++--
  qobject/json-lexer.c          | 42 ++++++++++++++++++++++++++---------
  qobject/json-parser.c         |  4 ----
  qobject/json-streamer.c       |  2 +-
  tests/qmp-test.c              |  4 ++++
  5 files changed, 39 insertions(+), 17 deletions(-)



@@ -271,17 +272,38 @@ static const uint8_t json_lexer[][256] =  {
          [','] = JSON_COMMA,
          [':'] = JSON_COLON,
          ['a' ... 'z'] = IN_KEYWORD,
+        [' '] = IN_WHITESPACE,
+        ['\t'] = IN_WHITESPACE,
+        ['\r'] = IN_WHITESPACE,
+        ['\n'] = IN_WHITESPACE,
+    },
+
+    [IN_START_INTERPOL] = {
+        ['"'] = IN_DQ_STRING,
...

+        ['\n'] = IN_WHITESPACE,
+        /* matches IN_START up to here */
          ['%'] = IN_INTERPOL,

You could compress this as:

[IN_START_INTERPOL ... IN_START] = {
   ['"'] = ...
   ['\n'] = ...
},
[IN_START_INTERPOL]['%'] = IN_INTERPOL,

rather than duplicating the common list twice. (We already exploit gcc's range initialization, and the fact that you can initialize a broader range and then re-initialize a more specific subset later)

+++ b/tests/qmp-test.c
@@ -94,6 +94,10 @@ static void test_malformed(QTestState *qts)
/* lexical error: interpolation */
      qtest_qmp_send_raw(qts, "%%p\n");
+    /* two errors, one for "%", one for "p" */
+    resp = qtest_qmp_receive(qts);
+    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
+    qobject_unref(resp);
      resp = qtest_qmp_receive(qts);
      g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
      qobject_unref(resp);

I'm impressed at how easily you got the lexer to parse two different token grammars, and agree that doing it in the lexer when we don't want interpolation is a nicer place.

Reviewed-by: Eric Blake <address@hidden>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



reply via email to

[Prev in Thread] Current Thread [Next in Thread]