[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR
From: |
Markus Armbruster |
Subject: |
Re: [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR |
Date: |
Fri, 31 Aug 2018 09:06:03 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) |
Eric Blake <address@hidden> writes:
> On 08/27/2018 11:40 PM, Markus Armbruster wrote:
>
>>>> typedef enum json_token_type {
>>>> - JSON_MIN = 100,
>>>> - JSON_LCURLY = JSON_MIN,
>>>> + JSON_ERROR = 0, /* must be zero, see json_lexer[] */
>>>> + /* Gap for lexer states */
>>>> + JSON_LCURLY = 100,
>>>> + JSON_MIN = JSON_LCURLY,
>>>
>>> In an earlier version of this type of cleanup, you swapped the IN_ and
>>> JSON_ values and eliminated the gap, to make the overall table more
>>> compact (no storage wasted on any of the states in the gap between the
>>> two).
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg01178.html
>>>
>>> Is it still worth trying to minimize the gap between the two
>>> sequences, even if you now no longer swap them in order?
>>
>> You caught me :)
>>
>> Eliminating the gap actually enlarges the table.
>
> Rather, switching the order enlarges the table.
>
>> I first got confused,
>> then measured the size change backwards to confirm my confused ideas.
>> When I looked at the patch again, I realized my mistake, and silently
>> dropped this part of the change.
>
> The size of the table is determined by the fact that we must
> initialize entry 0 (whether we spell it IN_ERROR or JSON_ERROR), then
> pay attention to the largest value assigned. Re-reading json_lexer[],
> you are only initializing IN_* states, and not JSON_* states;
Correct.
The JSON_* states other than JSON_ERROR all go to the start state
regardless of lookahead and without consuming it. We implement that
state transition in code instead of putting it into the table:
case JSON_STRING:
json_message_process_token(lexer, lexer->token, new_state,
lexer->x, lexer->y);
/* fall through */
case JSON_SKIP:
g_string_truncate(lexer->token, 0);
/* fall through */
case IN_START:
--> new_state = lexer->start_state;
break;
JSON_ERROR goes to IN_RECOVERY instead:
case JSON_ERROR:
json_message_process_token(lexer, lexer->token, JSON_ERROR,
lexer->x, lexer->y);
---> new_state = IN_RECOVERY;
/* fall through */
case IN_RECOVERY:
> swapping
> JSON_* to come first enlarged the table because you now have a bunch
> of additional rows in the table that are all 0-initialized to
> JSON_ERROR transitions.
Yes. These rows are never used.
> So at the end of the day, leaving IN_* to be first, and putting JSON_*
> second, makes sense.
>
> The question remains, then, if a fixed-size gap (by making JSON_MIN be
> exactly 100) is any smarter than a contiguous layout (by making
> JSON_MIN be IN_START_INTERP + 1). I can't see any strong reason for
> preferring one form over the other, so keeping the gap doesn't hurt.
The gap lets us hide the IN_* in json-lexer.c. Not sure it's worth the
trouble. We could move the IN_* to json-parser-int.h for simplicity.
Not sure that's worth the trouble, either :)
- [Qemu-devel] [PATCH 4/6] json: Nicer recovery from lexical errors, (continued)
[Qemu-devel] [PATCH 6/6] json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP, Markus Armbruster, 2018/08/27