Re: Making QEMU easier for management tools and applications

From: Christophe de Dinechin
Subject: Re: Making QEMU easier for management tools and applications
Date: Sat, 25 Jan 2020 23:34:27 +0100

> On 23 Jan 2020, at 18:58, John Snow <address@hidden> wrote:
> On 1/23/20 2:19 AM, Markus Armbruster wrote:
>> John Snow <address@hidden> writes:
>>> On 12/24/19 8:41 AM, Daniel P. Berrangé wrote:
>>>>> * scripts/qmp/qmp-shell
>>>>>  Half-hearted attempt at a human-friendly wrapper around the JSON
>>>>>  syntax.  I have no use for this myself.
>>>> I use this fairly often as its a useful debugging / experimentation
>>>> / trouble shooting tool. There's similar ish functionality in
>>>> virsh qemu-monitor-command. I think there's scope of a supported
>>>> tool here that can talk to libvirt or a UNIX socket for doing
>>>> QMP commands, with a friendlier syntax & pretty printing. 
>>> qmp-shell is one of my go-to tools for working through bitmap workflows
>>> where we don't have convenience commands yet, as some of the setups
>>> required for fleecing et al involve quite a number of steps.
>>> I can copy-paste raw JSON into a socket, but personally I like seeing my
>>> commands neatly organized in a format where I can visually reduce them
>>> to their components at a glance.
>>> (What I mean is: It's hard to remember which QMP commands you've barfed
>>> into a terminal because JSON is hard to read and looks very visually
>>> repetitive.)
>>> I tried to rewrite qmp-shell late last year, actually. I wanted to write
>>> a new REPL that was json-aware in some manner such that you could write
>>> multi-line commands like this:
>>>> example-command arg={
>>>  "hello": "world"
>>> }
>>> This requires, sadly, a streamable JSON parser. Most JSON parsers built
>>> into Python as-is simply take a file pointer and consume the entirety of
>>> the rest of the stream -- they don't play very nice with incomplete
>>> input or input that may have trailing data, e.g.:
>>>> example-command arg={
>>>  "hello": "world"
>>> } arg2={
>>>  "oops!": "more json!"
>>> }
>> QMP is in the same boat: it needs to process input that isn't
>> necessarily full expressions (JSON-text in the RFC's grammar).
>> Any conventional parser can be made streaming by turning it into a
>> coroutine.  This is probably the simplest solution for handwritten
>> streaming LL parsers, because it permits recursive descent.  In Python,
>> I'd try a generator.
>> Our actual solution for QMP predates coroutine support in QEMU, and is
>> rather hamfisted:
>> * Streaming lexer: it gets fed characters one at a time, and when its
>>  state machine says "token complete", it feeds the token to the
>>  "streamer".
>> * "Streamer": gets fed tokens one at a time, buffers them up counting
>>  curly and square bracket nesting until the nesting is zero, then
>>  passes the buffered tokens to the parser.
>> * Non-streaming parser: it gets fed a sequence of tokens that constitute
>>  a full expression.
>> The best I can say about this is that it works.  The streamer's token
>> buffer eats a lot of memory compared to a real streaming parser, but in
>> practice, it's a drop in the bucket.
> I looked into this at one point. I forget why I didn't like it. I had
> some notion that I should replace this one too, but forget exactly why.
> Maybe it wasn't that bad, if I've forgotten.
>>> Also, due to the nature of JSON as being a single discrete object and
>>> never a stream of objects, no existing JSON parser really supports the
>>> idea of ever seeing more than one object per buffer.
>> That plainly sucks.
>>> ...So I investigated writing a proper grammar for qmp-shell.
>> Any parser must start with a proper grammar.  If it doesn't, it's a toy,
>> or a highway to madness.
>>> Unfortunately, this basically means including the JSON grammar as a
>>> subset of the shell grammar and writing your own parser for it entirely.
>> Because qmp-shell is a half-hearted wrapper: we ran out of wrapping
>> paper, so JSON sticks out left and right.
>> Scrap and start over.
>>> I looked into using Python's own lexer; but it's designed to lex
>>> *python*, not *json*. I got a prototype lexer working for this purpose
>>> under a grammar that I think reflects JSON, but I got that sinking
>>> feeling that it was all more trouble than it was worth, and scrapped
>>> working on it any further.
>> Parsing JSON is pretty simple.  Data point: QAPISchemaParser parses our
>> weird derivative of JSON in 239 SLOC.
>>> I did not find any other flex/yacc-like tools that seemed properly
>>> idiomatic or otherwise heavily specialized. I gave up on the idea of
>>> writing a new parser.
>> While I recommend use of tools for parsing non-trivial grammars (you'll
>> screw up, they won't), they're massive overkill for JSON.
>>> I'd love to offer a nice robust QMP shell that is available for use by
>>> end users, but the syntax of the shell will need some major considerations.
>> Scrap and start over.
>> [...]
> Yes, I agree: Scrap and start over.
> What SHOULD the syntax look like, though? Clearly the idea of qmp-shell
> is that it offers a convenient way to enter the top-level keys of the
> arguments dict. This works absolutely fine right up until you need to
> start providing nested definitions.
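
As a side note on the streaming discussion quoted above, here is a minimal Python sketch of the "streamer" idea: count curly and square bracket nesting, and hand the buffered text to an ordinary non-streaming parser once the nesting is back to zero. It works on characters rather than tokens and uses Python's json.loads as the non-streaming parser. The name split_json_stream is made up for illustration; this is not QEMU's code.

```python
import json

def split_json_stream(chunks):
    """Yield one parsed value per complete top-level JSON object or array.

    `chunks` is any iterable of strings (hypothetical input, e.g. pieces
    read from a socket). Only objects and arrays are recognized at top
    level; anything else between them is skipped.
    """
    buf = []            # characters of the expression being collected
    depth = 0           # current {} / [] nesting
    in_string = False   # inside a JSON string literal?
    escaped = False     # previous character was a backslash in a string
    for chunk in chunks:
        for ch in chunk:
            if depth == 0 and ch not in "{[":
                continue            # whitespace or junk between expressions
            buf.append(ch)
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch in "{[":
                depth += 1
            elif ch in "}]":
                depth -= 1
                if depth == 0:
                    # Nesting is back to zero: parse the whole expression.
                    yield json.loads("".join(buf))
                    buf = []
```

Fed with input split at arbitrary points, for example `['{"a": [1, 2', ', {"b": "}"}]} {"c"', ': 1}\n']`, it yields the two objects one after the other, including the case where a closing brace appears inside a string. Mismatched brackets are not detected by the counter itself; they surface as an error from json.loads.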

Well, if you are really ready to start from scratch, I might offer the XL syntax
as a starting point for a discussion of a user-visible syntax that is also
applicable for text-based or binary API exchanges.

I’m going to talk about it at FOSDEM in the “minimalist languages” devroom.
Those who are in Brussels might want to attend to get a better feel.
Source code is here: https://github.com/c3d/xl, but the only part you
care about for this discussion is src/{parser,scanner}.{c,h} and the
syntax configuration file src/xl.syntax. As well as renderer styles
src/xl.stylesheet, src/html.stylesheet, etc.

Key points for the use case considered:
- Tiny (~2000 lines of code for parser/scanner, a C and a C++ implementation)
- Fully introspectable, serializable in a cross-platform way, printable (with 
- Character-precise position tracking for error printing
- Parser preserves comments (for documentation generators)
- Small, if slow, interpreter in about 20K lines of code (~bash speed on some 
  meaning we would get a “qemu scripting language” with loops, tests, 
arithmetic, etc.

More detailed discussion at the end of this mail if you think it warrants a second look.
In any case, if it helps, I’d be happy to help connect it to QEMU…

> For the nesting, we say: "Go ahead and use JSON, but you have to take
> all the spaces out."

Here, that would be A.B.C, which parses as

(result of `xl -nobuiltins -parse test.xl -style debug -show`)

Also, an example given earlier:

 { 'command': 'iothread-set-poll-params',
   'data': {
       'id': 'str',
        '*max-ns': 'uint64',
        '*grow': 'uint64',
        '*shrink': 'uint64'
   },
   'map-to-qom-set': 'IOThread'
 }

could be written as:

command iothread_set_poll_params
                id : str
                *max_ns : uint64
                *grow : uint64
                *shrink : uint64
        map_to_qom_set IOThread

But if you want to keep the original syntax, it seems to parse and render 
practically OK:

% cat /tmp/a.xl
{ 'command': 'iothread-set-poll-params',
   'data': {
       'id': 'str',
        '*max-ns': 'uint64',
        '*grow': 'uint64',
        '*shrink': 'uint64'
   },
   'map-to-qom-set': 'IOThread'
}

% xl -nobuiltins -parse /tmp/a.xl -show
{ 'command':'iothread-set-poll-params', 'data': { 'id':'str', 
    '*max-ns':'uint64', '*grow':'uint64', '*shrink':'uint64'
}, 'map-to-qom-set':'IOThread' }

This is with no change to the XL parser / scanner code
whatsoever, not even to the syntax file. So that gives me hope
that we could have a “reasonably good” compatibility mode
that transforms the quasi-JSON format into the new form,
with a single parser accepting both.

> This... works, charitably, but is hardly what I would call usable.
> For the CLI, we offer a dot syntax notation that resembles nothing in
> particular. It often seems the case that it isn't expressive enough to
> map losslessly to JSON. I suspect it doesn't handle siblings very well.
> A proper HMP-esque TUI would likely have need of coming up with its own
> pet syntax for commands that avoid complicated nested JSON definitions,
> but for effort:value ratio, having a QMP shorthand shell that works
> arbitrarily with any command might be a better win.
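
To make the dot-notation point concrete, here is a small Python sketch of how dotted keys such as `a.b=1,a.c=2` can be expanded into a nested dictionary. The function name and the exact rules are assumptions for illustration, not QEMU's actual implementation. Every value stays a string, and values containing commas are not handled, which hints at why such a notation does not map losslessly to JSON.

```python
def dotted_to_nested(text):
    """Expand 'k1.k2=v,k3=v' into nested dicts; all values remain strings."""
    result = {}
    for item in text.split(","):
        key, _, value = item.partition("=")
        node = result
        *parents, leaf = key.split(".")
        for part in parents:
            # Descend, creating intermediate groups as needed.
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{key!r}: {part!r} is already a plain value")
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{key!r} is already used as a group")
        node[leaf] = value
    return result
```

With this sketch, `dotted_to_nested("a.b=1,a.c=2,d=3")` gives `{"a": {"b": "1", "c": "2"}, "d": "3"}`: the nesting is recovered, but the number 1 comes back as the string "1", and there is no obvious way to write a list or an empty object.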

The XL proposal here would be to have a single format shared by
- The source definitions used to generate C code
- The monitor / internal shell syntax
- The command-line syntax
- The API data (possibly in serialized form for compactness)

> Do we still have a general-case problem of how to represent QAPI
> structures in plaintext? Will this need to be solved for the CLI, too?
> --js

More info below.

Here are some aspects that I think are interesting about it:

- Tiny (2000 lines of code for scanner and parser, ~20K for a full interpreter)
C:  wc parser.c parser.h scanner.c scanner.h 
     716    2183   26702 parser.c
     100     440    3372 parser.h
     926    2966   30537 scanner.c
     206     945    8249 scanner.h
    1948    6534   68860 total

C++:  wc parser.cpp scanner.cpp ../include/scanner.h ../include/parser.h
     726    2372   26918 parser.cpp
     885    2480   26363 scanner.cpp
     248    1025    8867 ../include/scanner.h
     166     687    5958 ../include/parser.h
    2025    6564   68106 total

- Simple (parse tree with 8 node types, integer, real, name/symbol, text, 
infix, prefix, postfix and block)

        + integer, e.g. 12, 1_000_000 or 16#33A or 2#10101
        + real, e.g. 11.3, 16#1.FFF#e-3, 2#1.01
        + text, e.g. “ABC”, ‘ABC’, <<Long text, multi-lines>> (configurable 
        + name/symbols, e.g. Foo_Bar, +, <=, (precedence and spelling 

        + infix, e.g. A+B, A and B
        + prefix, e.g. +3, sin X
        + postfix, e.g. 3%, 3 km
        + block, e.g. [A], (A), {A} and indentation blocks
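
For readers who prefer code, the eight node types listed above can be modeled roughly as follows in Python. This is only an illustrative model: the class and field names are assumptions and do not mirror the actual C or C++ data structures, and the `show` rendering is a made-up format, not the output of `-style debug`.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Integer:          # 12, 1_000_000, 16#33A
    value: int

@dataclass
class Real:             # 11.3, 2#1.01
    value: float

@dataclass
class Text:             # "ABC", 'ABC'
    value: str

@dataclass
class Name:             # names and operator symbols: Foo_Bar, +, <=
    value: str

@dataclass
class Infix:            # A+B, A and B
    name: str
    left: "Tree"
    right: "Tree"

@dataclass
class Prefix:           # +3, sin X
    left: "Tree"
    right: "Tree"

@dataclass
class Postfix:          # 3%, 3 km
    left: "Tree"
    right: "Tree"

@dataclass
class Block:            # [A], (A), {A}, indentation blocks
    opening: str
    closing: str
    child: "Tree"

Tree = Union[Integer, Real, Text, Name, Infix, Prefix, Postfix, Block]

def show(t):
    """Render a tree in a simple parenthesized form (hypothetical format)."""
    if isinstance(t, (Integer, Real)):
        return str(t.value)
    if isinstance(t, Text):
        return '"' + t.value + '"'
    if isinstance(t, Name):
        return t.value
    if isinstance(t, Infix):
        return f"(infix {t.name} {show(t.left)} {show(t.right)})"
    if isinstance(t, Prefix):
        return f"(prefix {show(t.left)} {show(t.right)})"
    if isinstance(t, Postfix):
        return f"(postfix {show(t.left)} {show(t.right)})"
    if isinstance(t, Block):
        return f"(block {t.opening}{t.closing} {show(t.child)})"
    raise TypeError(f"not a tree node: {t!r}")
```

Because there are so few node types, a complete tree walker like `show` fits in a dozen lines, which is what makes the parse tree easy to introspect.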

- Fully introspectable (mostly because the parse tree is simple)
- Reversible, i.e. can be printed, including with formatting, e.g.:
        % xl -nobuiltins -parse demo/1-hello.xl -show
        tell "localhost",
            print "Hello World"
        % xl -nobuiltins -parse demo/1-hello.xl -style debug -show   
          (block indent
            "Hello World"
- Also has a binary serializer that produces a platform-independent format
- Has multiple implementations, notably C and C++ implementations (and even one 
in XL :-)
- Validated on thousands of lines of input, with various language styles (e.g. 
Ada-like or functional)
- Character-level position tracking for error messages in scripts / config 
        /tmp/xl.xl:1007:8: Mismatched identation, expected ")"
        /tmp/xl.xl:2409:23: Mismatched identation, expected ""
- Designed to be easy to read and write
- Powerful enough to parse itself 
- Dynamically configurable syntax (spelling and precedence of operators)
- Multi-line text with configurable separators, e.g. the following can be made 
a text constant by having XML and END_XML as text separators:
            Insert your XML here
- Based-numbers in any base, e.g. 8#777, 16#FFFF_FFFF and 2#1.001 as valid 
- Has essentially a single contributor (me), so easy to relicense as needed
- There is an interpreter, which could potentially evaluate expressions like 2+3*A
- Relatively fast (6.1s to parse 1M lines of code representing 40M of code, cpp 
% wc /tmp/tmp.xl
 1000000 3893922 41679700 /tmp/tmp.xl
% time xl -parse /tmp/tmp.xl
 6.10s user 0.21s system 99% cpu 6.346 total
- Supports multiple styles, e.g. using { } for blocks or indentation, 
parentheses or not, etc.
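
The based-number notation mentioned in the list above (8#777, 16#FFFF_FFFF, 2#1.001) is easy to read mechanically. Below is a minimal Python sketch of a reader for it, assuming the form `base#digits` with optional `_` separators and an optional fractional part. The function name is hypothetical, and the exponent form (as in 16#1.FFF#e-3) is deliberately left out since its exact semantics are not spelled out here.

```python
def parse_based(text):
    """Parse 'base#digits' (or plain decimal) with '_' separators.

    Returns an int when there is no fractional part, else a float.
    Exponents are not handled in this sketch.
    """
    base_part, sep, digits = text.partition("#")
    if not sep:
        base, digits = 10, base_part    # plain decimal, e.g. 1_000_000
    else:
        base = int(base_part)
    digits = digits.replace("_", "")
    whole, dot, frac = digits.partition(".")
    value = int(whole, base)
    if not dot:
        return value
    # Accumulate fractional digits: each one is worth base**-position.
    scale = 1.0
    for d in frac:
        scale /= base
        value += int(d, base) * scale
    return value
```

For example, `parse_based("8#777")` is 511, `parse_based("16#FFFF_FFFF")` is 4294967295 and `parse_based("2#1.001")` is 1.125. A C-centric front end could accept 0xFFFF as an alternative spelling of 16#FFFF at this same level.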

Cons (but I’m not the best person to come up with cons on this pet project of 
mine ;-):
- Idiosyncratic
- Single contributor
- Not well maintained
- Definitely not production quality (even the makefiles are broken ;-)
- Has some CI testing, but it fails, and it’s totally insufficient
- Interpreter far from perfect
- Designed with another purpose in mind (a programming language)
- Syntax is not C-centric, e.g. 16#FFFF instead of 0xFFFF.
- Name syntax does not allow -, i.e. max-ns is “max minus ns”, max_ns OK.
- [insert probably about a thousand others here]

Precedences and other stuff can be configured dynamically, through a file
in the current implementations, eg. 
So that means we can have a “nice” syntax for the commands and objects,
and a format that can serve both as a config file format, as a command
language, and as a full shell-style language with if, loops, etc.

It also supports nested syntaxes, i.e. dynamic changes of precedence
between selected separators. Used to support simplified C syntax
with “extern int foo();”, where the “C” syntax is active between “extern”
and “;”. Could be useful for compatibility.

Parse tree is simple enough that it’s fully introspectable.

There is a (configurable) renderer, so you can generate source from the
internal data structure. The renderer can generate colorized source
code in HTML, so I guess we could generate C data structures relatively

I believe that it is relatively trivial to configure the parser syntax file 
to accept the QEMU quasi-JSON (some code changes would be required to teach
it to totally ignore whitespace, to avoid error messages).

More complete documentation about the language is here:
https://c3d.github.io/xl, but it’s quite light on implementation details.
So read only if you have a bit of time.
