qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH RFC 19/19] qapi: New QMP command query-schema fo


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH RFC 19/19] qapi: New QMP command query-schema for QMP schema introspection
Date: Wed, 29 Apr 2015 10:46:59 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Eric Blake <address@hidden> writes:

> On 04/02/2015 11:29 AM, Markus Armbruster wrote:
>> Caution, rough edges.
>> 
>> qapi/introspect.json defines the introspection schema.  It should do
>> for uses other than QMP.
>
> That is, QGA should also be able to reuse it for introspection.
>
>> FIXME it's almost entirely devoid of comments.
>
> Yeah, but it's only RFC quality, and if we alter design you don't have
> to worry about lengthy comments to keep in sync :)  Of course, the final
> version ought to have good comments.
>
>> 
>> The introspection schema does not reflect all the rules and
>> restrictions that apply to QAPI schemata.  A valid QAPI schema has an
>> introspection value conforming to the introspection schema, but the
>> converse is not true.
>
> I can live with that.  Introspection is output-only (we aren't creating
> new runtime commands by using introspection descriptions as input, so it
> doesn't matter if the introspection schema permits more than qapi will
> ever generate).
>
>> 
>> Introspection lowers away a number of schema details:
>> 
>> * The builtin types are declared with their JSON type.
>> 
>>   TODO should we map all the integer types to just int?
>
> Or even have introspection output two things: type ('int' for all JSON
> integers, regardless of range) and range ([0,255] for 'uint8').  I could
> live with that (ideally, knowing the range will help libvirt avoid
> passing in too-large-a-value to a parameter it introspected; but right
> now the idea is that most of libvirt's introspection will be "does
> something exist" not "what range does it support", and that libvirt will
> already have hard-coded knowledge of "if it exists, it is a uint8"
> without having to ask introspection for the type libvirt will supply).

I'm reluctant to add range information without a clear use.

Tight range information based on declared types requires range types,
and a lot of discipline using them.

>> * Implicit type definitions are made explicit, and given
>>   auto-generated names.  These names start with ':' so they don't
>>   clash with the user's names.
>
> Hopefully that works.  Although having to parse for ':' is an argument
> that we are packing multiple pieces of information in one JSON
> name/value, which sometimes means we should have instead supplied two
> different name/values for the two different pieces of information. Oh
> well, I'll see more when I get into the schema part.

Design assumption: users don't care whether something was explicitly
named by a human.

Corollary: users are not expected to check for ':'.

> Maybe we could also tweak the generator to put implicit names in the
> leading '_' namespace (we already outlaw use of leading '_' in all but
> downstream names, and require downstream names to use double '__', so we
> could easily identify '_generated' and '___generated' as internal vs.
> 'regular' and '__downstream' as user-supplied; it would free up current
> places where the qapi user cannot supply certain names. But not for this
> series).

Workable if we enforce "no explicit name can start with a single '_'".
I picked ':' for the RFC to simplify my correctness argument.

>>   Example: a simple union implicitly defines an enumeration type for
>>   its discriminator.
>> 
>> * All type references are by name.
>
> Fine if the type is named; but what happens if the type is anonymous
> inline (as in many commands?)

The system generates a type with a made-up name.  Bye, bye, special
case :)

>                                Also, in my nested struct cleanups, I
> found several places where it would be nice to patch the generator to
> support more anonymous inline types, such as in:
>
> { 'enum': 'MyEnum', 'data': [ 'one', 'two' ] }
> { 'union': 'Foo', 'base': { 'switch': 'MyEnum', 'name': 'str' },
>   'discriminator': 'switch', data': {
>     'one': { 'value': 'int' },
>     'two': { 'default': 'int' } } }
>
> It would even work for our long-hand of optional arguments:
> { 'command': 'foo', 'data': {
>   'bar': { 'type': { 'name': 'str', 'value': 'int' },
>            'optional': true } } }
>
> I guess we can always go with anonymous inline types resulting in an
> implicit named type creation, and refer to that name, if cramming an
> anonymous type into the schema is too hard.

Supporting inline types in the schema is fine with me, as long as it
doesn't get in the way of extensibility.

Example of inline types getting in the way: nested structs preclude
clean extension of member declarations for additional properties.
That's why you're removing them.

In introspection context, I view inline vs. explicit type as irrelevant
detail.  Introspection examines the actual structure of things, not how
the schema author chose to write it up.

>> * Base types are flattened.
>
> That's fine - we are introspecting what can be sent on the wire, and
> don't care what types were used in the qapi to arrive at that point.

Exactly.  Same argument as for inline vs. explicit type.

>> * The top type (named '**') can hold any value.
>> 
>>   It can currently occur only in commands, but the introspection
>>   schema doesn't reflect that.
>> 
>> * The struct and union types are generalized into an object type.
>
> I think you mentioned ideas on that in reviewing my v5 thread; they
> sounded reasonable at the time, so we'll see when I actually review the
> schema.
>
>> 
>> * Commands take a single argument and return a single result.
>> 
>>   Dictionary argument/result or list result is an implicit type
>>   definition.
>> 
>>   Missing argument/result is an implicit definition of the empty
>>   object.
>> 
>>   The argument is always of object type, but the introspection schema
>>   doesn't reflect that.
>> 
>>   The 'gen': false directive is omitted as implementation detail.
>
> Yep, that's fine. QAPI drives internal details that don't affect the
> wire format, and it's fine that introspection doesn't have to expose them.
>
>> 
>> * Events carry a single data value.
>> 
>>   Implicit type definition as above.
>> 
>>   The value is of object type, but the introspection schema doesn't
>>   reflect that.
>> 
>> * Types not used by commands or events are omitted.
>> 
>>   Indirect use counts as use.
>> 
>> * Optional members have a default, which can only be null right now
>
> Even for integral members?

Yes, because the schema doesn't support default values, yet.

The obvious introspection solution for the current state of things would
be a flag "optional", present and true when a member is optional.

My solution tries to provide for a future schema extension by default
values: instead of flag "optional", have "default", present and null
when a member is optional.  Value changes from null to the actual
default value when we add default values.

Thus, "'default': null" means "schema declares no default value".  The
meaning of omitting the member is unspecified in the schema then.

>>   Instead of a mandatory "optional" flag, we have an optionial
>
> s/optionial/optional/
>
>>   default.  No default means mandatory, default null means optional
>>   without default value.  Non-null is available for optional with
>>   default.
>> 
>> TODO much of the above should go into docs.
>> 
>> New generator scripts/qapi-introspect.py computes an introspection
>> value for its input, and generates a C variable holding it.
>> Pythonistas may find it ugly and/or clumsy.
>
> I'm not one, so my review will (happily?) paper over the clumsy usage.
>
>> 
>> FIXME it can generate awfully long lines
>
> Not the first of our generators with that problem.  But even tougher if
> you are generating C strings for highly-complex types.

Perhaps we can have a general long-line-breaker to take care of the
problem everywhere.  If long lines bother us.

>> The schema generation proper is pretty trivial.  Much of the
>> qapi-introspect.py's code actually does something else: convert the
>> output of parse_schema() into a more usable data structure, lowering
>> away uninteresting stuff.  This should really be done in qapi.py, so
>> the other generators can use it, too.
>
> And we may want to start by breaking that out into separate commits.

Yup.  I expect that to be at least as much work as creating this first
monolithic version.

>> For testing, I feed it tests/qapi-schema/qapi-schema-test.json, but no
>> more.
>
> That's actually a good test - it tries to cover as much of our generator
> as possible, without too many repetitions of types of objects.
>
>> 
>> New QMP command query-schema parses its return value from that
>> variable.  Its reply is less than 75KiBytes right now.
>
> A bit heavy; implies that we might want it to support optional
> filtering. But that can be added on top.

Clients should query-schema just once.

The value of query-schema is everything reachable from commands and
events.  An obvious filter would be everything reachable from commands
and events the client actually needs.  I doubt that'll reduce the size
all that much.  It'll increase complexity, though.  Got better filtering
ideas?

Here's another way to reduce I/O.  Observe that query-schema's value can
change only when you replace the QEMU binary.  Have a way to query the
hash of the value.  Clients can then cache query-schema's value, and
query-schema only when the hash doesn't match a cached value.

>> A compile time test that the value can be parsed would be nice.
>> 
>> Signed-off-by: Markus Armbruster <address@hidden>
>> ---
>>  .gitignore                 |   1 +
>>  Makefile                   |   9 +-
>>  Makefile.objs              |   1 +
>>  monitor.c                  |   8 +
>>  qapi-schema.json           |   3 +
>>  qapi/introspect.json       |  72 ++++++++
>>  qmp-commands.hx            |  16 ++
>>  scripts/qapi-introspect.py | 430
>> +++++++++++++++++++++++++++++++++++++++++++++
>>  tests/.gitignore           |   1 +
>>  tests/Makefile             |   8 +-
>
> No docs/qapi-code-gen.txt changes? (Oh right, you said so, for the RFC...)
>
>>  10 files changed, 547 insertions(+), 2 deletions(-)
>>  create mode 100644 qapi/introspect.json
>>  create mode 100644 scripts/qapi-introspect.py
>
> Should it be 755?  Oh, none of the other qapi*.py are.

They arguably should be, except for qapi.py.

>> +++ b/monitor.c
>> @@ -73,6 +73,7 @@
>>  #include "block/qapi.h"
>>  #include "qapi/qmp-event.h"
>>  #include "qapi-event.h"
>> +#include "qmp-introspect.h"
>>  #include "sysemu/block-backend.h"
>>  
>>  /* for hmp_info_irq/pic */
>> @@ -997,6 +998,13 @@ EventInfoList *qmp_query_events(Error **errp)
>>      return ev_list;
>>  }
>>  
>> +static int qmp_query_schema(Monitor *mon, const QDict *qdict,
>> +                            QObject **ret_data)
>> +{
>> +    *ret_data = qobject_from_json(qmp_schema_json);
>> +    return 0;
>
> Deceptively simple - it means everything is known at compile-time (no
> run-time filtering for conditionally implemented commands; might become
> important later on for listing only whitelisted QGA commands, for example?)

Stupidest solution that could possibly work, i.e. the one you should
always try first :)

If we need to mess with it, we probably want to parse the sucker with
visitors into the generated C data structures.  Could also be useful to
verify the value actually conforms to the introspection schema.

>> +}
>> +
>>  /* set the current CPU defined by the user */
>>  int monitor_set_cpu(int cpu_index)
>>  {
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 6b280b7..0efeb80 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -14,6 +14,9 @@
>>  # Tracing commands
>>  { 'include': 'qapi/trace.json' }
>>  
>> +# QAPI introspection
>> +{ 'include': 'qapi/introspect.json' }
>> +
>>  ##
>>  # LostTickPolicy:
>>  #
>> diff --git a/qapi/introspect.json b/qapi/introspect.json
>> new file mode 100644
>> index 0000000..87de856
>> --- /dev/null
>> +++ b/qapi/introspect.json
>> @@ -0,0 +1,72 @@
>> +# -*- Mode: Python -*-
>> +#
>> +# QAPI introspection
>> +#
>> +# Copyright (C) 2015 Red Hat, Inc.
>> +#
>> +# Authors:
>> +#  Markus Armbruster <address@hidden>
>> +#
>> +# This work is licensed under the terms of the GNU GPL, version 2 or later.
>> +# See the COPYING file in the top-level directory.
>> +
>> +{ 'enum': 'SchemaMetaType',
>> +  'data': [ 'builtin', 'enum', 'array', 'object', 'alternate',
>> +            'command', 'event' ] }
>
> Yep, definitely depends on my work going in first.
>
> 'builtin' is not an English word; do we want to go with the longer
> "built-in", or is it not worth it?

Matter of taste.  Whatever is more consistent with usage elsewhere gets
my vote.

>> +
>> +{ 'type': 'SchemaInfoBase',
>> +  'data': { 'name': 'str', 'meta-type': 'SchemaMetaType' } }
>> +
>> +{ 'enum': 'JSONPrimitiveType',
>> +  'data': [ 'str', 'int', 'number', 'bool', 'null' ] }
>> +
>> +{ 'type': 'SchemaInfoBuiltin',
>> +  'data': { 'json-type': 'JSONPrimitiveType' } }
>> +
>> +{ 'type': 'SchemaInfoEnum',
>> +  'data': { 'values': ['str'] } }
>
> At one point, one thread suggested that we might want to allow QAPI to
> support enums with fixed values, as in:
>
> 'data': [ {'one': 1}, {'three': 3} ]
>
> I guess returning an array of 'str' for now is safe; because we could
> always increase introspection to return an alternate between 'str' and a
> formal struct down the road if the user needs to know about enums with
> values that are guaranteed (vs. the usual enum that is name-association
> only with no guaranteed value).

The numeric value is an implementation detail, as Kevin pointed out.

>> +
>> +{ 'type': 'SchemaInfoArray',
>> +  'data': { 'element-type': 'str' } }
>
> We currently don't allow array of anonymous types, but looks like your
> implicit type naming would cover that.

Yup.  Treating anonymous types that way avoids special restrictions.

>> +
>> +{ 'alternate': 'Value',
>> + 'data': { 'int': 'int', 'number': 'number', 'str': 'str', 'bool':
>> bool' } }
>
> Nothing for objects?...

What type could I use there?  Closest we have is '**', but that doesn't
work here.

Unless we decide not to support object defaults, we better figure out
how we want to do them here.  We don't have to actually implement them
right away, of course.

>> +
>> +{ 'type': 'SchemaInfoObjectMember',
>> +  'data': { 'name': 'str', 'type': 'str', '*default': 'Value' } }
>> +# @default's type must match @type
>> +# can only default simple types, not objects or arrays
>
> but then again, that's what you said, in a rare comment.
>
>> +
>> +{ 'type': 'SchemaInfoObjectVariant',
>> +  'data': { 'case': 'str',
>> +            'members': [ 'SchemaInfoObjectMember' ] } }
>> +
>> +{ 'type': 'SchemaInfoObject',
>> +  'data': { 'members': [ 'SchemaInfoObjectMember' ],
>> +            '*discriminator': 'str',
>> +            '*variants': [ 'SchemaInfoObjectVariant' ] } }
>
> I guess since the discriminator is always a JSON string, that the 'str'
> here is the type name ('str' or an enum name)?  Or is it the
> discriminator member name ('type' on simple unions, whatever
> 'discriminator' was on flat unions)?  We probably want BOTH pieces of
> information, always, particularly if I finish my work on extending
> simple unions to have an enum-type discriminator.  On the other hand,
> the 'variants' array lists all of the branches, which means I can
> reconstruct the enum values (or even the open-coded 'str' values)
> without caring whether the discriminator was typed.
>
> So after all that, I guess you are using it as the discriminator member
> name and NOT its type.

Correct.

>> +
>> +{ 'type': 'SchemaInfoAlternate',
>> +  'data': { 'members': [ 'SchemaInfoObjectMember' ] } }
>> +
>> +{ 'type': 'SchemaInfoCommand',
>> +  'data': { 'args': 'str', 'returns': 'str' } }
>
> We could always use an alternate to return either a 'str' or a
> 'SchemaInfoObject' to represent anonymous inline types without having to
> go through an implicit generated name.  If that makes any more sense.

Yes, but I rather like the simplicity of having only named types.

>> +
>> +{ 'type': 'SchemaInfoEvent',
>> +  'data': { 'data': 'str' } }
>
> Likewise.
>
>> +
>> +{ 'union': 'SchemaInfo',
>> +  'base': 'SchemaInfoBase',
>> +  'discriminator': 'meta-type',
>> +  'data': {
>> +      'builtin': 'SchemaInfoBuiltin',
>> +      'enum': 'SchemaInfoEnum',
>> +      'array': 'SchemaInfoArray',
>> +      'object': 'SchemaInfoObject',
>> +      'alternate': 'SchemaInfoAlternate',
>> +      'command': 'SchemaInfoCommand',
>> +      'event': 'SchemaInfoEvent' } }
>> +
>> +{ 'command': 'query-schema',
>> +  'returns': [ 'SchemaInfo' ],
>> +  'gen': false }
>
> Overall, looks pretty nice.  But if we reuse this file for both QMP and
> QGA, then maybe the 'command' should be in the parent file that includes
> this (so that this file is just everything through 'SchemaInfo' to
> declare the types, but leaves QGA free to name its command different
> than QMP, if desired)

Working assumption: QGA is syntactically just like QMP.  Michael, can
you think of a reason why QGA could require introspection differences?

>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>> index 7f68760..27ab209 100644
>> --- a/qmp-commands.hx
>> +++ b/qmp-commands.hx
>> @@ -2039,6 +2039,22 @@ EQMP
>>      },
>>  
>>  SQMP
>> +query-schema
>> +------------
>> +
>> +Return the QMP schema.
>> +
>> +FIXME explain
>> +
>> +EQMP
>> +
>> +    {
>> +        .name       = "query-schema",
>> +        .args_type  = "",
>> +        .mhandler.cmd_new = qmp_query_schema,
>> +    },
>> +
>> +SQMP
>>  query-chardev
>>  -------------
>>  
>> diff --git a/scripts/qapi-introspect.py b/scripts/qapi-introspect.py
>> new file mode 100644
>> index 0000000..c09276b
>> --- /dev/null
>> +++ b/scripts/qapi-introspect.py
>
> I stopped reviewing here for today.

Thanks!



reply via email to

[Prev in Thread] Current Thread [Next in Thread]