Re: [PATCH v2 1/3] qmp: Support for querying stats

qemu-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 1/3] qmp: Support for querying stats

From:	Daniel P . Berrangé
Subject:	Re: [PATCH v2 1/3] qmp: Support for querying stats
Date:	Tue, 7 Dec 2021 18:42:53 +0000
User-agent:	Mutt/2.1.3 (2021-09-10)
Copying in Markus as QAPI maintainer, since I feel this proposed
design is a little oddball from typical QAPI design approach....

On Fri, Nov 19, 2021 at 01:51:51PM -0600, Mark Kanda wrote:
> Introduce qmp support for querying stats. Provide a framework for
> adding new stats and support for the following commands:
> 
> - query-stats
> Returns a list of all stats, with options for specifying a stat
> name and schema type. A schema type is the set of stats associated
> with a given component (e.g. vm or vcpu).
> 
> - query-stats-schemas
> Returns a list of stats included in each schema type, with an
> option for specifying the schema name.
> 
> - query-stats-instances
> Returns a list of stat instances and their associated schema type.
> 
> The framework provides a method to register callbacks for these qmp
> commands.
> 
> The first usecase will be for fd-based KVM stats (in an upcoming
> patch).
> 
> Examples (with fd-based KVM stats):
> 
> { "execute": "query-stats" }
> { "return": [
>     { "name": "vcpu_1",
>       "type": "kvm-vcpu",
>       "stats": [
>         { "name": "guest_mode",
>           "unit": "none",
>           "base": 10,
>           "val": [ 0 ],
>           "exponent": 0,
>           "type": "instant" },
>         { "name": "directed_yield_successful",
>           "unit": "none",
>           "base": 10,
>           "val": [ 0 ],
>           "exponent": 0,
>           "type": "cumulative" },
> ...
>     },
>     { "name": "vcpu_0",
>       "type": "kvm-vcpu",
>       "stats": ...
> ...
>  },
>     { "name": "vm",
>       "type": "kvm-vm",
>       "stats": [
>         { "name": "max_mmu_page_hash_collisions",
>           "unit": "none",
>           "base": 10,
>           "val": [ 0 ],
>           "exponent": 0,
>           "type": "peak" },
>           ...

So this is essentially exposing the low level kernel data structure
'struct kvm_stats_desc' mapped 1-to-1 into QAPI.

There are pros/cons to doing that should be explored to see whether
this actually makes sense for the QMP design.

I understand this design is intended to be fully self-describing
such that we can add arbitrarily more fields without ever
changing QEMU code, and with a simple mapping from the kernel
kvm_stats_desc.

Taking the first level of data returned, we see the natural
structure of the data wrt vCPUs is flattened:

 { "return": [
     { "name": "vcpu_0",
       "type": "kvm-vcpu",
       "stats": [...],  // stats for vcpu 0
     },
     { "name": "vcpu_1",
       "type": "kvm-vcpu",
       "stats": [...],  // stats for vcpu 0
     },
     ...other vCPUs...
     { "name": "vm",
       "type": "kvm-vm",
       "stats": [...],  // stats for the VM
     },
 }

This name+type stuff is all unnecessarily indirect. If we ever
have to add more data unrelated to the kvm stats, we're going
to need QEMU changes no matter what, so this indirect structure
isn't future proofing it.

I'd rather expect us to have named struct fields for each
different provider of data, respecting the natural hierachy.
ie use an array for vCPU data.

I understand this is future proofed to be able to support
non-KVM stats. If we have KVM per-vCPU stat and non-KVM
per-VCPU stats at the same time, I'd expect them all to
appear in the same place.  IOW, overall I'd expect to
see grouping more like

 { "return": {
     "vcpus": [
        
        [ // stats for vcpu 0
          { "provider": 'kvm',
            "stats": [...] },
          { "provider": 'qemu',
            "stats"; [...] }
        ],
        
        [ // stats for vcpu 1
          { "provider": 'kvm',
            "stats": [...] },
          { "provider": 'qemu',
            "stats"; [...] }
        ],
        
        ...other vCPUs...
     ]
     "vm": [
         { "provider": 'kvm',
           "stats": [...] },
         { "provider": 'qemu',
           "stats"; [...] } ],
     ],
 }


Now onto the values being reported. AFAICT from the kernel
docs, for all the types of data it currently reports
(cumulative, instant, peak, none), there is only ever going
to be a single value. I assume the ability to report multiple
values is future proofing for a later requirement. This is
fine from a kerenl POV since they're trying to fit this into
a flat struct. QAPI is way more flexible. It can switch
between reporting an scale or array or scalars for the
same field. So if we know the stat will only ever have
1 value, we should be reporting a scalar, not an array
which will only ever have one value.

Second, for a given named statistic, AFAICT, the data type,
unit, base and exponent are all fixed. I don't see a reason
for us to be reporting that information every time we call
'query-stats'. Just report the name + value(s).  Apps that
want a specific named stat won't care about the dimensions,
because they'll already know what the value means.

Apps that want to be metadata driven to handle arbitrary
stats, can just call 'query-stats-schemas' to learn about
the dimensions one time.

This will give waaay lower data transfer for querying
values repeatedly.

> 
> { "execute": "query-stats-schemas" }
> { "return": [
>     { "type": "kvm-vcpu",
>       "stats": [
>         { "name": "guest_mode" },
>         { "name": "directed_yield_successful" },
>         ...
>         },
>     { "type": "kvm-vm",
>       "stats": [
>         { "name": "max_mmu_page_hash_collisions" },
>         { "name": "max_mmu_rmap_size" },
>         ...

...this can be used to introspect the data type, unit,
base, exponent as a one time task, if needed.

Again, I'd expect the first level of nested to be
removed  to mirror 'query-stats',

 { "return": {
     "vcpu": [
         {
           "provider": "kvm",
           "stats": [
             { "name": "guest_mode",
               "unit": "none",
               "base": 10,
               "exponent": 0 },
             { "name": "directed_yield_successful"
               "unit": "none",
               "base": 10,
               "exponent": 0 },
             },
           ],
         },
         {
           "provider": "qemu"
           "stats": [
             {
               "name": "something_blah_blah",
               "unit": "bytes",
               "base": 2,
               "exponent": 20,
             },
          ]
        },
    ],
    "vm": [
        {
          "provider": "kvm",
          "stats": [
            { "name": "max_mmu_page_hash_collisions", ... }
            { "name": "max_mmu_rmap_size", ... }
          ]
        },
        {
          "provider": "qemu",
          "stats": [
            { "name": "blah", ... }
          ]
        },
    }
 }

> 
> { "execute": "query-stats-instances" }
> { "return": [
>     { "name": "vcpu_1",
>       "type": "kvm-vcpu" },
>     { "name": "vcpu_0",
>       "type": "kvm-vcpu" },
>     { "name": "vm",
>       "type": "kvm-vm" } ]
> }

I don't see a need for this command at all. It doesn't tell
apps anything they can't already learn from "query-stats-schemas"

New 'type' values will involve QEMU code changes no matter what.
So in the even we need something other than 'vcpu' and 'vm' stats
reported by 'query-stats', apps can just query the QAPI schema
in the normal manner to learn about struct field names that exist
in this QEMU version.

IOW, this really just re-invented QAPI introspection for no
benefit IMHO.

> diff --git a/qapi/misc.json b/qapi/misc.json
> index 358548abe1..a0a07ef0b1 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -527,3 +527,145 @@
>   'data': { '*option': 'str' },
>   'returns': ['CommandLineOptionInfo'],
>   'allow-preconfig': true }
> +
> +##
> +# @StatType:
> +#
> +# Enumeration of stat types
> +# @cumulative: stat is cumulative; value can only increase.
> +# @instant: stat is instantaneous; value can increase or decrease.
> +# @peak: stat is the peak value; value can only increase.
> +#
> +# Since: 7.0
> +##
> +{ 'enum' : 'StatType',
> +  'data' : [ 'cumulative', 'instant', 'peak' ] }
> +
> +##
> +# @StatUnit:
> +#
> +# Enumeration of stat units
> +# @bytes: stat reported in bytes.
> +# @seconds: stat reported in seconds.
> +# @cycles: stat reported in clock cycles.
> +# @none: no unit for this stat.
> +#
> +# Since: 7.0
> +##
> +{ 'enum' : 'StatUnit',
> +  'data' : [ 'bytes', 'seconds', 'cycles', 'none' ] }
> +
> +##
> +# @StatData:
> +#
> +# Individual stat
> +# @name: Stat name
> +# @type: @StatType
> +# @unit: @StatUnit
> +# @base: Exponent base (2 or 10)
> +# @exponent: Used together with @base
> +# @val: List of uint64 values
> +#
> +# Since: 7.0
> +##
> +{ 'struct': 'StatData',
> +  'data': { 'name': 'str',
> +            'type': 'StatType',
> +            'unit': 'StatUnit',
> +            'base': 'uint8',
> +            'exponent': 'int16',
> +            'val': [ 'uint64' ] } }
> +
> +##
> +# @Stats:
> +#
> +# Stats per resource (e.g. vm or vcpu)
> +# @name: Resource name
> +# @stats: List of @StatData
> +#
> +# Since: 7.0
> +##
> +{ 'struct': 'Stats',
> +  'data': {'name': 'str',
> +           'type': 'StatSchemaType',
> +           'stats': [ 'StatData' ] } }
> +
> +##
> +# @query-stats:
> +#
> +# @name: Stat name (optional)
> +# @type: Type name (optional)
> +# Returns: List of @Stats
> +#
> +# Since: 7.0
> +##
> +{ 'command': 'query-stats',
> +  'data': { '*name': 'str', '*type': 'str' },
> +  'returns': [ 'Stats' ] }

The 'name' and 'type' are used for filtering I presume. Only allowing
a single value for each feels pretty inflexible. I'd say we want to
allow mutliple requests at a time for efficiency.

Bearing in mind my other suggestions above, I'd think we should have
something  more like

 { 'enum': 'StatsProvider',
   'data': ["kvm", "qemu", "tcg", ....],
 }

 { 'struct': 'StatsRequest',
   'data': {
      'provider': 'StatsProvider',
      // If omitted, report everything for this provider
      '*fields': [ 'str' ]
   }
 }

 { 'struct': 'StatsVCPURequest',
   'base': 'StatsRequest',
   'data': {
     // To request subset of vCPUs e.g.
     //  "cpu_set": "0-3"
     // Could use ['int'] instead if desired
     '*cpu_set': str,
   },
 }

 { 'struct': 'StatsFilter',
   'data': {
     // If omitted means don't report that group of data
     '*vcpu': 'StatsVCPURequest',
     '*vm': 'StatsRequest',
   },
 }

 { 'alternate': 'StatsValue',
   'data': { 'scalar': 'int64',
             'list': [ 'int64 ' ] }
 }

 { 'struct': 'StatsResultsEntry',
   'data': {
     'provider': 'StatsProvider',
     'stats': [ 'StatsValue' ]
   }
 }

 { 'struct': 'StatsResults':
   'data': {
     '*vcpu': [ [ 'StatsResultsEntry' ] ],
     '*vm': [ 'StatsResultsEntry' ]
   }
 }

 { 'command': 'query-stats',
   'data': { 'filter': '*StatsFilter' },
   'returns': 'StatsResults' }


> +
> +##
> +# @StatSchemaType:
> +#
> +# Enumeration of stats schema types
> +#
> +# Since: 7.0
> +##
> +{ 'enum' : 'StatSchemaType',
> +  'data' : [ ] }
> +
> +##
> +# @StatSchemaEntry:
> +#
> +# Individual stat in a schema type
> +#
> +# Since: 7.0
> +##
> +{ 'struct': 'StatSchemaEntry',
> +  'data': { 'name': 'str' } }
> +
> +##
> +# @StatsSchema:
> +#
> +# Stats per @StatSchemaType
> +# @type: @StatSchemaType
> +# @stats: @StatCchemaName
> +#
> +# Since: 7.0
> +##
> +{ 'struct': 'StatsSchema',
> +  'data': { 'type': 'StatSchemaType',
> +            'stats': [ 'StatSchemaEntry' ] } }
> +
> +##
> +# @query-stats-schemas:
> +#
> +# @type: type name (optional)
> +# Returns: List of @StatsSchema
> +#
> +# Since: 7.0
> +##
> +{ 'command': 'query-stats-schemas',
> +  'data': { '*type': 'str' },
> +  'returns': [ 'StatsSchema' ] }

I'd think this is more like

 { 'struct': 'StatsSchemaValue',
   'data': {
     'name': 'str',
     'type': 'StatType',
     'unit': 'StatUnit',
     'base': 'uint8',
     'exponent': 'int16',
   },
 }

 { 'struct': 'StatsSchemaProvider',
   'data': {
     'provider': 'StatsProvider',
     'stats': [ 'StatsSchemaValue'],
   }
 }

 { 'struct': 'StatsSchemaResult',
   'data': {
     'vcpu': ['StatsSchemaProvider'],
     'vm': ['StatsSchemaProvider'],
   }
 }

 { 'command': 'query-stats-schemas',
   'returns': [ 'StatsSchemaResult' ] }


> +
> +##
> +# @StatsInstance:
> +#
> +# @name: resource name
> +# @type: @StatSchemaType
> +#
> +# Since: 7.0
> +##
> +{ 'struct': 'StatsInstance',
> +  'data': { 'name': 'str',
> +            'type': 'StatSchemaType' } }
> +
> +##
> +# @query-stats-instances:
> +#
> +# Returns list of @StatsInstance
> +#
> +# Since: 7.0
> +##
> +{ 'command': 'query-stats-instances',
> +  'returns': [ 'StatsInstance' ] }

As mentioned earlier, IMHO this doesn't need to exist.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
[Prev in Thread]
Current Thread
[Next in Thread]
Re: [PATCH v2 1/3] qmp: Support for querying stats, Daniel P . Berrangé <=
Prev by Date: Re: [PATCH 0/3] iotests: multiprocessing!!
Next by Date: Re: [PULL for-6.2 0/1] target-arm queue
Previous by thread: [PULL for-6.2 0/1] target-arm queue
Next by thread: dozens of qemu/kvm VMs getting into stuck states since kernel ~5.13
Index(es):
- Date
- Thread