qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v12 27/36] qapi: Forbid case-insensitive clashes


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v12 27/36] qapi: Forbid case-insensitive clashes
Date: Wed, 18 Nov 2015 18:08:31 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

Eric Blake <address@hidden> writes:

> In general, designing user interfaces that rely on case
> distinction is poor practice.  Another benefit of enforcing
> a restriction against case-insensitive clashes is that we
> no longer have to worry about the situation of enum values
> that could be distinguished by case if mapped by c_name(),
> but which cannot be distinguished when mapped to C as
> ALL_CAPS by c_enum_const() [via c_name(name, False).upper()].
> Thus, having the generator look for case collisions up front
> will prevent developers from worrying whether different
> munging rules for member names compared to enum values as a
> discriminator will cause any problems in qapi unions.
>
> We do not have to care about c_name(name, True) vs.
> c_name(name, False), as long as any pair of munged names being
> compared were munged using the same flag value to c_name().
> This is because commit 9fb081e already reserved names munging
> to 'q_', and this patch extends the reservation to 'Q_'; and
> because recent patches fixed things to ensure the only thing
> done by the flag is adding the prefix 'q_', that the only use
> of '.' (which also munges to '_') is in downstream extension
> prefixes, and that enum munging does not add underscores to
> any CamelCase values.
>
> However, we DO have to care about the fact that we have a
> command 'stop' and an event 'STOP' (currently the only case
> collision of any of our .json entities).  To solve that, use
> some tricks in the lookup map for entities.  With some careful
> choice of keys, we can bisect the map into two collision pools
> (name.upper() for events, name.lower() for all else).  Since
> we already document that no two entities can have the exact
> same spelling, an entity can only occur under one of the two
> partitions of the map.  We could go further and enforce that
> events are named with 'ALL_CAPS' and that nothing else is
> named in that manner; but that can be done as a followup if
> desired.  We could also separate commands from type names,
> but then we no longer have a convenient upper vs. lower
> partitioning allowing us to share a single dictionary.
>
> In order to keep things stable, the visit() method has to
> ensure that it visits entities sorted by their real name, and
> not by the new munged keys of the dictionary; Python's
> attrgetter is a lifesaver for this task.
>
> There is also the possibility that we may want to add a future
> extension to QMP of teaching it to be case-insensitive (the
> user could request command 'Quit' instead of 'quit', or could
> spell a struct field as 'CPU' instead of 'cpu').  But for that
> to be a practical extension, we cannot break backwards
> compatibility with any existing struct that was already
> relying on case sensitivity.  Fortunately, after a recent
> patch cleaned up CpuInfo, there are no such existing qapi
> structs.  Of course, the idea of a future extension is not
> as strong of a reason to make the change.
>
> At any rate, it is easier to be strict now, and relax things
> later if we find a reason to need case-sensitive QMP members,
> than it would be to wish we had the restriction in place.
>
> Add new tests args-case-clash.json and command-type-case-clash.json
> to demonstrate that the collision detection works.
>
> Signed-off-by: Eric Blake <address@hidden>
>
> ---
> v12: add in entity case collisions (required sharing two maps),
> improve commit message
> v11: rebase to latest, don't focus so hard on future case-insensitive
> extensions, adjust commit message
> v10: new patch
> ---
>  docs/qapi-code-gen.txt                         |  7 +++++
>  scripts/qapi.py                                | 41 
> +++++++++++++++++++++-----
>  tests/Makefile                                 |  2 ++
>  tests/qapi-schema/args-case-clash.err          |  1 +
>  tests/qapi-schema/args-case-clash.exit         |  1 +
>  tests/qapi-schema/args-case-clash.json         |  4 +++
>  tests/qapi-schema/args-case-clash.out          |  0
>  tests/qapi-schema/command-type-case-clash.err  |  1 +
>  tests/qapi-schema/command-type-case-clash.exit |  1 +
>  tests/qapi-schema/command-type-case-clash.json |  3 ++
>  tests/qapi-schema/command-type-case-clash.out  |  0
>  11 files changed, 53 insertions(+), 8 deletions(-)
>  create mode 100644 tests/qapi-schema/args-case-clash.err
>  create mode 100644 tests/qapi-schema/args-case-clash.exit
>  create mode 100644 tests/qapi-schema/args-case-clash.json
>  create mode 100644 tests/qapi-schema/args-case-clash.out
>  create mode 100644 tests/qapi-schema/command-type-case-clash.err
>  create mode 100644 tests/qapi-schema/command-type-case-clash.exit
>  create mode 100644 tests/qapi-schema/command-type-case-clash.json
>  create mode 100644 tests/qapi-schema/command-type-case-clash.out
>
> diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt
> index b01a691..1f6cb16 100644
> --- a/docs/qapi-code-gen.txt
> +++ b/docs/qapi-code-gen.txt
> @@ -102,6 +102,13 @@ single-dimension array of that type; multi-dimension 
> arrays are not
>  directly supported (although an array of a complex struct that
>  contains an array member is possible).
>
> +Client JSON Protocol is case-sensitive.  However, the generator
> +rejects attempts to create object members that differ only in case
> +after normalization (thus 'a-b' and 'A_B' collide); and likewise
> +rejects attempts to create commands or types that differ only in case,
> +or events that differ only in case (it is possible to have a command
> +and event map to the same case-insensitive name, though).
> +
>  Types, commands, and events share a common namespace.  Therefore,
>  generally speaking, type definitions should always use CamelCase for
>  user-defined type names, while built-in types are lowercase. Type
> diff --git a/scripts/qapi.py b/scripts/qapi.py
> index cde15f2..e41dbaf 100644
> --- a/scripts/qapi.py
> +++ b/scripts/qapi.py
> @@ -13,6 +13,7 @@
>
>  import re
>  from ordereddict import OrderedDict
> +from operator import attrgetter
>  import errno
>  import getopt
>  import os
> @@ -378,9 +379,9 @@ def check_name(expr_info, source, name, 
> allow_optional=False,
>      # code always prefixes it with the enum name
>      if enum_member and membername[0].isdigit():
>          membername = 'D' + membername
> -    # Reserve the entire 'q_' namespace for c_name()
> +    # Reserve the entire 'q_'/'Q_' namespace for c_name()
>      if not valid_name.match(membername) or \
> -       c_name(membername, False).startswith('q_'):
> +       c_name(membername, False).upper().startswith('Q_'):
>          raise QAPIExprError(expr_info,
>                              "%s uses invalid name '%s'" % (source, name))
>

We use .upper() to get a case-insensitive (in the C locale!) string
compare.  That's fine.  .lower() would be just as fine.  But see [*]
below.

> @@ -1040,7 +1041,7 @@ class QAPISchemaObjectTypeMember(object):
>          assert self.type
>
>      def check_clash(self, info, seen):
> -        cname = c_name(self.name)
> +        cname = c_name(self.name).upper()
>          if cname in seen:
>              raise QAPIExprError(info,
>                                  "%s collides with %s"
> @@ -1085,7 +1086,7 @@ class QAPISchemaObjectTypeVariants(object):
>
>      def check(self, schema, seen):
>          if not self.tag_member:    # flat union
> -            self.tag_member = seen[c_name(self.tag_name)]
> +            self.tag_member = seen[c_name(self.tag_name).upper()]
>              assert self.tag_name == self.tag_member.name
>          assert isinstance(self.tag_member.type, QAPISchemaEnumType)
>          for v in self.variants:
> @@ -1189,6 +1190,8 @@ class QAPISchema(object):
>      def __init__(self, fname):
>          try:
>              self.exprs = check_exprs(QAPISchemaParser(open(fname, 
> "r")).exprs)
> +            # _entity_dict holds two namespaces: events are stored via
> +            # name.upper(), commands and types are stored via name.lower().
>              self._entity_dict = {}
>              self._predefining = True
>              self._def_predefineds()
> @@ -1202,11 +1205,32 @@ class QAPISchema(object):
>      def _def_entity(self, ent):
>          # Only the predefined types are allowed to not have info
>          assert ent.info or self._predefining
> -        assert ent.name not in self._entity_dict
> -        self._entity_dict[ent.name] = ent
> +        # On insertion, we need to check for an exact match in either
> +        # namespace, then for case collision in a single namespace
> +        if self.lookup_entity(ent.name):
> +            raise QAPIExprError(ent.info,
> +                                "Entity '%s' already defined" % end.name)
> +        cname = c_name(ent.name)
> +        if isinstance(ent, QAPISchemaEvent):
> +            cname = cname.upper()
> +        else:
> +            cname = cname.lower()
> +        if cname in self._entity_dict:
> +            raise QAPIExprError(ent.info,
> +                                "Entity '%s' collides with '%s'"
> +                                % (ent.name, self._entity_dict[cname].name))
> +        self._entity_dict[cname] = ent
>
>      def lookup_entity(self, name, typ=None):
> -        ent = self._entity_dict.get(name)
> +        # No thanks to 'stop'/'STOP', we have to check two namespaces during
> +        # lookups, but only return a result on exact match.
> +        ent1 = self._entity_dict.get(c_name(name).lower())   # commands, 
> types
> +        ent2 = self._entity_dict.get(c_name(name).upper())   # events
> +        ent = None
> +        if ent1 and ent1.name == name:
> +            ent = ent1
> +        elif ent2 and ent2.name == name:
> +            ent = ent2
>          if typ and not isinstance(ent, typ):
>              return None
>          return ent

Let's see how this works.

Before the patch, _entity_dict's key serves one purpose: mapping names
to entities the obvious way: use entity name as dictionary key.

    lookup_entity():
        ent = self._entity_dict.get(name)
        # not ent or ent.name == name

    _def_entity():
        assert ent.name not in self._entity_dict
        self._entity_dict[ent.name] = ent

Your patch presses _entity_dict into clash detection service.  I'm going
to split the change into pieces I can understand.

First step changes the key from name to c_name(name) to ignore [-._]
differences (the ticklish identifier protection has no effect).  The
dictionary lookup changes from "the entity with this name" to "the
entity whose name clashes with this name".  To preserve semantics, the
two functions need to compensate.

    lookup_entity():
        ent = self._entity_dict.get(c_name(name))
        if ent and ent.name != name
            ent = None
        # not ent or ent.name == name

    _def_entity():
        cname = c_name(ent.name)
        assert cname not in self._entity_dict
        self._entity_dict[cname] = ent

A bit less obvious, but not too bad.

We can then turn _def_entity()'s assertion into error reporting.  It
splits into two separate errors, "already defined" and "collides".

    _def_entity():
        if self.lookup_entity(ent.name):
            raise QAPIExprError(ent.info,
                                "Entity '%s' already defined" % end.name)
        cname = c_name(ent.name)
        if cname in self._entity_dict:
            raise QAPIExprError(ent.info,
                                "Entity '%s' collides with '%s'"
                                % (ent.name, self._entity_dict[cname].name))
        self._entity_dict[cname] = ent

To catch case-insensitive clashes, you change the key further to
.lower().  No additional compensation necessary.

However, this makes command 'stop' clash with event 'STOP'.  You
cleverly sidestep that by partitioning _entity_dict's key space into
upper case for events and lower case for everything else.  Works,
because every entity name contains at least one letter.

_def_entity()'s cname = c_name(ent.name).lower() becomes

        cname = c_name(ent.name)
        if isinstance(ent, QAPISchemaEvent):
            cname = cname.upper()
        else:
            cname = cname.lower()

Digression [*]: I think using .lower() in check_name() would be slightly
cleaner, because it matches the case mapping we use here.

lookup_entity() is even worse off.  There, we don't know whether we're
looking for an event or not, so we try both.  If the entity is defined,
exactly one will succeed, else none.  Resulting code:

        ent1 = self._entity_dict.get(c_name(name).lower())   # commands, types
        ent2 = self._entity_dict.get(c_name(name).upper())   # events
        ent = None
        if ent1 and ent1.name == name:
            ent = ent1
        elif ent2 and ent2.name == name:
            ent = ent2

All right, this should work.

Now let me try claw back some clarity.

First try: refine your approach.

Observe that all but one caller of lookup_entity() actually look up
types.  The one caller that doesn't is _def_entity().  Change it to:

        cname = c_name(ent.name)
        if isinstance(ent, QAPISchemaEvent):
            cname = cname.upper()
        else:
            cname = cname.lower()
        if self._lookup_entity(cname)
            raise QAPIExprError(ent.info,
                                "Entity '%s' already defined" % end.name)
        if cname in self._entity_dict:
            raise QAPIExprError(ent.info,
                                "Entity '%s' collides with '%s'"
                                % (ent.name, self._entity_dict[cname].name))

where _lookup_entity() is back to the much simpler version:

        ent = self._entity_dict.get(cname)
        if ent and ent.name != cname
            ent = None

lookup_type() becomes:

    def lookup_type(self, name, typ=QAPISchemaType):
        return self._lookup_entity(name, typ)

and the third caller _make_implicit_object_type() goes from

        if not self.lookup_entity(name, QAPISchemaObjectType):

to

        if not self.lookup_type(name, QAPISchemaObjectType):

Another possible improvement is hiding the case-folding logic in
methods.  Have a QAPISchemaEntity.c_namecase() that returns
self.c_name().lower().  Overwrite it in QAPISchemaEvent to return
.upper() instead.  Use it to make _def_entity() less ugly.

Probably not worthwhile unless more uses of .c_namecase() pop up.

Second approach: don't use _entity_dict for clash detection!  Have a
second dictionary.

lookup_entity() remains untouched.  _def_entity() becomes something
like:

        if ent.name in self._entity_dict:
            raise QAPIExprError(ent.info,
                                "Entity '%s' already defined" % end.name)
        if ent.c_namecase() in self._clash_dict:
            raise QAPIExprError(ent.info,
                                "Entity '%s' collides with '%s'"
                                % (ent.name, self._entity_dict[cname].name))
        self._entity_dict[ent.name] = ent
        self._clash_dict[cname] = ent

or the equivalent code without c_namecase().

> @@ -1390,7 +1414,8 @@ class QAPISchema(object):
>
>      def visit(self, visitor):
>          visitor.visit_begin(self)
> -        for (name, entity) in sorted(self._entity_dict.items()):
> +        for entity in sorted(self._entity_dict.values(),
> +                             key=attrgetter('name')):
>              if visitor.visit_needed(entity):
>                  entity.visit(visitor)
>          visitor.visit_end()

Cool trick.

The order of items in dict.items() may vary:

    Keys and values are listed in an arbitrary order which is
    non-random, varies across Python implementations, and depends on the
    dictionary’s history of insertions and deletions.

https://docs.python.org/2.7/library/stdtypes.html#dict.items

By sorting, we go from "arbitrary order that can vary" to "arbitrary
fixed order".  That's our only reason for sorting.

Your trick preserves the arbitrary order we use before the patch.  Good
idea, because it makes verifying the patch doesn't mess up output
easier.

We could revert the trick in a followup patch, however.

> diff --git a/tests/Makefile b/tests/Makefile
> index a8a3b12..762b0ca 100644
> --- a/tests/Makefile
> +++ b/tests/Makefile
> @@ -242,6 +242,7 @@ qapi-schema += args-alternate.json
>  qapi-schema += args-any.json
>  qapi-schema += args-array-empty.json
>  qapi-schema += args-array-unknown.json
> +qapi-schema += args-case-clash.json
>  qapi-schema += args-int.json
>  qapi-schema += args-invalid.json
>  qapi-schema += args-member-array-bad.json
> @@ -256,6 +257,7 @@ qapi-schema += bad-type-bool.json
>  qapi-schema += bad-type-dict.json
>  qapi-schema += bad-type-int.json
>  qapi-schema += command-int.json
> +qapi-schema += command-type-case-clash.json
>  qapi-schema += comments.json
>  qapi-schema += double-data.json
>  qapi-schema += double-type.json
> diff --git a/tests/qapi-schema/args-case-clash.err 
> b/tests/qapi-schema/args-case-clash.err
> new file mode 100644
> index 0000000..0fafe75
> --- /dev/null
> +++ b/tests/qapi-schema/args-case-clash.err
> @@ -0,0 +1 @@
> +tests/qapi-schema/args-case-clash.json:4: 'A' (parameter of oops) collides 
> with 'a' (parameter of oops)
> diff --git a/tests/qapi-schema/args-case-clash.exit 
> b/tests/qapi-schema/args-case-clash.exit
> new file mode 100644
> index 0000000..d00491f
> --- /dev/null
> +++ b/tests/qapi-schema/args-case-clash.exit
> @@ -0,0 +1 @@
> +1
> diff --git a/tests/qapi-schema/args-case-clash.json 
> b/tests/qapi-schema/args-case-clash.json
> new file mode 100644
> index 0000000..e6f0625
> --- /dev/null
> +++ b/tests/qapi-schema/args-case-clash.json
> @@ -0,0 +1,4 @@
> +# C member name collision
> +# Reject members that clash case-insensitively, even though our mapping to
> +# C names preserves case.
> +{ 'command': 'oops', 'data': { 'a': 'str', 'A': 'int' } }
> diff --git a/tests/qapi-schema/args-case-clash.out 
> b/tests/qapi-schema/args-case-clash.out
> new file mode 100644
> index 0000000..e69de29
> diff --git a/tests/qapi-schema/command-type-case-clash.err 
> b/tests/qapi-schema/command-type-case-clash.err
> new file mode 100644
> index 0000000..b1cc697
> --- /dev/null
> +++ b/tests/qapi-schema/command-type-case-clash.err
> @@ -0,0 +1 @@
> +tests/qapi-schema/command-type-case-clash.json:3: Entity 'foo' collides with 
> 'Foo'

The message's location is foo's.  An additional message "'Foo' defined
here" would be nice.  Just an idea, could be done later.

> diff --git a/tests/qapi-schema/command-type-case-clash.exit 
> b/tests/qapi-schema/command-type-case-clash.exit
> new file mode 100644
> index 0000000..d00491f
> --- /dev/null
> +++ b/tests/qapi-schema/command-type-case-clash.exit
> @@ -0,0 +1 @@
> +1
> diff --git a/tests/qapi-schema/command-type-case-clash.json 
> b/tests/qapi-schema/command-type-case-clash.json
> new file mode 100644
> index 0000000..1f6adee
> --- /dev/null
> +++ b/tests/qapi-schema/command-type-case-clash.json
> @@ -0,0 +1,3 @@
> +# we reject commands that would differ only case from a type
> +{ 'struct': 'Foo', 'data': { 'i': 'int' } }
> +{ 'command': 'foo', 'data': { 'character': 'str' } }
> diff --git a/tests/qapi-schema/command-type-case-clash.out 
> b/tests/qapi-schema/command-type-case-clash.out
> new file mode 100644
> index 0000000..e69de29



reply via email to

[Prev in Thread] Current Thread [Next in Thread]