qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 09/20] docs/sphinx: Add new qapi-doc Sphinx extension


From: Markus Armbruster
Subject: Re: [PATCH v5 09/20] docs/sphinx: Add new qapi-doc Sphinx extension
Date: Tue, 22 Sep 2020 13:42:50 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

Peter Maydell <peter.maydell@linaro.org> writes:

> On Fri, 4 Sep 2020 at 13:29, Markus Armbruster <armbru@redhat.com> wrote:
>>
>> Peter Maydell <peter.maydell@linaro.org> writes:
>>
>> > Some of our documentation is auto-generated from documentation
>> > comments in the JSON schema.
>> >
>> > For Sphinx, rather than creating a file to include, the most natural
>> > way to handle this is to have a small custom Sphinx extension which
>> > processes the JSON file and inserts documentation into the rST
>> > file being processed.
>> >
>> > This is the same approach that kerneldoc and hxtool use.
>> >
>> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>>
>> Got a pointer to "Writing Sphinx Extensions for Dummies" or similar?
>
> The upstream documentation on extending Sphinx is probably
> a good place to start; in particular the tutorials are useful:
>
> https://www.sphinx-doc.org/en/master/development/index.html

Do you think having this link in a comment could help future readers /
maintainers?

> I have also found that a fair amount of trial-and-error,
> examination of the source of Sphinx itself or of other
> extensions, and occasionally asking questions on the Sphinx
> mailing list has been necessary...
>
>> > --- /dev/null
>> > +++ b/docs/sphinx/qapidoc.py
>> > @@ -0,0 +1,504 @@
>> > +# coding=utf-8
>> > +#
>> > +# QEMU qapidoc QAPI file parsing extension
>> > +#
>> > +# Copyright (c) 2020 Linaro
>> > +#
>> > +# This work is licensed under the terms of the GNU GPLv2 or later.
>> > +# See the COPYING file in the top-level directory.
>> > +"""qapidoc is a Sphinx extension that implements the qapi-doc directive"""
>> > +
>> > +# The purpose of this extension is to read the documentation comments
>> > +# in QAPI JSON schema files, and insert them all into the current 
>> > document.
>>
>> Let's drop "JSON", it's not really true.
>
> Fixed.
>
>> > +# The conf.py file must set the qapidoc_srctree config value to
>> > +# the root of the QEMU source tree.
>> > +# Each qapi-doc:: directive takes one argument which is the
>> > +# path of the .json file to process, relative to the source tree.
>>
>> Beg your pardon?  Oh, you're talking about .rst files now.  The next two
>> patches will add such files.  Perhaps something like "The qapi-doc::
>> reST directive provided by this extension takes ..."
>
> OK, how about:
>
> # The purpose of this extension is to read the documentation comments
> # in QAPI schema files, and insert them all into the current document.
> #
> # It implements one new rST directive, "qapi-doc::".
> # Each qapi-doc:: directive takes one argument, which is the
> # path of the .json file to process, relative to the source tree.
> #
> # The docs/conf.py file must set the qapidoc_srctree config value to
> # the root of the QEMU source tree.
>
> ?

Works for me with "pathname" instead of "path".  I'd also write "schema
file" instead of ".json file", but that's a matter of taste and up to
you.

>> > +# Function borrowed from pydash, which is under the MIT license
>> > +def intersperse(iterable, separator):
>> > +    """Like join, but for arbitrary iterables, notably arrays"""
>> > +    iterable = iter(iterable)
>> > +    yield next(iterable)
>> > +    for item in iterable:
>> > +        yield separator
>> > +        yield item
>>
>> What's wrong with separator.join(iterable)?
>
> It doesn't work on all the things we'd like to intersperse().
> If you try defining intersperse as
>
> def intersperse(iterable, separator):
>     separator.join(iterable)
>
> then you get an exception:
>
>   File 
> "/home/petmay01/linaro/qemu-from-laptop/qemu/docs/../scripts/qapi/schema.py",
> line 445, in visit
>     self.base, self.local_members, self.variants)
>   File "/home/petmay01/linaro/qemu-from-laptop/qemu/docs/sphinx/qapidoc.py",
> line 314, in visit_object_type
>     [self._nodes_for_members(doc, 'Members', base, variants),
>   File "/home/petmay01/linaro/qemu-from-laptop/qemu/docs/sphinx/qapidoc.py",
> line 173, in _nodes_for_members
>     nodes.Text(', ')))
>   File "/home/petmay01/linaro/qemu-from-laptop/qemu/docs/sphinx/qapidoc.py",
> line 53, in intersperse
>     separator.join(iterable)
> TypeError: sequence item 0: expected str instance, literal found

Aha.

@iterable appears to be a list of docutils.node.literal, whatever that
may be.

The doc string is confusing (I figure you borrowed the it along with the
code).  str.join() works just fine for arrays.  It's not the kind of
iterable that's the problem, it's the kind of thing the iterable yields:
str.join() wants str, we have. docutils.node.literal.

The lazy solution is to delete the confusing doc string.

Here's my attempt at a non-lazy solution:

    """Yield the members of *iterable* interspersed with *separator*."""

>> > +
>> > +class QAPISchemaGenRSTVisitor(QAPISchemaVisitor):
>> > +    """A QAPI schema visitor which generates docutils/Sphinx nodes
>> > +
>> > +    This class builds up a tree of docutils/Sphinx nodes corresponding
>> > +    to documentation for the various QAPI objects. To use it, first create
>> > +    a QAPISchemaGenRSTVisitor object, and call its visit_begin() method.
>> > +    Then you can call one of the two methods 'freeform' (to add 
>> > documentation
>> > +    for a freeform documentation chunk) or 'symbol' (to add documentation
>> > +    for a QAPI symbol). These will cause the visitor to build up the
>> > +    tree of document nodes. Once you've added all the documentation
>> > +    via 'freeform' and 'symbol' method calls, you can call 
>> > 'get_document_nodes'
>> > +    to get the final list of document nodes (in a form suitable for 
>> > returning
>> > +    from a Sphinx directive's 'run' method).
>> > +    """
>> > +    def __init__(self, sphinx_directive):
>> > +        self._cur_doc = None
>> > +        self._sphinx_directive = sphinx_directive
>> > +        self._top_node = nodes.section()
>> > +        self._active_headings = [self._top_node]
>> > +
>> > +    def _serror(self, msg):
>> > +        """Raise an exception giving a user-friendly syntax error 
>> > message"""
>> > +        file = self._cur_doc.info.fname
>> > +        line = self._cur_doc.info.line
>> > +        raise ExtensionError('%s line %d: syntax error: %s' % (file, 
>> > line, msg))
>>
>> Note for later: ornate error message format, no use of QAPISourceInfo.
>>
>> > +
>
>> > +
>> > +    def _nodes_for_enum_values(self, doc, what):
>>
>> @what is only ever 'Values', isn't it?
>
> Yes. I think this was a legacy from conversion from scripts/qapi/doc.py,
> which does pass in a 'Values' string, but it does that because it shares
> code in texi_members() between enums and other things. In this code
> I chose to split that out into different functions rather than using
> one function with a member_func argument because the member vs enum
> code is different enough that it's clearer that way. But I didn't
> notice that there wasn't any longer much point in passing a 'what'
> argument.
>
> Fixed to hard-code 'Values' rather than passing it as an argument.
>
>> > +    def _nodes_for_sections(self, doc, ifcond):
>> > +        """Return doctree nodes for additional sections following 
>> > arguments"""
>> > +        nodelist = []
>> > +        for section in doc.sections:
>> > +            snode = self._make_section(section.name)
>> > +            if section.name and section.name.startswith('Example'):
>> > +                snode += self._nodes_for_example(section.text)
>> > +            else:
>> > +                self._parse_text_into_node(section.text, snode)
>> > +            nodelist.append(snode)
>> > +        if ifcond:
>> > +            snode = self._make_section('If')
>> > +            snode += self._nodes_for_ifcond(ifcond, with_if=False)
>> > +            nodelist.append(snode)
>>
>> I think I'd rather have a separate _node_for_ifcond().  Not a demand.
>
> I'm not sure what you have in mind here, so I'll leave it as it is :-)

_nodes_for_sections() is used like this:

        self._add_doc(WHAT,
                      [self._nodes_for_WHAT(doc, ...),
                       self._nodes_for_features(doc),
                       self._nodes_for_sections(doc, ifcond)])

._add_doc()'s second argument is a list of nodes for sections.

._nodes_for_WHAT() computes the nodes specific to the kind of thing
we're processing: enum, object, alternate, command, event.

._nodes_for_features() computes the nodes for the feature section.

._nodes_for_sections() computes both the nodes for additional sections
and the nodes for the ifcond section.

I'd compute the nodes for the ifcond section in their own function, to
keep .nodes_for_sections() focused on just one purpose.

>> > +        if not nodelist:
>> > +            return None
>>
>> Any particular reason for mapping [] to None?
>
> I forget at this point. Probably related to the next query...
>
>> > +        return nodelist
>> > +
>> > +    def _add_doc(self, typ, sections):
>> > +        """Add documentation for a command/object/enum...
>> > +
>> > +        We assume we're documenting the thing defined in self._cur_doc.
>> > +        typ is the type of thing being added ("Command", "Object", etc)
>> > +
>> > +        sections is a list of nodes for sections to add to the definition.
>> > +        """
>> > +
>> > +        doc = self._cur_doc
>> > +        snode = nodes.section(ids=[self._sphinx_directive.new_serialno()])
>> > +        snode += nodes.title('', '', *[nodes.literal(doc.symbol, 
>> > doc.symbol),
>> > +                                       nodes.Text(' (' + typ + ')')])
>> > +        self._parse_text_into_node(doc.body.text, snode)
>> > +        for s in sections:
>> > +            if s is not None:
>> > +                snode += s
>>
>> Looks like you're flattening the two-level list the callers create,
>> e.g. ...
>>
>> > +        self._add_node_to_current_heading(snode)
>> > +
>> > +    def visit_enum_type(self, name, info, ifcond, features, members, 
>> > prefix):
>> > +        doc = self._cur_doc
>> > +        self._add_doc('Enum',
>> > +                      [self._nodes_for_enum_values(doc, 'Values'),
>> > +                       self._nodes_for_features(doc),
>> > +                       self._nodes_for_sections(doc, ifcond)])
>>
>> ... this one: [[nodes for enum values...]
>>                [nodes for features...]
>>                [nodes for sections]]
>>
>> Makes me wonder why you build two levels in the first place.
>
> This is probably just me not being a very experienced Python
> programmer. What's the syntax for passing _add_doc the single list
> which is just the concatenation of the various lists that
> the _nodes_for_* methods return ?

You could replace

                      [self._nodes_for_WHAT(doc, ...),
                       self._nodes_for_features(doc),
                       self._nodes_for_sections(doc, ifcond)])

by

                      self._nodes_for_WHAT(doc, ...)
                      + self._nodes_for_features(doc)
                      + self._nodes_for_sections(doc, ifcond)

Operator + concatenates sequences.

>> > +
>> > +        if re.match(r'=+ ', doc.body.text):
>> > +            # Section or subsection heading: must be the only thing in 
>> > the block
>> > +            (heading, _, rest) = doc.body.text.partition('\n')
>> > +            if rest != '':
>> > +                raise ExtensionError('%s line %s: section or subsection 
>> > heading'
>> > +                                     ' must be in its own doc comment 
>> > block'
>> > +                                     % (doc.info.fname, doc.info.line))
>>
>> Same ornate error message format as above, less 'syntax error: '.
>
> Yes. I realized after sending v5 that this could be made to
> call _serror() instead of directly raising the ExtensionError.
>
>> > +        try:
>> > +            schema = QAPISchema(qapifile)
>> > +        except QAPIError as err:
>> > +            # Launder QAPI parse errors into Sphinx extension errors
>> > +            # so they are displayed nicely to the user
>> > +            raise ExtensionError(str(err))
>>
>> I expected plumbing the error location through Sphinx to the user to
>> take a bit more effort.  I'm not complaining.
>>
>> A QAPIError looks like this when converted to str:
>>
>>     In file included from ...:
>>     ...
>>     In file included from ...:
>>     FILE: In ENTITY-TYPE 'NAME':
>>     FILE:LINE: MESSAGE
>>
>> "In file" lines appear when the error is in a sub-module.
>>
>> An "In ENTITY-TYPE" line appears when the error is in an entity
>> definition.
>>
>> The other two ExtensionError() look like
>>
>>     FILE line LINE: syntax error: MESSAGE
>>
>> and
>>
>>     FILE line LINE: MESSAGE
>>
>> More ornate, less information.
>>
>> I figure more errors can get thrown by nested_parse_with_titles() (see
>> below).  How do they look like?
>
> They look like this:
>
> Warning, treated as error:
> /home/petmay01/linaro/qemu-from-laptop/qemu/docs/../qapi/block.json:15:Bullet
> list ends without a blank line; unexpected unindent.
>
> (I've just noticed that with Sphinx 1.6, which we still have
> to support, the file/line info isn't passed through, so you get:
>
> Warning, treated as error:
> /home/petmay01/linaro/qemu-from-laptop/qemu/docs/interop/qemu-qmp-ref.rst:7:Bullet
> list ends without a blank line; unexpected unindent.
>
> The plugin has code borrowed from kerneldoc.py which is
> *supposed* to handle the older API Sphinx 1.6 used, but it
> looks like it's broken. I'll have a look and see if it
> is fixable, but possibly we may have to live with people
> developing on old distros getting suboptimal errors.)

I'm okay with that.

>> Can we avoid the information loss?
>>
>> Can we make the error message format more consistent?
>
> How do I get a QAPIError from within one of the
> visit_* or freeform methods of a QAPISchemaVisitor ?

The visit_ methods take an info argument.  With that:

    raise QAPISemError(info, "the error message")

QAPISchemaGenDocVisitor.freeform() lacks such an argument.  Use
doc.info.

> The current texinfo doc generator doesn't do anything like that.

None of the code generators detects any errors.

I added one to test the advice I gave above.  It doesn't get caught.
Easy enough to fix: move then gen_FOO() into main()'s try ... except.

> If there's a way to get a QAPIError given
>  * the 'doc' argument passed into freeform/visit*
>  * the specific error message we want to emit
> then we can make the errors this script itself emits
> (which are just the ones about headings: wrongly-subnested
> headings, and headings that aren't the only thing in their
> freeform-doc block) consistent with the other QAPIError ones.
> [As I mentioned in some earlier thread, we are identifying
> these errors here for pragmatic "it was easy" reasons,
> though you could make a case that scripts/qapi/parser.py
> should be doing it.]

Oh yes.

> (Aside: what was your conclusion on the question of section
> headers, anyway? I need to go back and re-read the older
> threads but I think that is still an unresolved topic...)

If I remember correctly, I disliked "[PATCH 19/29]
qapi/qapi-schema.json: Put headers in their own doc-comment blocks",
because "it sweeps the actual problem under the rug, namely flaws in our
parsing and representation of doc comments."

Our documentation has a tree structure.  Ideally, we'd represent it as a
tree.  I figure such a tree would suit you well: you could translate
more or less 1:1 to Sphinx nodes.

We currently have a two-level list instead: a list of blocks (either
definition or free-form), where each block has a list of sections (see
QAPIDoc's doc string).

My commit dcdc07a97c "qapi: Make section headings start a new doc
comment block" was a minimally invasive patch to get us closer to a
(partially flattened) tree.  Kind of collects the dirt in an (hopefully
obvious) pile next to the rug.

You could finish the job and rework the representation into a tree.
That would be lovely.  I appreciate help with cleaning up existing QAPI
messes very much, but do not feel obliged.

You could work with the two-level list as is, relying on the fact that a
'=' can only occur right at the start of a free-form block.

You could tweak the two-level list just a bit, so that the heading
becomes properly represented, and you don't have to match free-form body
section text to find it.

Does this help you along?

> However I suspect we can't generate QAPIErrors for errors which are
> produced by Sphinx itself when it's handed invalid rST:
> the API Sphinx provides us is that we can annotate the lines
> of rST that we are assembling from the input files to indicate
> their "true" file/line number, but we can't intercept error
> messages it emits. Think of it like the C compiler -- a
> preprocessor can add #line directives to pass fixed up file/line
> info to the compiler, but it isn't in the loop when the compiler
> actually prints out errors.

Perfection is not a requirement :)

Errors from TexInfo are arguably worse than that now.  They are rare,
however.  The situation will change when people use reST markup in doc
comments.

>> I had to read mostly backwards to understand the code.
>
> Yes. This seems to be the standard way to write a Sphinx extension.
> (Compare the kerneldoc.py and the various example extensions
> in the Sphinx docs.)

Mimicking existing similar extensions makes sense.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]