qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image
Date: Wed, 15 Feb 2023 23:50:46 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 15.02.23 16:07, Markus Armbruster wrote:
Kevin Wolf <kwolf@redhat.com> writes:

Am 13.01.2023 um 08:30 hat Markus Armbruster geschrieben:
Drive-by comment...

Kevin Wolf <kwolf@redhat.com> writes:

This series addresses the problem described in these bug reports:
https://gitlab.com/qemu-project/qemu/-/issues/1330
https://bugzilla.redhat.com/show_bug.cgi?id=2147617

qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate().
However, when the function is called through blk_unref(), in the case of
such errors, while an error message is written to stderr, the callers
never see an error return. Specifically, 'qemu-img bitmap/commit' are
reported to exit with an exit code 0 despite the errors.

After having tead the "potential alternative" below, I figure this
failure happens within blk_unref().  But I can't see a call chain.  Am I
confused?

When I put an abort() into the error path:

#0  0x00007ffff6aa156c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff6a54d76 in raise () from /lib64/libc.so.6
#2  0x00007ffff6a287f3 in abort () from /lib64/libc.so.6
#3  0x00005555556108f3 in qcow2_inactivate (bs=0x555555879a30) at 
../block/qcow2.c:2705
#4  0x0000555555610a08 in qcow2_do_close (bs=0x555555879a30, 
close_data_file=true) at ../block/qcow2.c:2741
#5  0x0000555555610b38 in qcow2_close (bs=0x555555879a30) at 
../block/qcow2.c:2770
#6  0x00005555555a1b4e in bdrv_close (bs=0x555555879a30) at ../block.c:4939
#7  0x00005555555a2ad4 in bdrv_delete (bs=0x555555879a30) at ../block.c:5330
#8  0x00005555555a5b49 in bdrv_unref (bs=0x555555879a30) at ../block.c:6850
#9  0x000055555559d6c5 in bdrv_root_unref_child (child=0x555555873300) at 
../block.c:3207
#10 0x00005555555c7beb in blk_remove_bs (blk=0x5555558796e0) at 
../block/block-backend.c:895
#11 0x00005555555c6c3f in blk_delete (blk=0x5555558796e0) at 
../block/block-backend.c:479
#12 0x00005555555c6fb0 in blk_unref (blk=0x5555558796e0) at 
../block/block-backend.c:537
#13 0x0000555555587dc9 in img_bitmap (argc=7, argv=0x7fffffffd760) at 
../qemu-img.c:4820
#14 0x0000555555589807 in main (argc=7, argv=0x7fffffffd760) at 
../qemu-img.c:5450

Thanks!

The solution taken here is inactivating the images first, which can
still return errors, but already performs all of the write operations.
Only then the images are actually blk_unref()-ed.

If we agree that this is the way to go (as a potential alternative,
allowing blk_unref() to fail would require changes in all kinds of
places, many of which probably wouldn't even know what to do with the
error),

blk_unref() could fail only when it destroys @blk (refcnt goes to zero).
Correct?

I think so, yes.

Thanks again!

We have a bunch of "unref" functions in the tree, and, as far as I can
tell from a quick grep, none of them can fail.  Supports your apparent
preference for not changing blk_unref().

         then I suppose doing the same for other qemu-img subcommands
would make sense, too.

I was about to ask whether there could be more silent failures like the
ones in commit and bitmap.  This suggests there are.

Say we do the same for all known such failures.  Would any remaining (or
new) such failures be programming errors?

Let's be honest: What I'm proposing here is not pretty and not a full
solution, it only covers the simplest part of the problem, which happens
to be the part that has shown up in practice.

If you have a good idea how to solve the general problem, I'm all ears.

I haven't checked other qemu-img subcommands, but I don't see why they
wouldn't be able to run into an error in .bdrv_close. They could be
fixed the same way.

The next level in difficulty might be QMP block-delete. It's still easy
because like in qemu-img, we know that we're freeing the last reference,
and so we could actually do the same here. Well, more or less the same
at least: Obviously not inactivate_all(), but just for a single node. We
also need to do this recursively for children, except only for those
that would actually go away together with our parent node and aren't
referenced elsewhere. Even if we manage to implement this correctly,
what do we do with the error? Would returning a QMP error imply that we
didn't actually close the image and it's still valid (and not
inactivated)?

Too easy? Let's make it a bit harder. Let's say a commit job completes
and we're now removing the intermediate nodes. One of these images could
in theory fail in .bdrv_close. We have successfully committed the data,
the new graph is ready and in good state. Just one of the old images
we're throwing out runs into ENOSPC in its .bdrv_close. Where do we
report that error? We don't even necessarily have a QMP command here, we
could only let the whole block job fail, which is probably not a good
way to let libvirt know what was happening. Also, we can't just
unconditionally inactivate the image beforehand there, it might still be
in use by other references.  Which may actually be dropped while we're
draining the node in bdrv_close().

Not enough headaches yet? There are plenty of places in QEMU that just
want to make sure that the node doesn't go away while they are still
doing something with it. So they use a bdrv_ref/unref pair locally.
These places could end up freeing the last reference if the node would
have gone away otherwise. They are almost certainly a very confusing
place to report the error. They might not even be places that can return
errors at all currently.

Yes.

So the main reason why I'm not doing this properly by returning the
errors from qcow2_close() (and .bdrv_close in all other drivers) through
bdrv_unref() down to the callers of that is not only that it would be a
major conversion that would touch lots of places, but also that I
wouldn't even know what to do with the error in most callers. And that
I'm not sure what the semantics of an error in a close function should
be.

Understand.

Another thing that could be tried is making failure in .bdrv_close less
likely by doing things earlier. At least ENOSPC could probably be
avoided if dirty bitmaps clusters were allocated during the write
request that first sets a bit in them (I know too little about the
details how bitmaps are implemented in qcow2, though, maybe Vladimir can
help here). But ultimately, you'll always get some I/O requests in
.bdrv_close and they could fail even if we made it less likely.

Let me try to summarize to make sure I understand.

Closing an image can fail for the same reason close() can fail: flushing
caches can fail, and not caching is not an option.

The close is commonly hidden within a bdrv_unref().  It closes when the
last reference goes away.

Sometimes we know which bdrv_unref() will close.  Sometimes we don't.

Some bdrv_unref() callers can report errors sanely.  Others simply
can't.

Some failures to close can be safely ignored, such as closing a
temporary image that is going away anyway.  But it's hard to tell when
this is the case.

Ideally, things fail cleanly: we either do what's asked and succeed, or
do nothing and fail.  A failure to close is commonly unclean.  So, even
if we can report it, recovery can be hard or impossible.


A common criticism of garbage collection is that finalization is delayed
and runs "out of context".  The above shows that reference counting
isn't all that better.

We could have two variants of bdrv_unref(), one that must not fail, and
one that can fail and must be checked.  But as you explained, ensuring
failure only happens in places where we can handle an error sanely is
somewhere between hard and impossible.

No better ideas, I'm afraid.



I just add my thought:

If user worries about correct closing of any block node, user should control 
when the node is closed. And we have all the instruments:

User just should create each node personally by blockdev-add (or -blockdev). 
And then before terminating QEMU process do blockdev-del correspondingly for 
all nodes. This way the moment when the node is finally closed is obvious: it's 
blockdev-del, where user can get appropriate error message (and retry deletion 
if needed).

To achieve this we need similar additional bdrv_inactivate() call in qmp_blockdev_del() 
to report an error (and don't bdrv_unref() in case of failed inactivation). And we can 
add a boolean "force" argument to blcokdev-del to skip this additional 
bdrv_inactivate() call.

(yes, this will not work for qemu-img. But qsd may be used instead to create 
more reliable scenarios)

--
Best regards,
Vladimir




reply via email to

[Prev in Thread] Current Thread [Next in Thread]