qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete


From: Manos Pitsidianakis
Subject: Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
Date: Fri, 28 Jul 2017 11:49:53 +0300
User-agent: NeoMutt/20170609-57-1e93be (1.8.3)

On Thu, Jul 27, 2017 at 06:09:04PM -0400, John Snow wrote:


On 07/26/2017 02:23 PM, Manos Pitsidianakis wrote:
On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
This proposal follows a discussion we had with Kevin and Stefan on filter
node management.

With block filter drivers arises a need to configure filter nodes on runtime with QMP on live graphs. A problem with doing live graph modifications is that some block jobs modify the graph when they are done and don't operate under any synchronisation, resulting in a race condition if we try to insert
a filter node in the place of an existing edge.

But maybe you are thinking about higher level race conditions between
QMP commands.  Can you give an example of the race?

Exactly, synchronisation between the QMP/user actions. Here's an example, correct me if I'm wrong:

User issues a block-commit to this tree to commit A onto B:
   BB -> A -> B

This inserts the dummy mirror node at the top since this is a mirror job (active commit)
   BB -> dummy -> A -> B

User wants to add a filter node F below the BB. First we create the node:
   BB -> dummy -> A -> B
      F /

Commit finishes before we do block-insert-node, dummy and A is removed from the BB's chain, mirror_exit calls bdrv_replace_node()

   BB ------------> B
               F -> /

Now inserting F under BB will error since dummy does not exist any more.


Exactly how much of a problem is this? You, the user, have instructed QEMU to do two conflicting things:

(1) Squash A down into B, and
(2) Insert a filter above A

Shame on you for deleting the node you were trying to reference, no? And as an added bonus, QEMU will even error out on your command instead of trying to do something and getting it wrong.

Is this a problem?



The proposal doesn't guard against the following:
User issues a block-commit to this tree to commit B onto C:
   BB -> A -> B -> C

The dummy node (commit's top bs is A):
   BB -> A -> dummy -> B -> C

blockdev-add of a filter we wish to eventually be between A and C:
   BB -> A -> dummy -> B -> C
            F/

if commit finishes before we issue block-insert-node (commit_complete()
calls bdrv_set_backing_hd() which only touches A)
   BB -> A --------------> C
          F -> dummy -> B /
   resulting in error when issuing block-insert-node,

Because "B" and "dummy" have already actually been deleted, I assume. (The diagram is confusing.)

because we have an edge (A, C) and C is not a child of F, so we cannot insert it as a child to A. (I rotated the usual ascii art by 90 deg to save me from handling whitespace, but I might have done it worse)

else if we set the job to manual-delete and insert F:
   BB -> A -> F -> dummy -> B -> C
deleting the job will result in F being lost since the job's top bs wasn't updated.


Seems like a fairly good reason not to pursue this avenue then, yes?

I think this is just a different problem, not a race but of broadcasting graph operations to affected parties (block jobs).

I think I agree with Stefan's concerns that this is more of a temporary workaround for the benefit of jobs, but that there may well be other reasons (especially in the future) where graph manipulations may occur invisibly to the user.

Do you have any use cases for failure here that don't involve the user themselves asking for two conflicting things? That is, QEMU doing some graph reorganization without the user's knowledge coupled with the user asking for a new filter node?

Are there any failures that result in anything more dangerous or bad than a "404: node not found" style error where we simply reject the premise of what the user was attempting to accomplish?


I couldn't come up with a case that caused catastrophic failure, no (though I lack familiarity with block jobs). If this isn't a problem after all, good news.

block-insert-node needs to work with no dangerous conflicts to other graph operations, basically. Is there a case where there's not just an error and we can't recover?

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]