From: Vladimir Sementsov-Ogievskiy
Subject: Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept
Date: Wed, 30 Mar 2022 12:52:57 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0

30.03.2022 12:09, Emanuele Giuseppe Esposito wrote:

Ah, it seems I understand what you mean.

One of my arguments is that "drain" is not a lock against other
clients who want to modify the graph, because a drained section allows
nested drained sections.

And you try to solve it by draining more things; this way, we'll also
drain the job, which is a possible client that may want to modify the
graph in parallel.

So, in other words, when we want to modify the graph, we drain the whole
connectivity component of the graph, and we assume we are safe from
other graph modifications because all related jobs are drained.
Interesting: is it possible that some not-drained job from another
connectivity component will want to connect to some node in our drained
component?

You mean another job or whatever calling bdrv_find_node() on a random
graph? Yes, that is not protected. But can this happen?

That's the question. What are the invariants here? Can anything happen?


I just still feel that draining is the wrong mechanism to avoid
interaction with other clients who want to modify the graph, because:

1. we stop all IO on the whole subgraph, which is not necessary
2. draining is not a mutex; it allows nesting, and it's fine when two
different clients drain the same nodes. Draining is just a requirement
to do no IO at these nodes.

And in your approach, it seems that to be absolutely safe we'll need to
drain everything...

My feeling is that it's better to keep draining what it is now: a
requirement to have no IO requests. And to isolate graph modifications
from each other, make a new synchronization mechanism, something like a
global queue, where clients who want access to graph modifications
wait for their turn.

This is a matter of definitions. Subtree drains can theoretically work;
I managed to answer my own doubts in the last email I sent.

Yes, there is still some completely random case like the one I wrote
above, but I think it is more a matter of what we want to use and what
meaning we want to give to drains.

The global queue is what Kevin proposes; I will try to implement it.



As I understand:

You want to make the drained section a kind of lock, so that if we
take this lock, we can modify the graph and be sure that no other
client will modify it in parallel.

Yes


But drained sections can be nested. So to solve the problem you try to
drain more nodes: include the subtree, for example, or maybe we need to
drain the whole graph connectivity component, or (to be safer) the
whole block layer (to be sure that during a drained section in one
connectivity component, some not-drained block job from another
connectivity component will not try to attach some node from our drained
connectivity component).

I still feel that draining is the wrong tool for isolating graph-modifying
operations from each other:

1. Drained sections can be nested, and natively that's not a kind of
lock. It's just a requirement to have no IO requests. There may be
several clients that want this condition on the same set of nodes.

2. Blocking IO on the whole connected subgraph or even on the whole
block layer graph is not necessary, so that's extra blocking.


Could we instead do the following:

1. Keep draining as is - a mechanism to stop IO on some nodes

2. To isolate graph-modifying operations, implement another mechanism:
something like a global queue, where clients wait until they get
access to modify the block layer.


This way, any graph modifying process would look like this:

1. drained_begin(only where necessary, not the whole subgraph in general)

2. wait in the global queue

3. Ok, now we can do all the modifications

4. Kick the global queue, so that next client will get an access

5. drained_end()
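
Just to make the shape of that turn-taking concrete, here is a minimal,
self-contained sketch of the five steps above. This is not QEMU code:
graph_mod_enter()/graph_mod_exit(), the drained_begin()/drained_end() stubs
and modify_graph_example() are made-up names, and a real implementation
would park waiters in a coroutine queue rather than on a pthread condition
variable:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t graph_mod_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  graph_mod_cv   = PTHREAD_COND_INITIALIZER;
static bool graph_mod_busy;             /* someone is modifying the graph */

static void drained_begin(void) { /* stop IO only where necessary */ }
static void drained_end(void)   { /* resume IO */ }

/* Step 2: wait in the global queue for our turn. */
static void graph_mod_enter(void)
{
    pthread_mutex_lock(&graph_mod_lock);
    while (graph_mod_busy) {
        pthread_cond_wait(&graph_mod_cv, &graph_mod_lock);
    }
    graph_mod_busy = true;
    pthread_mutex_unlock(&graph_mod_lock);
}

/* Step 4: kick the queue so the next client gets access. */
static void graph_mod_exit(void)
{
    pthread_mutex_lock(&graph_mod_lock);
    graph_mod_busy = false;
    pthread_cond_signal(&graph_mod_cv);
    pthread_mutex_unlock(&graph_mod_lock);
}

static void modify_graph_example(void)
{
    drained_begin();        /* step 1 */
    graph_mod_enter();      /* step 2 */
    /* step 3: do all the graph modifications here */
    graph_mod_exit();       /* step 4 */
    drained_end();          /* step 5 */
}

int main(void)
{
    modify_graph_example();
    printf("graph modification done\n");
    return 0;
}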



Please have a look at what Kevin proposed (as described by me). I think
it's the same as you are suggesting. I am pasting it below.
I will try to implement this and see whether it is doable or not.

I think the advantage of drains is that they aren't so complicated and
don't add any complication to the existing code.
But we'll see how it goes with this global queue.

His idea is to replicate what blk_wait_while_drained() currently does
but on a larger scale. It is something in between this subtree_drains
logic and a rwlock.

Basically, if I understood correctly, we could implement
bdrv_wait_while_drained() and put it in all places where we would put a
read lock: all the reads of ->parents and ->children.
This function detects whether the bdrv is under drain, and if so it will
stop and wait until the drain finishes (i.e. the graph modification).
On the other side, each write would just need to drain probably both
nodes (a simple drain), to signal that we are modifying the graph. Once
bdrv_drained_begin() finishes, we are sure all coroutines are stopped.
Once bdrv_drained_end() finishes, we automatically let all coroutines
restart and continue where they left off.

Seems a good compromise between drains and rwlock. What do you think?

I am not sure how painful it will be to implement though.
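
A rough sketch of that reader/writer split, just as a model (this is not
the actual QEMU implementation; FakeNode, wait_while_drained(),
graph_write_begin()/graph_write_end() and drain_count are made-up names
standing in for the node's quiesce state and the drained-section
begin/end, and real readers would yield in a coroutine instead of
blocking a thread):

#include <pthread.h>

typedef struct FakeNode {
    pthread_mutex_t lock;
    pthread_cond_t  undrained;
    int drain_count;            /* >0 while a drained section is active */
} FakeNode;

void fake_node_init(FakeNode *node)
{
    pthread_mutex_init(&node->lock, NULL);
    pthread_cond_init(&node->undrained, NULL);
    node->drain_count = 0;
}

/* Reader side: what bdrv_wait_while_drained() would do before reading
 * ->parents or ->children. */
void wait_while_drained(FakeNode *node)
{
    pthread_mutex_lock(&node->lock);
    while (node->drain_count > 0) {
        pthread_cond_wait(&node->undrained, &node->lock);
    }
    pthread_mutex_unlock(&node->lock);
}

/* Writer side: stands in for draining the nodes whose edges change. */
void graph_write_begin(FakeNode *node)
{
    pthread_mutex_lock(&node->lock);
    node->drain_count++;
    pthread_mutex_unlock(&node->lock);
    /* ...the real code would also wait for in-flight IO to settle... */
}

void graph_write_end(FakeNode *node)
{
    pthread_mutex_lock(&node->lock);
    if (--node->drain_count == 0) {
        pthread_cond_broadcast(&node->undrained);   /* readers continue */
    }
    pthread_mutex_unlock(&node->lock);
}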


Hm, I don't see where the global queue is here. Or is
bdrv_wait_while_drained() global, with no bs argument?




--
Best regards,
Vladimir


