qemu-devel

Re: [PATCH] block/mirror: add 'write-blocking-after-ready' copy mode


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH] block/mirror: add 'write-blocking-after-ready' copy mode
Date: Tue, 14 Feb 2023 19:19:03 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 02.02.23 16:27, Fiona Ebner wrote:
Am 02.02.23 um 12:34 schrieb Kevin Wolf:
Am 02.02.2023 um 11:19 hat Fiona Ebner geschrieben:
Am 31.01.23 um 19:18 schrieb Denis V. Lunev:
Frankly speaking, I would say that this switch could be considered
NOT a QEMU job; we should just send a notification (event) for the
completion of each iteration, and the management software should
take the decision to switch from async mode to sync mode.

My first thought was very similar. We should provide a building block
that just switches between the two modes and then the management tool
can decide what the right policy is.

Adding a new event when the first iteration is done (I'm not sure if
there is much value in having it for later iterations) makes sense to
me if someone wants to use it. If we add it, let's not forget that
events can be lost and clients must be able to query the same
information, so we'd have to add it to query-jobs, too - which in turn
requires adding a job type specific struct to JobInfo first.
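As a sketch of what that pairing could look like (the event name and the mirror-specific "iterations" field are hypothetical, not existing QMP; this is just to illustrate that the same information is available both as an event and via query-jobs):

```
<- { "event": "MIRROR_ITERATION_DONE",
     "data": { "id": "drive0", "iterations": 1 },
     "timestamp": { "seconds": 1675350000, "microseconds": 0 } }

-> { "execute": "query-jobs" }
<- { "return": [ { "id": "drive0", "type": "mirror",
                   "status": "running", "iterations": 1 } ] }
```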


Well, Denis said 2 iterations might be better. But I'm fine with
initially adding an event just for the first iteration, further ones can
still be added later. Returning the number of completed iterations as
part of the mirror-specific job info would anticipate that.

Once we have this generic infrastructure with a low-level building block,
I wouldn't necessarily be opposed to having an option built on top where
QEMU automatically does what we consider most useful for most users.
auto-finalize/dismiss already do something similar.

Unfortunately, our management software is a bit limited in that regard
currently and making listening for events available in the necessary
place would take a bit of work. Having the switch command would nearly
be enough for us (we'd just switch after READY). But we'd also need
that, when the switch happens after READY, all remaining
asynchronous operations are finished by the command. Otherwise, the
original issue of inactivating block drives while mirror still has
work remains. Do those semantics for the switch sound acceptable?

Completing the remaining asynchronous operations can take a while, so I
don't think it's something to be done in a synchronous QMP command.
Do we need an event that tells you that the switch has completed?


Sure, makes sense. Since you said that having an event implies that
there will be a possibility to query for the same information, yes ;)

What Denis suggested in the other mail also sounds good to me:

Am 02.02.23 um 12:09 schrieb Denis V. Lunev:
On 2/2/23 11:19, Fiona Ebner wrote:
Unfortunately, our management software is a bit limited in that regard
currently and making listening for events available in the necessary
place would take a bit of work. Having the switch command would nearly
be enough for us (we'd just switch after READY). But we'd also need that,
when the switch happens after READY, all remaining asynchronous
operations are finished by the command.
That could be a matter of the other event, I believe. We switch mode and reset
the state; a new READY event will be sent once the bitmap is cleared. That seems
fair.

That would avoid adding a new kind of event.
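To illustrate Denis's suggestion as a QMP exchange (the switch command name and its arguments are purely hypothetical; only the BLOCK_JOB_READY event exists today), the flow might be:

```
-> { "execute": "x-block-job-change-copy-mode",
     "arguments": { "job-id": "drive0",
                    "copy-mode": "write-blocking" } }
<- { "return": {} }

   (job drains the remaining dirty bitmap in blocking mode)

<- { "event": "BLOCK_JOB_READY",
     "data": { "device": "drive0", "type": "mirror", ... } }
```

The second READY event would then signal that all asynchronous operations have been flushed and the mirror is actively synced.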

But having to switch the mirror job to sync mode just to avoid doing I/O
on an inactive device sounds wrong to me. It doesn't fix the root cause
of that problem, but just papers over it.

If you say the root cause is "the job not being completed before
switchover", then yes. But if the root cause is "switchover happening
while the drive is not actively synced", then a way to switch modes can
fix the root cause :)


Why does your management tool not complete the mirror job before it
does the migration switchover that inactivates images?

I did talk with my team leader about the possibility, but we decided to
not go for it, because it requires doing the migration in two steps with
pause-before-switchover and has the potential to increase guest downtime
quite a bit. So I went for this approach instead.
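For reference, the two-step flow mentioned above would look roughly like this with existing QMP commands (URIs and device names are placeholders, event payloads abbreviated):

```
-> { "execute": "migrate-set-capabilities",
     "arguments": { "capabilities": [
       { "capability": "pause-before-switchover", "state": true } ] } }
-> { "execute": "migrate", "arguments": { "uri": "tcp:..." } }
<- { "event": "MIGRATION", "data": { "status": "pre-switchover" } }
-> { "execute": "block-job-complete",
     "arguments": { "device": "drive0" } }
<- { "event": "BLOCK_JOB_COMPLETED", ... }
-> { "execute": "migrate-continue",
     "arguments": { "state": "pre-switchover" } }
```

The downtime concern is that the guest stays paused between the pre-switchover event and migrate-continue while the job completes.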



Interesting point. Maybe we need a way to automatically complete all the jobs
before switchover? There seems to be no reason to break the jobs if the user
didn't cancel them (and, of course, no reason to allow a code path leading to
an assertion failure).

--
Best regards,
Vladimir



