
Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
Date: Thu, 28 May 2015 11:40:47 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On 28.05.2015 at 04:49, Fam Zheng wrote:
> On Wed, 05/27 12:43, Paolo Bonzini wrote:
> > 
> > 
> > On 27/05/2015 12:10, Kevin Wolf wrote:
> > > On 27.05.2015 at 11:50, Paolo Bonzini wrote:
> > >>
> > >>
> > >> On 27/05/2015 11:07, Kevin Wolf wrote:
> > >>> This is the first part of what's troubling me with this series, as it
> > >>> makes me doubtful if op blockers are the right tool to implement what
> > >>> the commit message says (block device I/O). This is "are we doing the
> > >>> thing right?"
> > >>>
> > >>> The second part should actually come first, though: "Are we doing the
> > >>> right thing?" I'm also doubtful whether blocking device I/O is even what
> > >>> we should do.
> > >>>
> > >>> Why is device I/O special compared to block job I/O or NBD server I/O?
> > >>
> > >> Because block job I/O doesn't modify the source disk.  For the target
> > >> disk, block jobs should definitely treat themselves as device I/O and
> > >> register notifiers that pause themselves on bdrv_drain.
> > > 
> > > Block jobs don't generally have a source and a target; in fact, the
> > > design didn't even consider that such jobs exist (otherwise, we wouldn't
> > > have bs->job).
> > 
> > Mirror and backup have targets.
> > 
> > > There are some jobs that use a BDS read-only (source for backup, mirror,
> > > commit) just like there are guest devices that use the BDS read-only
> > > (CD-ROMs). And others write to it (streaming, hard disks).
> > 
> > Streaming doesn't write to the BDS.  It reads it while copy-on-read is
> > active.  It's subtle, but it means that streaming can proceed without
> > changing the contents of the disk.  As far as safety of atomic
> > transactions is concerned, streaming need not be paused.
> > 
> > However, you're right that there is no particular reason why a more
> > generic pause/resume wouldn't pause jobs as well.  It wouldn't be a
> > problem either way as far as the job is concerned, and a better-defined
> > API is a better API.
> > 
> > >> This is suspiciously similar to the first idea that I and Stefan had,
> > >> which was a blk_pause/blk_resume API, implemented through a notifier 
> > >> list.
> > > 
> > > Any problems with it because of which Fam decided against it?
> 
> I am not against it.
> 
> Initially I started with writing blk_pause/resume, and soon I thought it
> would be useful if blk_aio_read and friends could check the pause state
> before issuing I/O:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg02608.html
> 
> So I reused op blockers. But that idea was later abandoned, so what is left
> is now very similar to the blk_pause/resume idea.
> 
> BTW, I remember we once proposed the new op blocker model should have a
> category for I/O, no?

Yes, but I was always thinking of this on a much higher level: As soon
as a device is attached to a backend, it announces that it uses the
category, and it only drops it again when it's detached.

The use case for this is starting background jobs that don't work e.g.
while the image is written to. In such cases you want to block if an
attached device _can_ submit write requests, not just if a write request
is already in flight.

Mutual exclusion on a per-request basis is something different.

> On second thought, although in this series we only need to modify
> virtio-{scsi,blk}, Wen pointed out that the mirror job also wants this
> during completion (because bdrv_swap is in a BH after the job coroutine
> returns). So, in order for the solution to be general and complete, this
> notifier list approach relies on the listening devices to do the right
> thing, which requires modifying all of them, and is harder to maintain.

Indeed. blk_pause/resume would handle everything in one central place
in the block layer instead of spreading the logic across all the block
layer users.

Kevin


