qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] backup bug or question


From: John Snow
Subject: Re: [Qemu-devel] backup bug or question
Date: Fri, 9 Aug 2019 16:13:11 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0


On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
> 
> Hmm, hacking around backup I have a question:
> 
> What prevents guest write request after job_start but before setting
> write notifier?
> 
> code path:
> 
> qmp_drive_backup or transaction with backup
> 
>     job_start
>        aio_co_enter(job_co_entry) /* may only schedule execution, isn't it ? 
> */
> 
> ....
> 
> job_co_entry
>     job_pause_point() /* it definitely yields, isn't it bad? */
>     job->driver->run() /* backup_run */
> 
> ----
> 
> backup_run()
>     bdrv_add_before_write_notifier()
> 
> ...
> 

I think you're right... :(


We create jobs like this:

job->paused        = true;
job->pause_count   = 1;


And then job_start does this:

job->co = qemu_coroutine_create(job_co_entry, job);
job->pause_count--;
job->busy = true;
job->paused = false;


Which means that job_co_entry is being called before we lift the pause:

assert(job && job->driver && job->driver->run);
job_pause_point(job);
job->ret = job->driver->run(job, &job->err);

...Which means that we are definitely yielding in job_pause_point.

Yeah, that's a race condition waiting to happen.

> And what guarantees we give to the user? Is it guaranteed that write notifier 
> is
> set when qmp command returns?
> 
> And I guess, if we start several backups in a transaction it should be 
> guaranteed
> that the set of backups is consistent and correspond to one point in time...
> 

I would have hoped that maybe the drain_all coupled with the individual
jobs taking drain_start and drain_end would save us, but I guess we
simply don't have a guarantee that all backup jobs WILL have installed
their handler by the time the transaction ends.

Or, if there is that guarantee, I don't know what provides it, so I
think we shouldn't count on it accidentally working anymore.



I think we should do two things:

1. Move the handler installation to creation time.
2. Modify backup_before_write_notify to return without invoking
backup_do_cow if the job isn't started yet.

I'll send a patch in just a moment ...

--js



reply via email to

[Prev in Thread] Current Thread [Next in Thread]