qemu-devel

Re: [Qemu-devel] [PATCH 2.5 v5 0/11] dataplane snapshot fixes


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH 2.5 v5 0/11] dataplane snapshot fixes
Date: Fri, 6 Nov 2015 17:29:59 +0000

On Fri, Nov 6, 2015 at 4:19 PM, Denis V. Lunev <address@hidden> wrote:
> On 11/06/2015 07:05 PM, Eric Blake wrote:
>>
>> On 11/06/2015 08:54 AM, Stefan Hajnoczi wrote:
>>>
>>> On Wed, Nov 04, 2015 at 08:19:31PM +0300, Denis V. Lunev wrote:
>>>>
>>>> Running the test
>>>>      while /bin/true ; do
>>>>          virsh snapshot-create rhel7
>>>>          sleep 10
>>>>          virsh snapshot-delete rhel7 --current
>>>>      done
>>>> with iothreads enabled on a running VM leads to a lot of trouble:
>>>> hangs, assertion failures and errors.
>>
>> That is a case of using libvirt to trigger internal snapshots...
>>
>>> The HMP monitor is legacy and also not used by modern libvirt.
>>
>> ...and libvirt is forced to use HMP for internal snapshots, since we
>> _still_ haven't exposed internal snapshots as a QMP command.
>>
>>> I think the affected use cases are restricted to savevm+dataplane and
>>> HMP+dataplane.
>>
>> The fact that the commit message calls out a libvirt method of
>> triggering the bug does mean that it is user-visible, and so it would
>> qualify as a bug fix even during hard freeze.  But I also understand
>> that taking a large complex series late in the game is not without risk;
>> and it is not like this is a regression (rather, something that has
>> never worked bulletproof), right?
>>
> Yes, this has never worked before and it is not a regression.
>
> The problem is that it seems NOBODY uses iothreads in production,
> or even in complex real-life production tests. There is another
> recently merged example of this (100% reproducible, happens on both
> migration and snapshot). We hit it during a suspend operation.
>
> commit 10a06fd65f667a972848ebbbcac11bdba931b544
> Author: Pavel Butsykin <address@hidden>
> Date:   Mon Oct 26 14:42:57 2015 +0300
>
>     virtio: sync the dataplane vring state to the virtqueue before virtio_save
>
> I initially started this as a set of small changes in the savevm code
> and was asked to move the code from savevm.c to the block layer.
> That has been done, and yes, the series became more complex after
> that; it was obvious it would be complex once the task was to move a
> bunch of code from one place to another.
>
> Anyway, from my point of view the series is not that complex.
> It is just large, it mostly makes simple, near copy/paste changes,
> and there is still a month left to catch bugs.
>
> Can we still consider this for merge?

Absolutely, they are still bugs and we can fix them for 2.5.

I just wanted to reflect on the scope of the bugs and it occurred to
me that these code paths haven't been exercised/tested as often.

Stefan


