From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v2 1/2] qemu-iotests: reduce chance of races in 185
Date: Thu, 10 May 2018 11:05:37 +0100
User-agent: Mutt/1.9.3 (2018-01-21)

On Tue, May 08, 2018 at 09:26:03AM -0500, Eric Blake wrote:
> On 05/08/2018 08:54 AM, Stefan Hajnoczi wrote:
> > Commit 8565c3ab537e78f3e69977ec2c609dc9417a806e ("qemu-iotests: fix
> > 185") identified a race condition in a sub-test.
> > 
> > Similar issues also affect the other sub-tests.  If disk I/O completes
> > quickly, it races with the QMP 'quit' command.  This causes spurious
> > test failures because QMP events are emitted in an unpredictable order.
> > 
> > This test relies on QEMU internals and there is no QMP API for getting
> > deterministic behavior needed to make this test 100% reliable.  At the
> > same time, the test is useful and it would be a shame to remove it.
> > 
> > Add sleep 0.5 to reduce the chance of races.  This is not a real fix but
> > appears to reduce spurious failures in practice.
> > 
> > Cc: Vladimir Sementsov-Ogievskiy <address@hidden>
> > Signed-off-by: Stefan Hajnoczi <address@hidden>
> > ---
> >   tests/qemu-iotests/185 | 12 ++++++++++++
> >   1 file changed, 12 insertions(+)
> I'm not opposed to this patch, but is there any way to write the test to
> take both events in either order, without logging the events as they arrive,
> but instead summarizing in a deterministic order which events were received
> after the fact?  That way, no matter which way the race is won, we merely
> log that we got two expected events, and could avoid the extra sleep.

I don't think there is a practical way of doing that without big changes
to the test.  It could be rewritten in Python to make filtering the QMP
events easier.
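
To illustrate the kind of after-the-fact summary Eric describes, here is a
minimal sketch in Python. It assumes the events have already been collected
into a list of QMP-style dicts; the helper name summarize_events and the two
sample event lists are hypothetical and are not part of the existing iotests
code.

```python
from collections import Counter


def summarize_events(events):
    """Return (event name, count) pairs sorted by name.

    Arrival order, timestamps, and payloads are deliberately ignored, so
    two runs that receive the same events in a different order produce
    the same summary.
    """
    counts = Counter(ev["event"] for ev in events)
    return sorted(counts.items())


# Two hypothetical arrival orders for the same pair of events.
run_a = [{"event": "SHUTDOWN"}, {"event": "BLOCK_JOB_CANCELLED"}]
run_b = [{"event": "BLOCK_JOB_CANCELLED"}, {"event": "SHUTDOWN"}]

# Both orders collapse to the same deterministic summary.
assert summarize_events(run_a) == summarize_events(run_b)

for name, count in summarize_events(run_a):
    print(f"{name}: {count}")
```

A summary like this only makes the log output stable. It does not control
which code path QEMU actually took.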

Hiding the race doesn't solve the deeper problem though: the test case
doesn't exercise the same code path each time.  The test should really
cover all cancellation points in the block job lifecycle instead of just
one at random.  If we solve this problem then we don't need to filter
the QMP event sequence.

Maybe it can be done with blkdebug.  If not, then maybe a blockjobdbg
interface is necessary to perform deterministic tests (eliminating the
need for the rate-limiting trick used by this test!).

Please share ideas, but I think this is a long-term item that shouldn't
block this series.


