Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30


From: Peter Xu
Subject: Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Date: Wed, 21 Nov 2018 11:28:19 +0800
User-agent: Mutt/1.10.1 (2018-07-13)

On Tue, Nov 20, 2018 at 06:44:27PM +0100, Markus Armbruster wrote:
> Peter Xu <address@hidden> writes:
> 
> > On Mon, Nov 19, 2018 at 07:08:13PM +0100, Markus Armbruster wrote:
> >> Peter Xu <address@hidden> writes:
> >> 
> >> > On Mon, Nov 19, 2018 at 02:17:27PM +0800, Peter Xu wrote:
> >> >> I reproduced the error with a FreeBSD guest and this change (which
> >> >> possibly can be squashed into "tests: qmp-test: add queue full test")
> >> >> worked for me:
> >> >> 
> >> >> diff --git a/tests/qmp-test.c b/tests/qmp-test.c
> >> >> index 6989acbca4..83f353db4f 100644
> >> >> --- a/tests/qmp-test.c
> >> >> +++ b/tests/qmp-test.c
> >> >> @@ -281,8 +281,15 @@ static void test_qmp_oob(void)
> >> >>       * will only be able to be handled after the queue is shrinked, so
> >> >>       * it'll be processed only after one existing in-band command
> >> >>       * finishes.
> >> >> +     *
> >> >> +     * NOTE: we need to feed the queue one extra request to make
> >> >> +     * sure it gets stuck: by the time we have sent the Nth request,
> >> >> +     * we may have already popped and started processing the
> >> >> +     * 1st request, so the Nth request (which could then be only the
> >> >> +     * [N-1]th element on the queue) might not trigger the
> >> >> +     * monitor-full condition deterministically.
> >> >>       */
> >> >> -    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
> >> >> +    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
> >> >>          id = g_strdup_printf("queue-blocks-%d", i);
> >> >>          send_cmd_that_blocks(qts, id);
> >> >>          g_free(id);
> >> >> @@ -291,7 +298,7 @@ static void test_qmp_oob(void)
> >> >>      unblock_blocked_cmd();
> >> >>      recv_cmd_id(qts, "queue-blocks-1");
> >> >>      recv_cmd_id(qts, "oob-1");
> >> >> -    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
> >> >> +    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
> >> >>          unblock_blocked_cmd();
> >> >>          id = g_strdup_printf("queue-blocks-%d", i);
> >> >>          recv_cmd_id(qts, id);
> >> >> 
> >> >> So the problem here is that the queue-blocks-N command might not
> >> >> actually suspend the monitor every time: if we have already popped the
> >> >> 1st request, the Nth request becomes the (N-1)th element on the queue,
> >> >> the parser keeps going and eats the oob command, and that command can
> >> >> "preempt" the previous ones.
> >> >> 
> >> >> Maybe FreeBSD schedules its threads in a pattern that makes this happen
> >> >> only there, and very consistently, but in any case this should be a
> >> >> general fix to the test program.
> >> 
> >> I feel particularly dense right now, and need a more detailed analysis
> >> to understand.  Let me try.
> >> 
> >> Here's what the test does before your fix
> >> 
> >>     Send QMP_REQ_QUEUE_LEN_MAX "blocking" in-band commands
> >>     queue-blocks-1, queue-blocks-2, ...  "Blocking" means the command
> >>     blocks until we explicitly unblock it.
> >> 
> >>     Send an out-of-band command oob-1.
> >> 
> >>     Unblock queue-blocks-1.
> >> 
> >>     Receive response to queue-blocks-1.
> >> 
> >>     Receive response to oob-1.
> >> 
> >>     Unblock queue-blocks-2, receive response
> >>     ...
> >> 
> >> Here's what the test expects QEMU to do:
> >> 
> >>     In I/O thread:
> >> 
> >>         Receive and queue queue-blocks-1, ...
> >> 
> >>         Queue is full, stop receiving commands.
> >> 
> >>     In main thread:
> >> 
> >>         Dequeue queue-blocks-1.
> >> 
> >>         Execute queue-blocks-1, block for a while, finish and send
> >>         response.
> >> 
> >>     In I/O thread:
> >> 
> >>         Queue is no longer full, resume receiving commands.
> >> 
> >>         Receive and execute oob-1, send response
> >> 
> >>     In main thread:
> >> 
> >>         Dequeue queue-blocks-2, execute, block for a while, finish and
> >>         send response.
> >>         ...
> >> 
> >> No good, because the two threads race.  On my Linux box the test passes.
> >> Let's have a closer look at how it does that.  The appended patch adds
> >> another tracepoint.  With the tracepoints monitor_* and handle_qmp_command
> >> enabled, I get:
> >> 
> >>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-1", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >> 
> >> I/O thread receives and queues queue-blocks-1.
> >> 
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-1
> >> 
> >> Main thread dequeues queue-blocks-1 and starts executing it.
> >> 
> >> We already lost: the queue won't actually fill up.  The test passes
> >> anyway, but it doesn't actually test "queue full".  But why does it
> >> pass?  Let's see.
> >> 
> >>     address@hidden:monitor_qmp_respond queue-blocks-1
> >> 
> >> Main thread sends response to queue-blocks-1.  This means queue-blocks-1
> >> has been unblocked already.  This is possible, because qmp-test sends
> >> all the commands and then immediately unblocks queue-blocks-1, and the
> >> main thread can win the race with the I/O thread.
> >> 
> >>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-2", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-2
> >>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-3", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-4", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-5", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-6", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-7", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-8", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >> 
> >> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
> >> 
> >>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"exec-oob": 
> >> "migrate-pause"}
> >>     address@hidden:monitor_qmp_cmd_out_of_band oob-1
> >>     address@hidden:monitor_qmp_respond oob-1
> >> 
> >> I/O thread receives, executes and responds to oob-1.
> >> 
> >>     address@hidden:monitor_qmp_respond queue-blocks-2
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-3
> >>     address@hidden:monitor_qmp_respond queue-blocks-3
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-4
> >>     address@hidden:monitor_qmp_respond queue-blocks-4
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-5
> >>     address@hidden:monitor_qmp_respond queue-blocks-5
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-6
> >>     address@hidden:monitor_qmp_respond queue-blocks-6
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-7
> >>     address@hidden:monitor_qmp_respond queue-blocks-7
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-8
> >>     address@hidden:monitor_qmp_respond queue-blocks-8
> >> 
> >> Main thread dequeues, executes and responds to queue-blocks-2,
> >> ... queue-blocks-8.
> >> 
> >> If I make the main thread lose the race by sticking a sleep(1) before
> >> the unblock_blocked_cmd() that unblocks queue-blocks-1, I get:
> >> 
> >>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-1", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >> 
> >> I/O thread receives and queues queue-blocks-1.
> >> 
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-1
> >> 
> >> Main thread dequeues queue-blocks-1 and starts executing it.  We already
> >> lost.
> >> 
> >>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-2", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-3", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-4", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-5", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-6", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-7", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-8", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >> 
> >> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
> >> 
> >>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"exec-oob": 
> >> "migrate-pause"}
> >>     address@hidden:monitor_qmp_cmd_out_of_band oob-1
> >>     address@hidden:monitor_qmp_respond oob-1
> >> 
> >> I/O thread receives, executes and responds to oob-1.
> >> 
> >> Test is going to fail: response to oob-1 sent before response to
> >> queue-blocks-1.
> >> 
> >> Now let me try your fix.  Here's what I see:
> >> 
> >>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-1", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-1
> >>     address@hidden:monitor_qmp_respond queue-blocks-1
> >>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-2", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-2
> >>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-3", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-4", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-5", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-6", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-7", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-8", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
> >> "queue-blocks-9", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"exec-oob": 
> >> "migrate-pause"}
> >>     address@hidden:monitor_qmp_cmd_out_of_band oob-1
> >>     address@hidden:monitor_qmp_respond oob-1
> >>     address@hidden:monitor_qmp_respond queue-blocks-2
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-3
> >>     address@hidden:monitor_qmp_respond queue-blocks-3
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-4
> >>     address@hidden:monitor_qmp_respond queue-blocks-4
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-5
> >>     address@hidden:monitor_qmp_respond queue-blocks-5
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-6
> >>     address@hidden:monitor_qmp_respond queue-blocks-6
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-7
> >>     address@hidden:monitor_qmp_respond queue-blocks-7
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-8
> >>     address@hidden:monitor_qmp_respond queue-blocks-8
> >>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-9
> >>     address@hidden:monitor_qmp_respond queue-blocks-9
> >> 
> >> Tracepoint monitor_suspend still not hit.  This means the queue does not
> >> fill up.
> >
> > You are right.  I think I figured out one thing but missed another
> > point: "sent the command" does not mean "received by the QMP server".
> > The client can never tell when the queue is full, yet that is exactly
> > what we need to know in order to write a unit test for it.
> >
> > For example: in this test what we want to achieve is:
> >
> >   for i in 1 .. N:
> >       send blocking commands Bi
> >   ... wait until queue is full ...  [1]
> >   send oob command O1
> >   unblock blocking command B1
> >   receive response B1
> >   receive response O1
> >   for i in 2 .. N:
> >       unblock blocking command Bi
> >       receive response Bi
> >
> > All looks sane except the assumption at [1].  In the code we do nothing
> > for [1], assuming it happens naturally.  It doesn't: when the first for
> > loop finishes, we have only queued the commands into the client's send
> > buffer; there is no guarantee they have been received and queued in the
> > QMP request list on the server side.  When the server's request queue
> > does fill up, all that happens is that the monitor gets suspended, and
> > that is completely invisible from the client's point of view, so IMHO
> > we have no good way to guarantee [1].  E.g. we could put a very long
> > sleep(100) at [1] to make it very likely that the queue has filled up,
> > but even that could break on a super slow machine.
> >
> > In other words, I don't see a good way to automate testing of queue
> > full for now (though I'm pretty sure queue full itself works, since if
> > you do insert a sleep you definitely see the monitor_suspend
> > tracepoint).  We might need something like an event to signal that the
> > queue has filled up, so that the client can detect it and only then
> > send the out-of-band command, but that feels like going backwards and I
> > don't see much point in introducing a new event purely for testing
> > purposes...
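
(To spell that flow out with the helpers used in the hunk above - a
sketch only, and step [1] is exactly the part we cannot implement
today:)

    /* Sketch of the intended flow, reusing the test's helpers.  Step [1]
     * has no implementation: the client cannot observe the server
     * suspending its monitor. */
    int i;
    char *id;

    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
        id = g_strdup_printf("queue-blocks-%d", i);
        send_cmd_that_blocks(qts, id);          /* Bi */
        g_free(id);
    }

    /* [1] ... wait until the server's request queue is really full ... */

    /* send out-of-band command oob-1 (O1) here, then: */
    unblock_blocked_cmd();                      /* unblock B1 */
    recv_cmd_id(qts, "queue-blocks-1");         /* response to B1 */
    recv_cmd_id(qts, "oob-1");                  /* response to O1 */
    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
        unblock_blocked_cmd();                  /* unblock Bi */
        id = g_strdup_printf("queue-blocks-%d", i);
        recv_cmd_id(qts, id);                   /* response to Bi */
        g_free(id);
    }
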
> 
> We want the test to drive QEMU into the "queue full" state with an
> out-of-band command unreceived, then watch the queue drain and the
> out-of-band command overtaking part of it.
> 
> Driving QEMU into this state is easy enough: we can use a blocking
> in-band command to plug the queue's drain.
> 
> Once we have QEMU in this state, we can unblock that command and watch
> the show.  But we really have to wait until QEMU reaches this state
> before we unblock.
> 
> Our problem is the test can't easily observe when QEMU reaches this
> state.
> 
> You mentioned sending a "queue full" event, and that adding such an
> event just for testing is hard to justify.  I agree.  Could such an
> event have a non-testing use?  All I can think of is letting the
> management application detect "we mismanaged the queue and compromised
> monitor availability for out-of-band commands".  Doesn't sound
> particularly convincing to me, but we can talk with libvirt guys to see
> whether they want it.
> 
> Can the test observe the monitor is suspended just from the behavior of
> its monitor socket?  Not reliably: sending lots of input eventually
> fails with EAGAIN when the monitor is suspended, but it can also fail
> when the monitor reads the input too slowly.
> 
> Can the test observe the monitor_suspend trace point?  Awkward, because
> the trace backends are configurable.  We could have the test monitor the
> "simple" backend's output file, skipping the test when "simple" isn't
> available.  Hmm.
> 
> Here comes the heavy hammer: extend the qtest protocol.  When the
> monitor suspends / resumes, send a suitable asynchronous message to the
> qtest socket.  Have libqtest use that to track monitor suspend state.
> The test can then busy-wait for the state to flip to suspended.
> 
> Other ideas?

I don't have more to add (already a lot of ideas! :-).  I'd prefer to
start with the queue full event; we can discuss it with libvirt after
3.1 when appropriate.  And besides libvirt (where I'm not sure the event
would really be useful in the near future), it could also serve as a
hint for any new QMP client whose author wonders "hey, why isn't my QMP
client getting any response from QEMU" - if the client sees no response
but does see this event, we know the answer before any further
debugging.
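
For what it's worth, if we did add such an event, the wait at [1] in the
sketch above would collapse to a single libqtest call.  The event name
COMMAND_QUEUE_FULL below is made up; qtest_qmp_eventwait() already exists:

    /* Sketch: block until the (hypothetical) queue-full event arrives;
     * only then send oob-1 and start unblocking the in-band commands. */
    qtest_qmp_eventwait(qts, "COMMAND_QUEUE_FULL");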

> 
> >> > Markus, do you want me to repost a new version with this change?  Is
> >> > it still possible to have the oob-default series for 3.1?
> >> 
> >> We're trying to, but we need to get the test to work.
> >> 
> >> Two problems:
> >> 
> >> 1. The test doesn't seem to succeed at testing "queue full".
> >> 
> >> 2. The test might still be racy.
> >> 
> >> I suspect the code we test is solid, and it's "only" the test we screwed
> >> up.
> >
> > I agree.  Do you have a better idea for testing queue full?  I'd be
> > more than glad to try any suggestion; otherwise I'm thinking we could
> > simply drop the queue full test (the last patch of the oob series) for
> > now and merge the rest.  We can introduce a new queue full test after
> > 3.1 if we want, though again I doubt we can do that without introducing
> > something new into the procedure.
> >
> > Thoughts?
> 
> I'm afraid dropping the test (for now?) is what we have to do.

Agreed.

Thanks,

-- 
Peter Xu


