Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
From: Peter Xu
Subject: Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Date: Wed, 21 Nov 2018 11:28:19 +0800
User-agent: Mutt/1.10.1 (2018-07-13)
On Tue, Nov 20, 2018 at 06:44:27PM +0100, Markus Armbruster wrote:
> Peter Xu <address@hidden> writes:
>
> > On Mon, Nov 19, 2018 at 07:08:13PM +0100, Markus Armbruster wrote:
> >> Peter Xu <address@hidden> writes:
> >>
> >> > On Mon, Nov 19, 2018 at 02:17:27PM +0800, Peter Xu wrote:
> >> >> I reproduced the error with a FreeBSD guest and this change (which
> >> >> possibly can be squashed into "tests: qmp-test: add queue full test")
> >> >> worked for me:
> >> >>
> >> >> diff --git a/tests/qmp-test.c b/tests/qmp-test.c
> >> >> index 6989acbca4..83f353db4f 100644
> >> >> --- a/tests/qmp-test.c
> >> >> +++ b/tests/qmp-test.c
> >> >> @@ -281,8 +281,15 @@ static void test_qmp_oob(void)
> >> >> * will only be able to be handled after the queue is shrinked, so
> >> >> * it'll be processed only after one existing in-band command
> >> >> * finishes.
> >> >> + *
> >> >> + * NOTE: we need to feed the queue one extra request to make
> >> >> + * sure it gets stuck: by the time we send the Nth request, the
> >> >> + * server may already have popped and started processing the
> >> >> + * 1st one, so the Nth request (which could then be only the
> >> >> + * [N-1]th element in the queue) might not trigger the
> >> >> + * monitor-full condition deterministically.
> >> >> */
> >> >> - for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
> >> >> + for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
> >> >> id = g_strdup_printf("queue-blocks-%d", i);
> >> >> send_cmd_that_blocks(qts, id);
> >> >> g_free(id);
> >> >> @@ -291,7 +298,7 @@ static void test_qmp_oob(void)
> >> >> unblock_blocked_cmd();
> >> >> recv_cmd_id(qts, "queue-blocks-1");
> >> >> recv_cmd_id(qts, "oob-1");
> >> >> - for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
> >> >> + for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
> >> >> unblock_blocked_cmd();
> >> >> id = g_strdup_printf("queue-blocks-%d", i);
> >> >> recv_cmd_id(qts, id);
> >> >>
> >> >> So the problem here is that the queue-blocks-N command might not
> >> >> actually suspend the monitor every time: if the server has already
> >> >> popped the 1st request, the Nth request becomes only the (N-1)th
> >> >> element in the queue, so the parser keeps going, eats the oob
> >> >> command, and lets it "preempt" the previous commands.
> >> >>
> >> >> Maybe FreeBSD schedules the threads in a pattern that makes this
> >> >> happen only on FreeBSD, and very consistently, but in any case it
> >> >> should be a general fix to the test program.
> >>
> >> I feel particularly dense right now, and need a more detailed analysis
> >> to understand. Let me try.
> >>
> >> Here's what the test does before your fix:
> >>
> >> Send QMP_REQ_QUEUE_LEN_MAX "blocking" in-band commands
> >> queue-blocks-1, queue-blocks-2, ... "Blocking" means the command
> >> blocks until we explicitly unblock it.
> >>
> >> Send an out-of-band command oob-1.
> >>
> >> Unblock queue-blocks-1.
> >>
> >> Receive response to queue-blocks-1.
> >>
> >> Receive response to oob-1.
> >>
> >> Unblock queue-blocks-2, receive response
> >> ...
> >>
> >> Here's what the test expects QEMU to do:
> >>
> >> In I/O thread:
> >>
> >> Receive and queue queue-blocks-1, ...
> >>
> >> Queue is full, stop receiving commands.
> >>
> >> In main thread:
> >>
> >> Dequeue queue-blocks-1.
> >>
> >> Execute queue-blocks-1, block for a while, finish and send
> >> response.
> >>
> >> In I/O thread:
> >>
> >> Queue is no longer full, resume receiving commands.
> >>
> >> Receive and execute oob-1, send response
> >>
> >> In main thread:
> >>
> >> Dequeue queue-blocks-2, execute, block for a while, finish and
> >> send response.
> >> ...
> >>
> >> No good because the two threads race. On my Linux box the test passes.
> >> Let's have a closer look at how it does that.  The appended patch adds
> >> another tracepoint.  With tracepoints monitor_* and handle_qmp_command
> >> enabled, I get:
> >>
> >> address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-1", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>
> >> I/O thread receives and queues queue-blocks-1.
> >>
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-1
> >>
> >> Main thread dequeues queue-blocks-1 and starts executing it.
> >>
> >> We already lost: the queue won't actually fill up. The test passes
> >> anyway, but it doesn't actually test "queue full". But why does it
> >> pass? Let's see.
> >>
> >> address@hidden:monitor_qmp_respond queue-blocks-1
> >>
> >> Main thread sends response to queue-blocks-1. This means queue-blocks-1
> >> has been unblocked already. This is possible, because qmp-test sends
> >> all the commands and then immediately unblocks queue-blocks-1, and the
> >> main thread can win the race with the I/O thread.
> >>
> >> address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-2", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-2
> >> address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-3", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-4", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-5", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-6", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-7", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-8", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>
> >> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
> >>
> >> address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"exec-oob":
> >> "migrate-pause"}
> >> address@hidden:monitor_qmp_cmd_out_of_band oob-1
> >> address@hidden:monitor_qmp_respond oob-1
> >>
> >> I/O thread receives, executes and responds to oob-1.
> >>
> >> address@hidden:monitor_qmp_respond queue-blocks-2
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-3
> >> address@hidden:monitor_qmp_respond queue-blocks-3
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-4
> >> address@hidden:monitor_qmp_respond queue-blocks-4
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-5
> >> address@hidden:monitor_qmp_respond queue-blocks-5
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-6
> >> address@hidden:monitor_qmp_respond queue-blocks-6
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-7
> >> address@hidden:monitor_qmp_respond queue-blocks-7
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-8
> >> address@hidden:monitor_qmp_respond queue-blocks-8
> >>
> >> Main thread dequeues, executes and responds to queue-blocks-2,
> >> ... queue-blocks-8.
> >>
> >> If I make the main thread lose the race by sticking a sleep(1) before
> >> the unblock_blocked_cmd() that unblocks queue-blocks-1, I get:
> >>
> >> address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-1", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>
> >> I/O thread receives and queues queue-blocks-1.
> >>
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-1
> >>
> >> Main thread dequeues queue-blocks-1 and starts executing it.  We
> >> already lost.
> >>
> >> address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-2", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-3", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-4", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-5", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-6", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-7", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-8", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>
> >> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
> >>
> >> address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"exec-oob":
> >> "migrate-pause"}
> >> address@hidden:monitor_qmp_cmd_out_of_band oob-1
> >> address@hidden:monitor_qmp_respond oob-1
> >>
> >> I/O thread receives, executes and responds to oob-1.
> >>
> >> Test is going to fail: response to oob-1 sent before response to
> >> queue-blocks-1.
> >>
> >> Now let me try your fix. Here's what I see:
> >>
> >> address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-1", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-1
> >> address@hidden:monitor_qmp_respond queue-blocks-1
> >> address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-2", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-2
> >> address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-3", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-4", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-5", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-6", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-7", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-8", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute":
> >> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name":
> >> "queue-blocks-9", "driver": "blkdebug", "config":
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >> address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"exec-oob":
> >> "migrate-pause"}
> >> address@hidden:monitor_qmp_cmd_out_of_band oob-1
> >> address@hidden:monitor_qmp_respond oob-1
> >> address@hidden:monitor_qmp_respond queue-blocks-2
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-3
> >> address@hidden:monitor_qmp_respond queue-blocks-3
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-4
> >> address@hidden:monitor_qmp_respond queue-blocks-4
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-5
> >> address@hidden:monitor_qmp_respond queue-blocks-5
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-6
> >> address@hidden:monitor_qmp_respond queue-blocks-6
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-7
> >> address@hidden:monitor_qmp_respond queue-blocks-7
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-8
> >> address@hidden:monitor_qmp_respond queue-blocks-8
> >> address@hidden:monitor_qmp_cmd_in_band queue-blocks-9
> >> address@hidden:monitor_qmp_respond queue-blocks-9
> >>
> >> Tracepoint monitor_suspend still not hit. This means the queue does not
> >> fill up.
> >
> > You are right.  I think I figured out one thing but missed another
> > point here: "sent the command" does not mean "received by the QMP
> > server", so the client can never tell when the queue is full, yet
> > that's exactly what we need in order to write a unit test for it.
> >
> > For example: in this test what we want to achieve is:
> >
> > for i in 1 .. N:
> > send blocking commands Bi
> > ... wait until queue is full ... [1]
> > send oob command O1
> > unblock blocking command B1
> > receive response B1
> > receive response O1
> > for i in 2 .. N:
> > unblock blocking command Bi
> > receive response Bi
> >
> > All looks sane except the assumption at [1].  The code does nothing
> > for [1], assuming it happens naturally.  However it does not: when
> > the first for loop finishes we have only queued the commands into
> > the send buffer; we cannot guarantee that they have been received
> > and queued in the QMP request list on the server side.  When the
> > server's QMP request queue fills up it merely suspends the monitor,
> > and that event is completely invisible to the client, so IMHO we
> > have no good way to guarantee this.  E.g., we could do a very long
> > sleep(100) at [1] to be fairly sure the queue has filled, but even
> > that might break on a super slow machine.
> >
> > In other words, I don't see a good way to automate the "queue full"
> > test for now (though I'm pretty sure queue full itself works, since
> > if you insert a sleep you will definitely see the monitor_suspend
> > tracepoint).  We might need something like an event to signal that
> > the queue has filled, so the client can detect it and only then send
> > the out-of-band command, but then we're going backwards and I don't
> > see much point in introducing a new event only for testing
> > purposes...
>
> We want the test to drive QEMU into the "queue full" state with an
> out-of-band command unreceived, then watch the queue drain and the
> out-of-band command overtaking part of it.
>
> Driving QEMU into this state is easy enough: we can use a blocking
> in-band command to plug the queue's drain.
>
> Once we have QEMU in this state, we can unblock that command and watch
> the show. But we really have to wait until QEMU reaches this state
> before we unblock.
>
> Our problem is the test can't easily observe when QEMU reaches this
> state.
>
> You mentioned sending a "queue full" event, and that adding such an
> event just for testing is hard to justify. I agree. Could such an
> event have a non-testing use? All I can think of is letting the
> management application detect "we mismanaged the queue and compromised
> monitor availability for out-of-band commands". Doesn't sound
> particularly convincing to me, but we can talk with libvirt guys to see
> whether they want it.
>
> Can the test observe the monitor is suspended just from the behavior of
> its monitor socket? Not reliably: sending lots of input eventually
> fails with EAGAIN when the monitor is suspended, but it can also fail
> when the monitor reads the input too slowly.
>
> Can the test observe the monitor_suspend trace point? Awkward, because
> the trace backends are configurable. We could have the test monitor the
> "simple" backend's output file, skipping the test when "simple" isn't
> available. Hmm.
>
> Here comes the heavy hammer: extend the qtest protocol. When the
> monitor suspends / resumes, send a suitable asynchronous message to the
> qtest socket. Have libqtest use that to track monitor suspend state.
> The test can then busy-wait for the state to flip to suspended.
>
> Other ideas?
I don't have more to add (already a lot of ideas! :-).  I'd prefer to
start with the "queue full" event, and we can discuss it with libvirt
after 3.1 when appropriate.  Besides libvirt (where I'm not quite sure
the event will really be useful in the near future), it could also
serve as a hint for any new QMP client when people wonder "hey, why
didn't my QMP client get any response from QEMU": if the client got no
response but did see this event, we know the answer even before
debugging further.
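For reference, if such an event were ever added, it might look something like this on the wire (the event name and data shape here are entirely hypothetical; no such event exists in QEMU as of this thread):

```json
{"event": "COMMAND_QUEUE_FULL",
 "data": {"monitor": "qmp-1"},
 "timestamp": {"seconds": 1542766099, "microseconds": 0}}
```

A client that received this between sending its in-band commands and its out-of-band command would know the queue had genuinely filled before proceeding.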
>
> >> > Markus, do you want me to repost a new version with this change? Is
> >> > it still possible to have the oob-default series for 3.1?
> >>
> >> We're trying to, but we need to get the test to work.
> >>
> >> Two problems:
> >>
> >> 1. The test doesn't seem to succeed at testing "queue full".
> >>
> >> 2. The test might still be racy.
> >>
> >> I suspect the code we test is solid, and it's "only" the test we screwed
> >> up.
> >
> > I agree.  Do you have a better idea for testing "queue full"?  I'd
> > be more than glad to try any suggestions; otherwise I'm thinking we
> > could simply drop the queue full test (the last patch of the oob
> > series) for now and merge the rest.  We can introduce a new queue
> > full test after 3.1 if we want, though again I really doubt whether
> > we can without introducing something new into the procedure.
> >
> > Thoughts?
>
> I'm afraid dropping the test (for now?) is what we have to do.
Agreed.
Thanks,
--
Peter Xu