qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Possible bug in monitor code


From: Stratos Psomadakis
Subject: Re: [Qemu-devel] Possible bug in monitor code
Date: Fri, 24 Jan 2014 12:14:12 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8

On 01/23/2014 08:28 PM, Luiz Capitulino wrote:
> On Thu, 23 Jan 2014 17:33:33 +0200
> Stratos Psomadakis <address@hidden> wrote:
>
>> On 01/23/2014 03:54 PM, Luiz Capitulino wrote:
>>> On Thu, 23 Jan 2014 08:44:02 -0500
>>> Luiz Capitulino <address@hidden> wrote:
>>>
>>>> On Thu, 23 Jan 2014 19:23:51 +0800
>>>> Fam Zheng <address@hidden> wrote:
>>>>
>>>>> Bcc: 
>>>>> Subject: Re: [Qemu-devel] Possible bug in monitor code
>>>>> Reply-To: 
>>>>> In-Reply-To: <address@hidden>
>>>>>
>>>>> On Thu, 01/23 12:17, Stratos Psomadakis wrote:
>>>>>> On 01/23/2014 05:07 AM, Fam Zheng wrote:
>>>>>>> On Wed, 01/22 17:53, Stratos Psomadakis wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> we've encountered a weird issue regarding monitor (qmp and hmp) 
>>>>>>>> behavior
>>>>>>>> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the 
>>>>>>>> issue:
>>>>>>>>
>>>>>>>>     1) Client A connects to qmp socket with socat
>>>>>>>>     2) Client A gets greeting message {"QMP": {"version": ..}
>>>>>>>>     3) Client A waits (select on the socket's fd)
>>>>>>>>     4) Client B tries to connect to the *same* qmp socket with socat
>>>> QMP/HMP can only handle a single client per connection, so this is
>>>> not supported. What you could do is to create multiple QMP/HMP instances
>>>> on the command-line, but this has never been fully tested so we don't
>>>> officially support this either (although it may work).
>>>>
>>>> Now, not gracefully failing on step 4 is a real bug here. I think the
>>>> best thing to do would be to close client's B connection. Patches are
>>>> welcome :)
>>> Thinking about this again, I think this should already be the case
>>> (as I don't believe chardev code is sitting on accept()). So maybe
>>> you triggered a race on chardev code or broke something there.
>> Yes, the chardev code doesn't accept any more connections until the
>> currently active connection is closed.
>>
>> However, when a new client tries to connect (while there's another qmp /
>> hmp connection active) the kernel, as long as there's room in the socket
>> backlog, will return to the client that the connection has been
>> successfully established and will queue the connection to be accepted
>> later, when qmp actually finishes with its active connection and
>> re-executes accept().
>>
>> The problem arises if the new client closes the connection before the
>> older one is finished. When qmp runs accept() again, it will get a
>> socket fd for a client who has now closed the connection. At this point,
>> the monitor control event handler will get triggered and try to write /
>> flush the greeting message to the client. The client however has closed
>> its socket so the write will fail with SIGPIPE / EPIPE. Neither the
>> qemu-char nor the monitor code seem to handle this situation.
>>
>> Btw, have you tried to reproduce it?
> Not yet, I may have some time tomorrow. How reproducible is it for you?

We can trigger it (by following the steps described in the first mail)
consistently.

> Another question: have you tried to reproduce with an old qemu version
> (say v1.0) to see if this bug always existed? If the bug was introduced
> in some recent QEMU version you could try to bisect it.

v1.1 is not affected. I checked the code and it seems the monitor code
has been refactored since v1.1.

> Maybe you could try to reproduce with a different subsystem so that we
> can rule out or confirm monitor's involvement? Like -serial?

It's actually a fault of the monitor_flush() function. As far as I can
understand, monitor_flush() calls qemu_chr_fe_write() and doesn't handle
all of the return codes / error cases properly (as I described in a
previous mail). If you check the function, you'll see that the final
case (where it set ups a watch / callback) always assumes an EAGAIN /
EWOULDBLOCK error.

If you can verify / confirm that this is the case and that the patch
sent resolves the issue in a sane / correct way, I'll resubmit it
properly (with git-format-patch, a git log msg etc).

Thanks,
Stratos


>
>> Thanks,
>> Stratos
>>
>>>>>>>>     5) Client B does *NOT* get any greating message
>>>>>>>>     6) Client B waits (select on the socket's fd)
>>>>>>>>     7) Client B closes connection (kill socat)
>>>>>>>>     8) Client A quits too
>>>>>>>>     9) Client C connects to qmp socket
>>>>>>>>     10) Client C gets *two* greeting messages!!!
>>>>>>> Hi Stratos, thank you for debugging and reporting this.
>>>>>>>
>>>>>>> I tested this sequence but can't fully reproduce this. What I see is 5) 
>>>>>>> but no
>>>>>>> 10). Client C acts normally. And your patch below doesn't solve it for 
>>>>>>> me.
>>>>>> Hm, which qemu version (or repo branch / tag) did you use? We did a
>>>>>> quick scan of the master branch code / commits, but we didn't find
>>>>>> anything that might fix the issue.
>>>>> I tried on qemu.git master, and also 1.7. I think it is a bug: in my 
>>>>> test, step
>>>>> 5), B not getting any greeting message.
>>>>>
>>>>> But I get only one greeting message in step 10), which is a bit different 
>>>>> from
>>>>> what you reported.
>>>>>
>>>>> And no difference with your patch applied.
>>>>>
>>>>> Cc'ing Luiz who maintains the monitor code and may have more ideas.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Fam
>>>>>
>>>>>>> To submit a patch, please follow instructions as described in
>>>>>>> http://wiki.qemu.org/Contribute/SubmitAPatch
>>>>>>> so it could be picked up by maintainers. Specifically, you need to 
>>>>>>> format your
>>>>>>> patch email with "git format-patch" and add a "Signed-off-by:" line in 
>>>>>>> your
>>>>>>> patch email.
>>>>>> Ok. If any dev can confirm that this is a bug (and that the patch below
>>>>>> is the correct way to fix it) I'll resubmit it properly.
>>>>>>
>>>>>> Thanks,
>>>>>> Stratos
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Fam
>>>>>>>
>>>>>>>> After some investigation, we traced it down to the monitor_flush()
>>>>>>>> function in monitor.c. Specifically, when a second client connects to
>>>>>>>> the qmp (client B), while another one is already using it (client A), 
>>>>>>>> we
>>>>>>>> get the following from stracing the second client (client B):
>>>>>>>>
>>>>>>>>     connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0
>>>>>>>>     getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
>>>>>>>>     select(4, [0 3], [1 3], [], NULL)       = 2 (out [1 3])
>>>>>>>>     select(4, [0 3], [], [], NULL
>>>>>>>>
>>>>>>>> So, the connect() syscall from client B succeeds, although client B
>>>>>>>> connection has not yet been accepted by the qmp server (it's still in
>>>>>>>> the backlog of the qmp listening socket).
>>>>>>>>
>>>>>>>> After killing client B and then client A, we see the following when
>>>>>>>> stracing the qemu proc:
>>>>>>>>
>>>>>>>>     22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9
>>>>>>>>     22363 fcntl(9, F_GETFL)                 = 0x2 (flags O_RDWR)
>>>>>>>>     22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
>>>>>>>>     22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
>>>>>>>>     22363 fcntl(9, F_GETFL)                 = 0x802 (flags
>>>>>>>>     O_RDWR|O_NONBLOCK)
>>>>>>>>     22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) =
>>>>>>>>     -1 EPIPE (Broken pipe)
>>>>>>>>     22363 --- SIGPIPE (Broken pipe) @ 0 (0) ---
>>>>>>>>
>>>>>>>> The qmp server / qemu accepts the connection from client B (who has now
>>>>>>>> closed the connection) and tries to write the greeting message to the
>>>>>>>> socket fd. This results in write returning an error (EPIPE).
>>>>>>>>
>>>>>>>> The monitor_flush() function doesn't seem to handle this case (write
>>>>>>>> error). Instead, it adds a watch / handler to retry the write 
>>>>>>>> operation.
>>>>>>>> Thus, mon->outbuf is not cleaned up properly, which results in 
>>>>>>>> duplicate
>>>>>>>> greeting messages for the next client to connect.
>>>>>>>>
>>>>>>>> The following seems to do the trick.
>>>>>>>>
>>>>>>>> diff --git a/monitor.c b/monitor.c
>>>>>>>> index 845f608..5622f20 100644
>>>>>>>> --- a/monitor.c
>>>>>>>> +++ b/monitor.c
>>>>>>>> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
>>>>>>>>  
>>>>>>>>      if (len && !mon->mux_out) {
>>>>>>>>          rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
>>>>>>>> -        if (rc == len) {
>>>>>>>> -            /* all flushed */
>>>>>>>> +        if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
>>>>>>>> +            /* all flushed or error */
>>>>>>>>              QDECREF(mon->outbuf);
>>>>>>>>              mon->outbuf = qstring_new();
>>>>>>>>              return;
>>>>>>>>
>>>>>>>> Comments?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Stratos
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Stratos Psomadakis
>>>>>>>> <address@hidden>
>>>>>>>>
>>>>>> -- 
>>>>>> Stratos Psomadakis
>>>>>> <address@hidden>
>>>>>>
>>>>>>
>>


-- 
Stratos Psomadakis
<address@hidden>


Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]