[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v2 1/1] qemu-char: socket backend: disconnect on
From: |
Markus Armbruster |
Subject: |
Re: [Qemu-devel] [PATCH v2 1/1] qemu-char: socket backend: disconnect on write error |
Date: |
Fri, 24 Feb 2017 14:42:24 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) |
Marc-André Lureau <address@hidden> writes:
> Hi
>
> On Thu, Feb 2, 2017 at 5:20 PM Denis V. Lunev <address@hidden> wrote:
>
>> From: Anton Nefedov <address@hidden>
>>
>> Socket backend read handler should normally perform a disconnect, however
>> the read handler may not get a chance to run if the frontend is not ready
>> (qemu_chr_be_can_write() == 0).
>>
>> This means that in virtio-serial frontend case if
>> - the host has disconnected (giving EPIPE on socket write)
>> - and the guest has disconnected (-> frontend not ready -> backend
>> will not read)
>> - and there is still data (frontend->backend) to flush (has to be a really
>> tricky timing but nevertheless, we have observed the case in production)
>>
>> This results in virtio-serial trying to flush this data continiously
>> forming
>> a busy loop.
>>
>> Solution: react on write error in the socket write handler.
>> errno is not reliable after qio_channel_writev_full(), so we may not get
>> the exact EPIPE, so disconnect on any error but QIO_CHANNEL_ERR_BLOCK which
>> io_channel_send_full() converts to errno EAGAIN.
>> We must not disconnect right away though, there still may be data to read
>> (see 4bf1cb0).
>>
>> Signed-off-by: Anton Nefedov <address@hidden>
>> Signed-off-by: Denis V. Lunev <address@hidden>
>> CC: Paolo Bonzini <address@hidden>
>> CC: Daniel P. Berrange <address@hidden>
>> CC: Marc-André Lureau <address@hidden>
>> ---
>> Changes from v1:
>> - we do not rely on EPIPE anynore. Socket should be closed on all errors
>> except EAGAIN
>>
>> chardev/char-socket.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
>> index 4068dc5..e2137aa 100644
>> --- a/chardev/char-socket.c
>> +++ b/chardev/char-socket.c
>> @@ -97,6 +97,9 @@ static gboolean tcp_chr_accept(QIOChannel *chan,
>> GIOCondition cond,
>> void *opaque);
>>
>> +static int tcp_chr_read_poll(void *opaque);
>> +static void tcp_chr_disconnect(CharDriverState *chr);
>> +
>> /* Called with chr_write_lock held. */
>> static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>> {
>> @@ -114,6 +117,13 @@ static int tcp_chr_write(Chardev *chr, const uint8_t
>> *buf, int len)
>> s->write_msgfds_num = 0;
>> }
>>
>> + if (ret < 0 && errno != EAGAIN) {
>> + if (tcp_chr_read_poll(chr) <= 0) {
>> + tcp_chr_disconnect(chr);
>> + return len;
>>
>
> This change breaks a number of assumption in vhost-user code. Until now, a
> vhost-user function assumed that dev->vhost_ops would remain as long as the
> function is running, so it may call vhost_ops callbacks several time, which
> may eventually fail to do io, but no crash would occur. The disconnection
> would be handled later with the HUP handler. Now, vhost_ops may be cleared
> during a write (chr_disconnect -> CHR_EVENT_CLOSED in
> net_vhost_user_event). This can be randomly reproduced with vhost-user-test
> -p /x86_64/vhost-user/flags-mismatch/subprocess (broken pipe, qemu crashes)
It hangs for me with annoying frequency. I'm glad you found the culprit!
> I am trying to fix this, but it isn't simple (races, many code paths etc),
> so I just wanted to inform you about it.
Can we revert this patch until we have a fix?