From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
Date: Tue, 25 Jan 2011 09:49:04 +0000

On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi <address@hidden> wrote:
> On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf <address@hidden> wrote:
>> Am 24.01.2011 20:47, schrieb Michael S. Tsirkin:
>>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
>>>> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
>>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
>>>>>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
>>>>>>> Virtqueue notify is currently handled synchronously in userspace
>>>>>>> virtio.  This prevents the vcpu from executing guest code while
>>>>>>> hardware emulation code handles the notify.
>>>>>>>
>>>>>>> On systems that support KVM, the ioeventfd mechanism can be used
>>>>>>> to make virtqueue notify a lightweight exit by deferring hardware
>>>>>>> emulation to the iothread and allowing the VM to continue
>>>>>>> execution.  This model is similar to how vhost receives virtqueue
>>>>>>> notifies.
>>>>>>>
>>>>>>> The result of this change is improved performance for userspace
>>>>>>> virtio devices.  Virtio-blk throughput increases especially for
>>>>>>> multithreaded scenarios and virtio-net transmit throughput
>>>>>>> increases substantially.
>>>>>>>
>>>>>>> Some virtio devices are known to have guest drivers which expect
>>>>>>> a notify to be processed synchronously and spin waiting for
>>>>>>> completion.  Only enable ioeventfd for virtio-blk and virtio-net
>>>>>>> for now.
>>>>>>>
>>>>>>> Care must be taken not to interfere with vhost-net, which uses host
>>>>>>> notifiers.  If the set_host_notifier() API is used by a device
>>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal with
>>>>>>> host notifiers as it wishes.
>>>>>>>
>>>>>>> After migration and on VM change state (running/paused) virtio-ioeventfd
>>>>>>> will enable/disable itself.
>>>>>>>
>>>>>>>  * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
>>>>>>>  * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
>>>>>>>  * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
>>>>>>>  * vm_change_state(running=0) -> disable virtio-ioeventfd
>>>>>>>  * vm_change_state(running=1) -> enable virtio-ioeventfd
>>>>>>>
>>>>>>> Signed-off-by: Stefan Hajnoczi <address@hidden>
>>>>>>
>>>>>> On current git master I'm getting hangs when running iozone on a
>>>>>> virtio-blk disk. "Hang" means that it's not responsive any more and has
>>>>>> 100% CPU consumption.
>>>>>>
>>>>>> I bisected the problem to this patch. Any ideas?
>>>>>>
>>>>>> Kevin
>>>>>
>>>>> Does it help if you set ioeventfd=off on command line?
>>>>
>>>> Yes, with ioeventfd=off it seems to work fine.
>>>>
>>>> Kevin
>>>
>>> Then it's the ioeventfd that is to blame.
>>> Is it the io thread that consumes 100% CPU?
>>> Or the vcpu thread?
>>
>> I was building with the default options, i.e. there is no IO thread.
>>
>> Now I'm just running the test with IO threads enabled, and so far
>> everything looks good. So I can only reproduce the problem with IO
>> threads disabled.
>
> Hrm...aio uses SIGUSR2 to force the vcpu to process aio completions
> (relevant when --enable-io-thread is not used).  I will take a look at
> that again and see why we're spinning without checking for ioeventfd
> completion.

Here's my understanding of --disable-io-thread.  Added Anthony on CC,
please correct me.

When the I/O thread is disabled, our only thread runs guest code until an
exit request is made.  There are synchronous exit cases, like a halt
instruction or single step.  There are also asynchronous exit cases,
when signal handlers use qemu_notify_event(), which does cpu_exit(),
to set env->exit_request = 1 and unlink the current TB.

With this structure in mind, anything which needs to interrupt the
vcpu in order to process events must use signals and
qemu_notify_event().  Otherwise that event source may be starved and
never processed.
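
Roughly, the exit-request path looks like this (a simplified sketch of
the mechanism, not the verbatim QEMU code; details are abbreviated):

/* Simplified sketch, not the verbatim QEMU source.  A signal handler
 * calls qemu_notify_event(), which kicks the vcpu that is currently
 * executing guest code. */
static void alarm_handler(int signo)
{
    /* ... note that the alarm fired (e.g. write to the event pipe) ... */
    qemu_notify_event();            /* force the vcpu out of guest code */
}

void qemu_notify_event(void)
{
    CPUState *env = cpu_single_env; /* vcpu running right now, if any */

    if (env) {
        /* Sets env->exit_request = 1 and unlinks the current TB so the
         * execution loop returns to the main loop promptly. */
        cpu_exit(env);
    }
}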

virtio-ioeventfd currently does not use signals and will therefore
never interrupt the vcpu.

However, you normally don't notice the missing signal handler because
some other event interrupts the vcpu and we enter select(2) to process
all pending handlers.  So virtio-ioeventfd mostly gets a free ride on
top of timer events.  This is suboptimal because it adds latency to the
virtqueue kick: we're waiting for another event to interrupt the vcpu
before we can process the kick.
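
For context, the series wires up the host notifier roughly like this
(names approximate, simplified from the actual patch): the eventfd is
registered as an ordinary fd handler, so the callback only runs once the
main loop reaches select(2); nothing here signals a vcpu that is busy in
guest code.

static void virtio_pci_host_notifier_read(void *opaque)
{
    VirtQueue *vq = opaque;
    EventNotifier *n = virtio_queue_get_host_notifier(vq);

    if (event_notifier_test_and_clear(n)) {
        virtio_queue_notify_vq(vq);    /* handle the kick in userspace */
    }
}

static void virtio_pci_set_host_notifier_fd_handler(VirtQueue *vq,
                                                    bool assign)
{
    EventNotifier *n = virtio_queue_get_host_notifier(vq);

    if (assign) {
        qemu_set_fd_handler(event_notifier_get_fd(n),
                            virtio_pci_host_notifier_read, NULL, vq);
    } else {
        qemu_set_fd_handler(event_notifier_get_fd(n), NULL, NULL, NULL);
    }
}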

If any other vcpu interruption makes virtio-ioeventfd chug along, then
why are you seeing a 100% CPU livelock?  My theory is that dynticks has
a race condition which causes timers to stop working in QEMU.  Here is
an strace of QEMU --disable-io-thread entering the livelock.  I can
trigger this by starting a VM and running "while true; do true; done"
at the shell, then stracing the QEMU process:

08:04:34.985177 ioctl(11, KVM_RUN, 0)   = 0
08:04:34.985242 --- SIGALRM (Alarm clock) @ 0 (0) ---
08:04:34.985319 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
08:04:34.985368 rt_sigreturn(0x2758ad0) = 0
08:04:34.985423 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
08:04:34.985484 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
08:04:34.985538 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
08:04:34.985588 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 273000}}, NULL) = 0
08:04:34.985646 ioctl(11, KVM_RUN, 0)   = -1 EINTR (Interrupted system call)
08:04:34.985928 --- SIGALRM (Alarm clock) @ 0 (0) ---
08:04:34.986007 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
08:04:34.986063 rt_sigreturn(0x2758ad0) = -1 EINTR (Interrupted system call)
08:04:34.986124 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
08:04:34.986188 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
08:04:34.986246 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
08:04:34.986299 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
08:04:34.986359 ioctl(11, KVM_INTERRUPT, 0x7fff90404ef0) = 0
08:04:34.986406 ioctl(11, KVM_RUN, 0)   = 0
08:04:34.986465 ioctl(11, KVM_RUN, 0)   = 0              <--- guest finishes execution

                v--- dynticks_rearm_timer() returns early because timer is already scheduled
08:04:34.986533 timer_gettime(0, {it_interval={0, 0}, it_value={0, 24203}}) = 0
08:04:34.986585 --- SIGALRM (Alarm clock) @ 0 (0) ---    <--- timer expires
08:04:34.986661 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
08:04:34.986710 rt_sigreturn(0x2758ad0) = 0

                v--- we re-enter the guest without rearming the timer!
08:04:34.986754 ioctl(11, KVM_RUN^C <unfinished ...>
[QEMU hang, 100% CPU]

So dynticks fails to rearm the timer before we enter the guest.  This
is a race condition: we check that a timer is already scheduled and
head on towards re-entering the guest; the timer expires before we
enter the guest; we re-enter the guest without realizing the timer has
expired.  Now we're inside the guest with no hope of a timer expiring,
and the guest is running a CPU-bound workload that doesn't need to
perform I/O.
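
For reference, the rearm path in question looks roughly like this
(paraphrased from qemu-timer.c, not verbatim; the deadline argument is
simplified):

/* Paraphrased sketch of the dynticks rearm path.  The early return is
 * the window visible in the strace above: SIGALRM can fire between the
 * check and KVM_RUN, leaving no timer armed while the guest runs. */
#include <time.h>

static void dynticks_rearm_timer(timer_t host_timer, long next_deadline_ns)
{
    struct itimerspec timeout;

    /* If a timer is already running, assume it will interrupt us. */
    if (timer_gettime(host_timer, &timeout) == 0 &&
        (timeout.it_value.tv_sec || timeout.it_value.tv_nsec)) {
        return;                 /* <-- SIGALRM may fire right here,     */
    }                           /*     then we enter the guest unarmed. */

    timeout.it_interval.tv_sec  = 0;
    timeout.it_interval.tv_nsec = 0;
    timeout.it_value.tv_sec     = next_deadline_ns / 1000000000L;
    timeout.it_value.tv_nsec    = next_deadline_ns % 1000000000L;
    timer_settime(host_timer, 0, &timeout, NULL);
}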

The result is a hung QEMU (screen does not update) and a softlockup
inside the guest once we do kick it to life again (by detaching
strace).

I think the only way to avoid this race condition in dynticks is to
mask SIGALRM, then check whether the timer has expired, and then enter
ioctl(KVM_RUN) with an atomic signal mask change back to SIGALRM
enabled.  Thoughts?
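
Something along these lines, perhaps (a rough sketch, not a patch:
timer_expired()/process_timers() are placeholders for the real dynticks
bookkeeping, and the vcpu fd handling is simplified):

#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Placeholders for the real timer bookkeeping. */
extern int  timer_expired(void);
extern void process_timers(void);

static void run_vcpu_once(int vcpu_fd)
{
    sigset_t block_alrm, orig_mask, run_mask;

    /* 1. Block SIGALRM so it cannot fire between the check and KVM_RUN. */
    sigemptyset(&block_alrm);
    sigaddset(&block_alrm, SIGALRM);
    sigprocmask(SIG_BLOCK, &block_alrm, &orig_mask);

    /* 2. Now it is safe to check: if the dynticks timer already fired,
     *    handle it and rearm before entering the guest. */
    if (timer_expired()) {
        process_timers();
    }

    /* 3. Have KVM unblock SIGALRM only while the guest runs, atomically
     *    with guest entry, so a late SIGALRM bounces us out with EINTR. */
    char buf[sizeof(struct kvm_signal_mask) + sizeof(sigset_t)];
    struct kvm_signal_mask *kmask = (struct kvm_signal_mask *)buf;

    run_mask = orig_mask;
    sigdelset(&run_mask, SIGALRM);
    kmask->len = 8;                       /* size KVM expects on Linux */
    memcpy(kmask->sigset, &run_mask, sizeof(run_mask));
    ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, kmask);

    ioctl(vcpu_fd, KVM_RUN, 0);

    /* 4. Restore the thread's normal signal mask for event processing. */
    sigprocmask(SIG_SETMASK, &orig_mask, NULL);
}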

Back to virtio-ioeventfd: we really shouldn't use it when there is no
I/O thread.  It doesn't make sense because there's no opportunity to
process the virtqueue while guest code is executing in parallel, like
there is with the I/O thread.  It will just degrade performance when
QEMU only has one thread.  I'll send a patch to disable it when we
build without the I/O thread.
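
The guard could be as simple as something like this (hypothetical
sketch, not the actual patch I'll send):

/* Hypothetical sketch: treat the ioeventfd flag as off when QEMU is
 * built without the I/O thread, since nobody can service the eventfd
 * while the single thread is running guest code. */
static bool virtio_pci_ioeventfd_enabled(VirtIOPCIProxy *proxy)
{
#ifndef CONFIG_IOTHREAD
    return false;
#else
    return proxy->flags & VIRTIO_PCI_FLAG_USE_IOEVENTFD;
#endif
}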

Stefan


