qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCHv2 1/3] virtio: add missing mb() on notification


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCHv2 1/3] virtio: add missing mb() on notification
Date: Tue, 24 Apr 2012 16:22:37 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1

Il 24/04/2012 16:20, Michael S. Tsirkin ha scritto:
> On Tue, Apr 24, 2012 at 03:46:25PM +0200, Paolo Bonzini wrote:
>> Il 23/04/2012 15:19, Michael S. Tsirkin ha scritto:
>>> During normal operation, virtio first writes a used index
>>> and then checks whether it should interrupt the guest
>>> by reading guest avail index/flag values.
>>>
>>> Guest does the reverse: writes the index/flag,
>>> then checks the used ring.
>>>
>>> The ordering is important: if host avail flag read bypasses the used
>>> index write, we could in effect get this timing:
>>>
>>> host avail flag read
>>>             guest enable interrupts: avail flag write
>>>             guest check used ring: ring is empty
>>> host used index write
>>>
>>> which results in a lost interrupt: guest will never be notified
>>> about the used ring update.
>>>
>>> This actually can happen when using kvm with an io thread,
>>> such that the guest vcpu and qemu run on different host cpus,
>>> and this has actually been observed in the field
>>> (but only seems to trigger on very specific processor types)
>>> with userspace virtio: vhost has the necessary smp_mb()
>>> in place to prevent the regordering, so the same workload stalls
>>> forever waiting for an interrupt with vhost=off but works
>>> fine with vhost=on.
>>>
>>> Insert an smp_mb barrier operation in userspace virtio to
>>> ensure the correct ordering.
>>> Applying this patch fixed the race condition we have observed.
>>> Tested on x86_64. I checked the code generated by the new macro
>>> for i386 and ppc but didn't run virtio.
>>>
>>> Signed-off-by: Michael S. Tsirkin <address@hidden>
>>> ---
>>>  hw/virtio.c    |    2 ++
>>>  qemu-barrier.h |   23 ++++++++++++++++++++---
>>>  2 files changed, 22 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/virtio.c b/hw/virtio.c
>>> index f805790..6449746 100644
>>> --- a/hw/virtio.c
>>> +++ b/hw/virtio.c
>>> @@ -693,6 +693,8 @@ static bool vring_notify(VirtIODevice *vdev, VirtQueue 
>>> *vq)
>>>  {
>>>      uint16_t old, new;
>>>      bool v;
>>> +    /* We need to expose used array entries before checking used event. */
>>> +    mb();
>>
>> mb() vs. smp_mb()?
> 
> rhel used wmb() everywhere so this keeps it consistent.
> upstream switched to smp_wmb so I added smp_mb there.
> 
>>>      /* Always notify when queue is empty (when feature acknowledge) */
>>>      if (((vdev->guest_features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY)) &&
>>>           !vq->inuse && vring_avail_idx(vq) == vq->last_avail_idx)) {
>>> diff --git a/qemu-barrier.h b/qemu-barrier.h
>>> index c11bb2b..f6722a8 100644
>>> --- a/qemu-barrier.h
>>> +++ b/qemu-barrier.h
>>> @@ -4,7 +4,7 @@
>>>  /* Compiler barrier */
>>>  #define barrier()   asm volatile("" ::: "memory")
>>>  
>>> -#if defined(__i386__) || defined(__x86_64__)
>>> +#if defined(__i386__)
>>>  
>>>  /*
>>>   * Because of the strongly ordered x86 storage model, wmb() is a nop
>>> @@ -13,15 +13,31 @@
>>>   * load/stores from C code.
>>>   */
>>>  #define smp_wmb()   barrier()
>>> +/*
>>> + * We use GCC builtin if it's available, as that can use
>>> + * mfence on 32 bit as well, e.g. if built with -march=pentium-m.
>>> + * However, on i386, there seem to be known bugs as recently as 4.3.
>>> + * */
>>
>> Do you know what those bugs are?  Either add a pointer, or there is no
>> reason to have cruft that is only backed by hearsay.
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793
> I'll add this link in the commit log.

Ok, thanks.  I modified my "atomics" branch to add it too.

>>> +#if defined(_GNUC__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 4
>>> +#define smp_mb() __sync_synchronize()
>>> +#else
>>> +#define smp_mb() asm volatile("lock; addl $0,0(%%esp) " ::: "memory")
>>> +#endif
>>> +
>>> +#elif defined(__x86_64__)
>>> +
>>> +#define smp_wmb()   barrier()
>>> +#define smp_mb() asm volatile("mfence" ::: "memory")
>>>  
>>>  #elif defined(_ARCH_PPC)
>>>  
>>>  /*
>>> - * We use an eieio() for a wmb() on powerpc.  This assumes we don't
>>> + * We use an eieio() for wmb() and mb() on powerpc.  This assumes we don't
>>>   * need to order cacheable and non-cacheable stores with respect to
>>>   * each other
>>>   */
>>>  #define smp_wmb()   asm volatile("eieio" ::: "memory")
>>> +#define smp_mb()   asm volatile("eieio" ::: "memory")
>>
>> smp_mb() is hwsync under PPC,
> 
> This one?
> __asm__ __volatile__ ("sync" : : : "memory")
> 
>> but I would just trust GCC.
>>
>> Paolo
> 
> __sync_synchronize? Unfortunately it's still pretty new.

So is KVM on PPC. :)

Paolo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]