From: Avi Kivity
Subject: Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
Date: Thu, 28 Nov 2013 13:30:05 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0

On 11/28/2013 01:22 PM, Gleb Natapov wrote:
On Thu, Nov 28, 2013 at 01:18:54PM +0200, Avi Kivity wrote:
On 11/28/2013 01:02 PM, Gleb Natapov wrote:
On Thu, Nov 28, 2013 at 12:12:55PM +0200, Avi Kivity wrote:
On 11/28/2013 12:11 PM, Gleb Natapov wrote:
On Thu, Nov 28, 2013 at 11:49:00AM +0200, Avi Kivity wrote:
On 11/28/2013 11:19 AM, Gleb Natapov wrote:
On Thu, Nov 28, 2013 at 09:55:42AM +0100, Paolo Bonzini wrote:
On 28/11/2013 07:27, Zhanghaoyu (A) wrote:
Without synchronize_rcu you could have

    VCPU writes to routing table
                                       e = entry from IRQ routing table
    kvm_irq_routing_update(kvm, new);
    VCPU resumes execution
                                       kvm_set_msi_irq(e, &irq);
                                       kvm_irq_delivery_to_apic_fast();

where the entry is stale but the VCPU has already resumed execution.

If we use call_rcu() (setting aside for the moment the problem that Gleb
pointed out) instead of synchronize_rcu(), do we still need to ensure this?
The problem is that we do need to ensure this, so using call_rcu is not
possible (even leaving aside the memory allocation problem).
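
For concreteness, a rough sketch of the updater path (simplified, not the
exact KVM code; the names and locking details are assumptions):

    /* Sketch only: simplified from the real kvm_set_irq_routing(). */
    int kvm_set_irq_routing_sketch(struct kvm *kvm,
                                   struct kvm_irq_routing_table *new)
    {
            struct kvm_irq_routing_table *old;

            mutex_lock(&kvm->irq_lock);
            old = kvm->irq_routing;
            rcu_assign_pointer(kvm->irq_routing, new);
            mutex_unlock(&kvm->irq_lock);

            /* Blocks until every RCU reader (a vcpu resolving an MSI
             * through the old table) is done, so the ioctl cannot return
             * while a stale entry is still in use.  With call_rcu() the
             * free would be deferred, but the ioctl would return
             * immediately, allowing exactly the race shown above. */
            synchronize_rcu();
            kfree(old);
            return 0;
    }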

Not changing current behaviour is certainly safer, but I am still not 100%
convinced we have to ensure this.

Suppose guest does:

1: change msi interrupt by writing to pci register
2: read the pci register to flush the write
3: zero idt

I am pretty certain that this code can get an interrupt after step 2 on real HW,
but I cannot tell whether a guest can rely on the interrupt being delivered
exactly after the read instruction, or whether it can be delayed by a couple
of instructions. It seems to me it would be fragile for an OS to depend on
this behaviour. AFAIK Linux does not.
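
In guest code the sequence above would look roughly like this (hypothetical
x86-flavoured sketch; msi_data_reg and zero_idt are placeholder names):

    writel(new_msi_data, msi_data_reg);  /* 1: change MSI via PCI write     */
    (void)readl(msi_data_reg);           /* 2: read back to flush the write */
    load_idt(&zero_idt);                 /* 3: zero IDT; an interrupt
                                            delivered after this point
                                            crashes the guest              */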

Linux is safe: it does interrupt migration from within the interrupt
handler.  If you do that before the device-specific EOI, you won't
get another interrupt until programming the MSI is complete.
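
Roughly, the pattern is this (a simplified sketch of the idea, not the
actual Linux code; migrate_irq() and device_eoi() are placeholder names):

    void handle_irq(struct irq_desc *desc)
    {
            /* A pending affinity change is applied here, in interrupt
             * context, before the device-specific EOI ... */
            if (desc->move_pending)
                    migrate_irq(desc);      /* reprogram MSI address/data */

            desc->handle_device(desc);      /* device-specific work */

            /* ... so the device cannot raise another interrupt until
             * the MSI reprogramming is complete. */
            device_eoi(desc);
    }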

Is virtio safe? IIRC it can post multiple interrupts without guest acks.

Using call_rcu() is a better solution than srcu IMO.  Fewer code
changes, consistently faster.
Why not fix userspace to use KVM_SIGNAL_MSI instead?


Shouldn't it work with old userspace too? Maybe I misunderstood your intent.
Zhanghaoyu said that the problem mostly hurts in a real-time telecom
environment, so I am proposing how he can fix the problem in his specific
environment.  Obviously it will not fix older userspace, but a kernel
fix will also require a kernel update, and updating userspace is usually
easier.


Isn't the latency due to interrupt migration causing long
synchronize_rcu()s?  How does KVM_SIGNAL_MSI help?

If MSI is delivered using KVM_SIGNAL_MSI, as opposed to via an entry in
the irq routing table, then changing the MSI configuration should not cause
an update to the irq routing table (I am not saying this is what happens
with current QEMU, but theoretically there is no reason to update the
routing table in this case).
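
For illustration, the userspace side of KVM_SIGNAL_MSI is roughly this
(sketch; vm_fd and the address/data values are assumptions, taken from the
device's MSI capability):

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Inject one MSI directly, without a routing table entry. */
    struct kvm_msi msi = {
            .address_lo = msi_addr_lo,
            .address_hi = msi_addr_hi,
            .data       = msi_data,
    };
    ioctl(vm_fd, KVM_SIGNAL_MSI, &msi);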

I see. That pushes the problem to userspace, which uses traditional locking, so the problem disappears until qemu starts using rcu to manage this too.

There is also irqfd, however. We could add a KVM_UPDATE_IRQFD ioctl to change the payload it delivers, but that has exactly the same problems.
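
For reference, this is roughly how an irqfd is bound today (sketch, error
handling omitted); the gsi field is resolved through the irq routing table,
which is why the same staleness question carries over:

    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int efd = eventfd(0, EFD_CLOEXEC);
    struct kvm_irqfd irqfd = {
            .fd  = efd,
            .gsi = gsi,     /* looked up in the irq routing table */
    };
    ioctl(vm_fd, KVM_IRQFD, &irqfd);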



