From: Jan Kiszka
Subject: Re: [Qemu-devel] [RFC] use little granularity lock to substitue qemu_mutex_lock_iothread
Date: Sat, 23 Jun 2012 11:10:18 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

On 2012-06-23 00:56, Anthony Liguori wrote:
> On 06/22/2012 05:27 PM, Jan Kiszka wrote:
>> On 2012-06-22 23:44, Anthony Liguori wrote:
>>> 1) unlock iothread before entering the do {} loop in kvm_cpu_exec()
>>>     a) reacquire the lock after the loop
>>>     b) reacquire the lock in kvm_handle_io()
>>>     c) introduce an unlocked memory accessor that, for now, just takes
>>> the iothread lock and calls cpu_physical_memory_rw()
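
A minimal, self-contained sketch of what step 1 boils down to; every type
and function below is a stand-in rather than the real QEMU/KVM symbols,
and error handling as well as the remaining exit reasons are omitted:

/*
 * Sketch of step 1 only: drop the global lock around the KVM run loop
 * and take it back where device/chipset state is actually touched.
 * All names below are stand-ins, not the real QEMU/KVM code.
 */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t iothread_lock = PTHREAD_MUTEX_INITIALIZER;

enum exit_reason { EXIT_IO, EXIT_MMIO, EXIT_SHUTDOWN };

typedef struct {
    bool exit_request;
    enum exit_reason exit_reason;
} VCpu;

static int run_vcpu_once(VCpu *cpu)          /* stand-in for the KVM_RUN ioctl */
{
    cpu->exit_reason = EXIT_SHUTDOWN;
    cpu->exit_request = true;
    return 0;
}

static void handle_io(VCpu *cpu)   { (void)cpu; }   /* 1b) ioport dispatch    */
static void handle_mmio(VCpu *cpu) { (void)cpu; }   /* 1c) slow-path accessor */

static int vcpu_exec(VCpu *cpu)              /* caller holds iothread_lock */
{
    int ret;

    pthread_mutex_unlock(&iothread_lock);    /* 1) unlock before the loop */
    do {
        ret = run_vcpu_once(cpu);
        switch (cpu->exit_reason) {
        case EXIT_IO:
            pthread_mutex_lock(&iothread_lock);    /* 1b) reacquire around PIO */
            handle_io(cpu);
            pthread_mutex_unlock(&iothread_lock);
            break;
        case EXIT_MMIO:
            pthread_mutex_lock(&iothread_lock);    /* 1c) "unlocked" accessor
                                                    * still takes the lock    */
            handle_mmio(cpu);
            pthread_mutex_unlock(&iothread_lock);
            break;
        default:
            break;
        }
    } while (ret == 0 && !cpu->exit_request);
    pthread_mutex_lock(&iothread_lock);      /* 1a) reacquire after the loop */

    return ret;
}

int main(void)
{
    VCpu cpu = { 0 };

    pthread_mutex_lock(&iothread_lock);
    vcpu_exec(&cpu);
    pthread_mutex_unlock(&iothread_lock);
    return 0;
}
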
>>
>> Right, that's what we have here as well. The latter is modeled as a so
>> called "I/O pathway", a thread-based execution context for
>> frontend/backend pairs with some logic to transfer certain I/O requests
>> asynchronously to the pathway thread.
> 
> Interesting, so the VCPU threads always hold the iothread mutex but some
> requests are routed to other threads?

The VCPUs acquire the iothread lock _only_ if a request can neither be
handled directly nor be forwarded to a pathway thread. In the forwarding
case, only pathway-specific locks are taken. One big advantage of this
model is that you do not need to worry about locks inside the device
models themselves. That helps when migrating existing models but should
also be sufficient for quite a few use cases.
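
To illustrate that dispatch rule - take the big lock only when a request
can neither be completed locally nor handed to a pathway - here is a
rough sketch. The structures and names are invented for the example, the
asynchronous forwarding to the pathway thread is left out, and in real
code the big-lock branch would use qemu_mutex_lock_iothread() instead of
a raw pthread mutex:

/*
 * Illustration only: how a VCPU-side dispatcher could pick between a
 * lock-free fast path, a per-pathway lock, and the global iothread lock.
 * Every name here is invented for the example.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct PathWay {
    pthread_mutex_t lock;               /* pathway-specific lock             */
    /* ... request queue, worker thread, ... */
} PathWay;

typedef struct Region {
    bool lock_free;                     /* can be completed without any lock */
    PathWay *pathway;                   /* NULL: fall back to the big lock   */
    void (*access)(struct Region *r, uint64_t addr, uint64_t *val, bool is_write);
} Region;

static pthread_mutex_t iothread_lock = PTHREAD_MUTEX_INITIALIZER;

void vcpu_mmio_access(Region *r, uint64_t addr, uint64_t *val, bool is_write)
{
    if (r->lock_free) {
        /* request can be handled directly - no lock at all */
        r->access(r, addr, val, is_write);
    } else if (r->pathway) {
        /* device sits behind a pathway - only its private lock is taken */
        pthread_mutex_lock(&r->pathway->lock);
        r->access(r, addr, val, is_write);
        pthread_mutex_unlock(&r->pathway->lock);
    } else {
        /* legacy device model - unchanged, still under the big lock */
        pthread_mutex_lock(&iothread_lock);
        r->access(r, addr, val, is_write);
        pthread_mutex_unlock(&iothread_lock);
    }
}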

> 
> I hadn't considered a design like that.  I've been thinking about a long
> term architecture that's a bit more invasive.
> 
> What we think of as the I/O thread today wouldn't be special.  It would be
> one of N I/O threads all running separate copies of the main loop.  All
> of the functions that defer dispatch to a main loop would take a context
> as an argument and devices would essentially have a "vector" array of
> main loops as input.
> 
> So virtio-net probably would have two main loop "vectors" since it would
> like to schedule tx and rx independently.  There's nothing that says
> that you can't pass the same main loop context for each vector but
> that's a configuration choice.
> 
> Dispatch from VCPU context would behave the same as it does today but
> obviously per-device locking is needed.

And every backend would run over its own thread - I guess this is
conceptually close to what we have. However, the devil is in the detail.
E.g., we will also need per-iothread timer services (we skipped this so
far). And the device-to-device request problem needs to be solved (see
below).
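
To make the shape of that concrete, here is a rough sketch of per-loop
contexts with their own timer service, and of a device wired up with a
vector of such contexts; all types and functions are hypothetical, not
an existing QEMU API:

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Timer Timer;
struct Timer {
    int64_t expire_ns;
    void (*cb)(void *opaque);
    void *opaque;
    Timer *next;
};

typedef struct MainLoopCtx {
    pthread_t thread;               /* the thread running this loop       */
    pthread_mutex_t lock;           /* protects the lists below           */
    Timer *timers;                  /* per-loop timer service, not global */
    /* ... fd handlers, bottom halves, ... */
} MainLoopCtx;

/* Deferred-dispatch helpers take an explicit context instead of
 * implicitly using one global main loop. */
static Timer *ctx_new_timer(MainLoopCtx *ctx, void (*cb)(void *), void *opaque)
{
    Timer *t = calloc(1, sizeof(*t));

    t->cb = cb;
    t->opaque = opaque;
    pthread_mutex_lock(&ctx->lock);
    t->next = ctx->timers;
    ctx->timers = t;
    pthread_mutex_unlock(&ctx->lock);
    return t;
}

/* A device gets a vector of loop contexts, one per logical queue; the
 * caller may pass the same context for both. */
enum { VNET_RX, VNET_TX, VNET_NVECTORS };

typedef struct {
    MainLoopCtx *vectors[VNET_NVECTORS];
    Timer *tx_timer;
} VirtioNetLike;

void virtio_net_like_init(VirtioNetLike *n, MainLoopCtx *rx, MainLoopCtx *tx,
                          void (*tx_cb)(void *))
{
    n->vectors[VNET_RX] = rx;
    n->vectors[VNET_TX] = tx;
    n->tx_timer = ctx_new_timer(tx, tx_cb, n);  /* timer lives in the tx loop */
}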

> 
>> The tricky part was to get nested requests right, i.e. when a request
>> triggers another one from within the device model. This is where things
>> get ugly. In theory, you can end up with a vm deadlock if you just apply
>> per-device locking. I'm currently trying to rebase our patches, review
>> and document the logic behind it.
> 
> I really think the only way to solve this is to separate map()'d DMA
> access (where the device really wants to deal with RAM only) and
> copy-based access (where devices map DMA to other devices).
> 
> For copy-based access, we really ought to move to a callback-based API.
> It adds quite a bit of complexity but it's really the only way to solve
> the problem robustly.

Maybe we are talking about the same thing: What we need is a mechanism
to queue MMIO requests for execution over some iothread / pathway
context in case we are about to get trapped by lock recursion. Then we
also have to make sure that queued requests are not overtaken by
requests issued afterward. This is an important part of our approach.
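
As a rough sketch of that queuing rule (all names are invented, only
single-context recursion is handled, and error handling is omitted):

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct MmioReq MmioReq;
struct MmioReq {
    uint64_t addr;
    uint64_t val;
    bool is_write;
    MmioReq *next;
};

typedef struct PathwayCtx PathwayCtx;
struct PathwayCtx {
    pthread_mutex_t lock;
    MmioReq *head, **tail;                      /* FIFO of deferred requests */
    void (*handle)(PathwayCtx *ctx, MmioReq *req);
};

static __thread PathwayCtx *held_ctx;           /* context locked by this thread */

void pathway_init(PathwayCtx *ctx, void (*handle)(PathwayCtx *, MmioReq *))
{
    pthread_mutex_init(&ctx->lock, NULL);
    ctx->head = NULL;
    ctx->tail = &ctx->head;
    ctx->handle = handle;
}

static void enqueue(PathwayCtx *ctx, MmioReq *req)
{
    MmioReq *copy = malloc(sizeof(*copy));

    *copy = *req;
    copy->next = NULL;
    *ctx->tail = copy;
    ctx->tail = &copy->next;
}

/* Entry point for an MMIO request targeting a device behind 'ctx'. */
void pathway_submit(PathwayCtx *ctx, MmioReq *req)
{
    if (held_ctx == ctx) {
        /* Nested access from a handler that already holds this context:
         * running it inline would deadlock, so defer it instead. */
        enqueue(ctx, req);
        return;
    }

    pthread_mutex_lock(&ctx->lock);
    held_ctx = ctx;

    ctx->handle(ctx, req);

    /* Run whatever the handler deferred, in submission order and before
     * the lock is dropped, so no later request can overtake it. */
    while (ctx->head) {
        MmioReq *q = ctx->head;

        ctx->head = q->next;
        if (!ctx->head) {
            ctx->tail = &ctx->head;
        }
        ctx->handle(ctx, q);
        free(q);
    }

    held_ctx = NULL;
    pthread_mutex_unlock(&ctx->lock);
}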

> 
>>> 2) focus initially on killing the lock in kvm_handle_io()
>>>     a) the ioport table is pretty simplistic so adding fine-grained
>>> locking won't be hard.
>>>     b) reacquire lock right before ioport dispatch
>>>
>>> 3) allow for registering ioport handlers w/o the dispatch function
>>> holding the iothread lock
>>>     a) this is mostly memory API plumbing
>>
>> We skipped this as our NICs didn't do PIO, but you clearly need it for
>> virtio.
> 
> Right.
> 
>>> 4) focus on going back and adding fine-grained locking to the
>>> cpu_physical_memory_rw() accessor
>>
>> In the end, PIO and MMIO should use the same patterns - and will face
>> the same challenges. Ideally, we model things very similarly right from
>> the start.
> 
> Yes.
> 
>> And then there is also
>>
>> 5) provide direct IRQ delivery from the device model to the IRQ chip.
>> That's much like what we need for VFIO and KVM device assignment. But
>> here we won't be able to cheat and ignore correct generation of vmstates
>> of the bypassed PCI host bridges etc... Which leads me to that other
>> thread about how to handle this for PCI device pass-through.
>> Contributions to that discussion are welcome as well.
> 
> I think you mean to the in-kernel IRQ chip.  I'm thinking about this
> still so I don't have a plan yet that I'm ready to share.  I have some
> ideas though.
> 
>>
>>>
>>> Note that whenever possible, we should be using rwlocks instead of a
>>> normal mutex.  In particular, for the ioport data structures, a rwlock
>>> seems pretty obvious.
>>
>> I think we should mostly be fine with a "big hammer" rwlock: unlocked
>> read access from VCPUs and iothreads, and vmstop/resume around
>> modifications of fast path data structures (like the memory region
>> hierarchy or the PIO table).
> 
> Ack.
> 
>> Where that's not sufficient, RCU will be
>> needed. Sleeping rwlocks have horrible semantics (specifically when
>> thread priorities come into play) and are performance-wise inferior. We
>> should avoid them completely.
> 
> Yes, I think RCU is inevitable here but I think starting with rwlocks
> will help with the big refactoring.

Let's wait for the first patches... :)
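
For reference, the "big hammer" from above in miniature: readers touch
the fast-path table without any lock, while writers bring every reader
to a quiescent state first. stop_all_readers() and resume_all_readers()
stand in for vm_stop()/vm_start(), and the PIO table here is just an
illustration:

#include <stdint.h>

#define PIO_PORTS 65536

typedef void (*PioHandler)(void *opaque, uint16_t port, uint32_t val);

typedef struct {
    PioHandler handler;
    void *opaque;
} PioSlot;

static PioSlot pio_table[PIO_PORTS];

void stop_all_readers(void);    /* stand-in for vm_stop()  */
void resume_all_readers(void);  /* stand-in for vm_start() */

/* Fast path, called from VCPU or iothread context: no lock is taken. */
void pio_write(uint16_t port, uint32_t val)
{
    PioSlot *slot = &pio_table[port];

    if (slot->handler) {
        slot->handler(slot->opaque, port, val);
    }
}

/* Slow path: runs only while every reader is stopped, so plain stores
 * are safe - no rwlock and no RCU needed for this first step. */
void pio_register(uint16_t port, PioHandler handler, void *opaque)
{
    stop_all_readers();
    pio_table[port].handler = handler;
    pio_table[port].opaque = opaque;
    resume_all_readers();
}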

> 
>>>
>>> To be clear, I'm not advocating introducing cpu_lock.  We should do
>>> whatever makes the most sense to not have to hold iothread lock while
>>> processing an exit from KVM.
>>
>> Good that we agree. :)
>>
>>>
>>> Note that this is an RFC, the purpose of this series is to have this
>>> discussion :-)
>>
>> Yep, I think we have it now ;). Hope I can contribute some code bits to
>> it soon, though I didn't schedule this task for the next week.
> 
> Great!  If you have something you can share, I'd be eager to look at it
> regardless of the condition of the code.

Let me just finish the rebasing. The completed switch to memory region
abstractions makes the code cleaner in some important parts.

Jan
