
Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000


From: Jan Kiszka
Subject: Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
Date: Thu, 25 Oct 2012 20:48:25 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

On 2012-10-25 19:02, Avi Kivity wrote:
> On 10/25/2012 06:39 PM, Jan Kiszka wrote:
>>>
>>> That doesn't work cross-thread.
>>>
>>> vcpu A: write to device X, dma-ing to device Y
>>> vcpu B: write to device Y, dma-ing to device X
>>
>> We will deny DMA-ing from device X on behalf of a VCPU, ie. in dispatch
>> context, to Y.
>>
>> What we do not deny, though, is DMA-ing from an I/O thread that
>> processes an event for device X. 
> 
> I would really like to avoid depending on the context.  In real hardware, 
> there is no such thing.

The point is how we deal with any kind of access to a device that
requires taking that device's lock while holding another lock, provided
that scenario can also take place in reverse order at the same time.
Known scenarios are:

 - vcpu 1 -> access device A -> access device B
 - vcpu 2 -> access device B -> access device A

 - event 1 -> device A event processing -> access device B
 - event 2 -> device B event processing -> access device A

and combinations of those pairs.

> 
>> If the invoked callback of device X
>> holds the device lock across some DMA request to Y, then we risk running
>> into the same ABBA issue. Hmm...
> 
> Yup.
> 
>>
>>>
>>> My suggestion was to drop the locks around DMA, then re-acquire the lock
>>> and re-validate data.
>>
>> Maybe possible, but hairy depending on the device model.
> 
> It's unpleasant, yes.
> 
> Note depending on the device, we may not need to re-validate data, it may be 
> sufficient to load it into local variables so we know it is consistent at 
> some point.  But all those solutions suffer from requiring device model 
> authors to understand all those issues, rather than just add a simple lock 
> around access to their data structures.

Right. And therefore it is a suboptimal way to start (patching).

> 
>>>>>> I see that we have a all-or-nothing problem here: to address this
>>>>>> properly, we need to convert the IRQ path to lock-less (or at least
>>>>>> compatible with holding per-device locks) as well.
>>>>>
>>>>> There is a transitional path where writing to a register that can cause
>>>>> IRQ changes takes both the big lock and the local lock.
>>>>>
>>>>> Eventually, though, of course all inner subsystems must be threaded for
>>>>> this work to have value.
>>>>>
>>>>
>>>> But that transitional path must not introduce regressions. Opening a
>>>> race window between IRQ cause update and event injection is such a
>>>> thing, just like dropping concurrent requests on the floor.
>>>
>>> Can you explain the race?
>>
>> Context A                            Context B
>>
>> device.lock
>> ...
>> device.set interrupt_cause = 0
>> lower_irq = true
>> ...
>> device.unlock
>>                                      device.lock
>>                                      ...
>>                                      device.interrupt_cause = 42
>>                                      raise_irq = true
>>                                      ...
>>                                      device.unlock
>>                                      if (raise_irq)
>>                                              bql.lock
>>                                              set_irq(device.irqno)
>>                                              bql.unlock
>> if (lower_irq)
>>      bql.lock
>>      clear_irq(device.irqno)
>>      bql.unlock
>>
>>
>> And there it goes, our interrupt event.
> 
> Obviously you'll need to reacquire the device lock after taking bql and 
> revalidate stuff.  But that is not what I am suggesting.  Instead, any path 
> that can lead to an irq update (or timer update etc) will take both the bql 
> and the device lock.  This will leave after the first pass only side effect 
> free register reads and writes, which is silly if we keep it that way, but we 
> will follow with a threaded timer and irq subsystem and we'll peel away those 
> big locks.
> 
>   device_mmio_write:
>     if register is involved in irq or timers or block layer or really anything that matters:
>       bql.acquire
>     device.lock.acquire
>     do stuff
>     device.lock.release
>     if that big condition from above was true:
>       bql.release

Looks simpler than it is, as you cannot wrap complete handlers in that
pattern. An example where it would fail (until we solve the locking
issues above):

mmio_write:
  bql.conditional_lock
  device.lock
  device.check_state
  issue_dma
  device.update_state
  update_irq, play_with_timers, etc.
  device.unlock
  bql.conditional_unlock

If that DMA request hits an unconverted MMIO region or one that takes
the BQL conditionally as above, we will lock up (or bail out if our
mutexes detect the error). E1000's start_xmit looks like this so far,
and that's a pretty important service.

Moreover, I prefer having a representative cut-through over merging a
first step that excludes some 80% of the problems. For that reason I
would even be inclined to start with addressing the IRQ injection topic
first (patch-wise), then the other necessary backend services for the
e1000 or whatever, and convert some device(s) last.

IOW: cut out anything from this series that touches e1000 until the
building blocks for converting it reasonably are finished. Carrying an
experimental, partially broken conversion on top is fine; trying to
merge pieces of it is not, IMHO.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


