
Re: [Qemu-devel] [RFC PATCH] target-arm: protect cpu_exclusive_*.


From: Alexander Graf
Subject: Re: [Qemu-devel] [RFC PATCH] target-arm: protect cpu_exclusive_*.
Date: Thu, 18 Dec 2014 13:24:56 +0100
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.3.0


On 18.12.14 10:12, Mark Burton wrote:
> 
>> On 17 Dec 2014, at 17:39, Peter Maydell <address@hidden> wrote:
>>
>> On 17 December 2014 at 16:29, Mark Burton <address@hidden> wrote:
>>>> On 17 Dec 2014, at 17:27, Peter Maydell <address@hidden> wrote:
>>>> I think a mutex is fine, personally -- I just don't want
>>>> to see fifteen hand-hacked mutexes in the target-* code.
>>>>
>>>
>>> Which would seem to favour the helper function approach?
>>> Or am I missing something?
>>
>> You need at least some support from QEMU core -- consider
>> what happens with this patch if the ldrex takes a data
>> abort, for instance.
>>
>> And if you need the "stop all other CPUs while I do this"
> 
> It looks like a corner case, but working this through - the 'simple'
> "put a mutex around the atomic instructions" approach would indeed need to
> ensure that no other core was doing anything - that just happens to be true
> for QEMU today (or we would have to put a mutex around all writes) - in
> order to handle the case where a store exclusive could potentially fail if
> a non-atomic instruction wrote (a different value) to the same address.
> This is currently guaranteed by the implementation in QEMU - how useful it
> is I don't know, but if we break it, we run the risk that something will
> fail (at the least, we could not claim to have kept things the same).
> 
> This also has implications for the idea of adding TCG ops, I think...
> The ideal scenario is that we could 'fall back' on the same semantics that
> are there today - allowing specific target/host combinations to be
> optimised (and to improve their functionality).
> But that means that, from within the TCG op, we would need a mechanism to
> cause the other vCPUs to take an exit... etc. In the end, I'm sure it's
> possible, but it feels so awkward.

That's the nice thing about transactions - they guarantee that no other
CPU accesses the same cache line at the same time. So you're safe
against other vcpus even without blocking them manually.
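
As a rough illustration (not QEMU code - the helper name, the
ExclusiveState struct and the use of Intel RTM intrinsics are all
assumptions for the sketch), the store-exclusive side of that could
look something like this:

/* Sketch only: a store-exclusive check+store done inside an Intel RTM
 * transaction.  Helper names and ExclusiveState are invented for
 * illustration; build with gcc -mrtm. */
#include <stdint.h>
#include <immintrin.h>

typedef struct {
    uint64_t exclusive_val;     /* value observed by the earlier LDREX */
} ExclusiveState;

/* Returns 0 if the store-exclusive succeeds, 1 if it must fail. */
static int strex_transactional(ExclusiveState *s, uint32_t *haddr,
                               uint32_t newval)
{
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        if (*haddr == (uint32_t)s->exclusive_val) {
            *haddr = newval;    /* no other CPU can touch this line here */
            _xend();
            return 0;
        }
        _xend();                /* nothing written, commit trivially */
        return 1;               /* monitored value changed: STREX fails */
    }
    /* Transaction aborted (or RTM absent): a real implementation would
     * fall back to a slow path such as the "halt all others" one below. */
    return 1;
}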

For the non-transactional implementation we probably would need an "IPI
others and halt them until we're done with the critical section"
approach. But I really wouldn't concentrate on making things fast on old
CPUs.
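
Roughly, that fallback is a stop-the-world handshake along these lines
(a sketch with invented names - this is not QEMU's existing vCPU
control code):

/* Sketch: generic "pause everyone else while I run the exclusive
 * sequence" handshake.  All names are illustrative. */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t stw_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  stw_cond = PTHREAD_COND_INITIALIZER;
static int n_running_vcpus;     /* maintained by vCPU start/stop code */
static int n_paused;
static bool stop_requested;

/* Called by the vCPU that wants to run an exclusive sequence. */
void stop_all_other_vcpus(void)
{
    pthread_mutex_lock(&stw_lock);
    stop_requested = true;
    /* ...kick the other vCPU threads out of their TBs (signal/IPI)... */
    while (n_paused < n_running_vcpus - 1) {
        pthread_cond_wait(&stw_cond, &stw_lock);
    }
    /* returns with stw_lock held: do the exclusive access now */
}

void resume_all_other_vcpus(void)
{
    stop_requested = false;
    pthread_cond_broadcast(&stw_cond);
    pthread_mutex_unlock(&stw_lock);
}

/* Polled by every other vCPU at translation-block boundaries. */
void vcpu_maybe_pause(void)
{
    pthread_mutex_lock(&stw_lock);
    while (stop_requested) {
        n_paused++;
        pthread_cond_broadcast(&stw_cond);
        pthread_cond_wait(&stw_cond, &stw_lock);
        n_paused--;
    }
    pthread_mutex_unlock(&stw_lock);
}

The per-exclusive-access cost of this is obviously high, which is why
it only makes sense as the fallback for hosts without transactions.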

Also keep in mind that for the UP case we can always omit all the magic
- we only need to detect when we move into an SMP case (linux-user clone
or -smp on system).
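
In other words, something as simple as a one-way flag that gets flipped
when the second vCPU (or the first linux-user clone) appears, and that
the exclusive helpers consult before paying for any locking. The names
below are made up for the sketch:

/* Sketch: skip the locking entirely while only one vCPU exists.
 * "tcg_parallel" is an illustrative name, not an existing QEMU global. */
#include <stdbool.h>
#include <pthread.h>

static bool tcg_parallel;       /* set once we go SMP, never cleared */
static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called when a second vCPU is created (-smp) or linux-user clone()s. */
void tcg_enter_parallel_mode(void)
{
    tcg_parallel = true;
}

void exclusive_section_enter(void)
{
    if (tcg_parallel) {         /* UP case: no locking at all */
        pthread_mutex_lock(&exclusive_lock);
    }
}

void exclusive_section_leave(void)
{
    if (tcg_parallel) {
        pthread_mutex_unlock(&exclusive_lock);
    }
}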

> 
> To recap where we are (for my own benefit if nobody else):
> we have several propositions in terms of implementing atomic instructions.
> 
> 1/ We wrap the atomic instructions in a mutex using helper functions (this
> is the approach others have taken; it's simple, but it is not clean, as
> stated above).

This is horrible. Imagine you have this split approach with a load
exclusive and then a store, where the load takes the mutex and the
store releases it. If the store then creates a segfault, you'll be
left with a dangling mutex.

This stuff really belongs into the TCG core.
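
To make the failure mode concrete (helper names invented for the
sketch - the comments mark where the exception path skips the unlock):

/* Sketch of the fragile "mutex spans LDREX..STREX" pairing (option 1
 * above).  This is what NOT to do. */
#include <stdint.h>
#include <pthread.h>

static pthread_mutex_t atomic_lock = PTHREAD_MUTEX_INITIALIZER;

uint32_t helper_ldrex(uint32_t *haddr)
{
    pthread_mutex_lock(&atomic_lock);   /* lock taken here ... */
    return *haddr;
}

void helper_strex(uint32_t *haddr, uint32_t val)
{
    *haddr = val;                       /* ... if this access faults, the
                                         * guest exception path longjmps
                                         * out of the helper and ...      */
    pthread_mutex_unlock(&atomic_lock); /* ... this unlock never runs.    */
}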

> 
> 1.5/ We add a mechanism to ensure that when the mutex is taken, all other 
> cores are ‘stopped’.
> 
> 2/ We add some TCG ops to effectively do the same thing, but this would give
> us the benefit of being able to provide better implementations. This is
> attractive, but we would end up needing ops to cover at least exclusive
> load/store and atomic compare exchange. To me this looks less than elegant
> (being pulled close to the target, rather than being able to generalise),
> but it's not clear how we would implement the operations as we would like
> (with a host machine instruction) unless we did split them out along these
> lines. This approach also (probably) requires the 1.5 mechanism above.

I'm still in favor of just forcing the semantics of transactions onto
this. If the host doesn't implement transactions, tough luck - do the
"halt all others" IPI.

> 
> 3/ We have discussed a 'h/w' approach to the problem. In this case, all
> atomic instructions are forced to take the slow path, and additional flags
> are added to the memory API. We then deal with the issue closer to the
> memory, where we can record who has a lock on a memory address. For this to
> work, we would also either
> a) need to add an mprotect-type approach to ensure no 'non-atomic' writes
> occur - or
> b) need to force all cores to mark the page with the exclusive memory as IO
> or similar, to ensure that all write accesses follow the slow path.
> 
> 4/ There is an option to implement exclusive operations within the TCG using
> mprotect (and signal handlers). I have some concerns about this: would we
> have to have support for each host OS? I also think we might end up with a
> lot of protected regions causing a lot of SIGSEGVs because an errant guest
> doesn't behave well - basically we will need to see the impact on
> performance. Finally, this will be really painful to deal with for cases
> where the exclusive memory is held in what QEMU considers IO space!
>       In other words, putting the mprotect inside TCG looks to me like it's
> mutually exclusive to supporting a memory-based scheme like (3).

Again, I don't think it's worth caring about legacy host systems too
much. In a few years from now transactional memory will be commodity,
just like KVM is today.
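
For reference, the mprotect scheme in (4) boils down to something like
the sketch below (Linux-flavoured, all names invented); the
store-exclusive side would then check the "monitor lost" flag before
committing its value:

/* Sketch of option 4: write-protect the host page backing the exclusive
 * address and treat any stray write as "monitor lost".  Linux-specific;
 * a real version needs per-host-OS care. */
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static void *exclusive_page;                /* page under the monitor */
static volatile sig_atomic_t exclusive_broken;

static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    void *page = (void *)((uintptr_t)si->si_addr & ~(uintptr_t)(pagesize - 1));

    if (page == exclusive_page) {
        exclusive_broken = 1;                             /* monitor lost */
        mprotect(page, pagesize, PROT_READ | PROT_WRITE); /* let the write go */
    } else {
        signal(SIGSEGV, SIG_DFL);           /* not ours: crash as usual */
        raise(SIGSEGV);
    }
}

void begin_exclusive(void *host_addr)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    struct sigaction sa = { .sa_sigaction = segv_handler,
                            .sa_flags = SA_SIGINFO };

    sigaction(SIGSEGV, &sa, NULL);
    exclusive_page = (void *)((uintptr_t)host_addr & ~(uintptr_t)(pagesize - 1));
    exclusive_broken = 0;
    /* from here, any other thread writing this page traps into the handler */
    mprotect(exclusive_page, pagesize, PROT_READ);
}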


Alex

> My personal preference is for 3b): it is "safe" - it's where the hardware is.
> 3a is an optimisation of that.
> To me, (2) is an optimisation again. We are effectively saying: if you are
> able to do this directly, then you don't need to pass via the slow path.
> Otherwise, you always have the option of reverting to the slow path.
> 
> Frankly - 1 and 1.5 are hacks - they are not optimisations, they are just 
> dirty hacks. However - their saving grace is that they are hacks that exist 
> and “work”. I dislike patching the hack, but it did seem to offer the fastest 
> solution to get around this problem - at least for now. I am no longer 
> convinced.
> 
> 4/ is something I'd like other people's views on too... Is it a better
> approach? What about the slow path?
> 
> I increasingly begin to feel that we should really approach this from the 
> other end, and provide a ‘correct’ solution using the memory - then worry 
> about making that faster…
> 
> Cheers
> 
> Mark.
> 
>> semantics linux-user currently uses then that definitely needs
>> core code support. (Maybe linux-user is being over-zealous
>> there; I haven't thought about it.)
>>
>> -- PMM
> 


