Re: [Qemu-devel] [RFC PATCH] target-arm: protect cpu_exclusive_*.


From: Mark Burton
Subject: Re: [Qemu-devel] [RFC PATCH] target-arm: protect cpu_exclusive_*.
Date: Thu, 18 Dec 2014 10:12:12 +0100

> On 17 Dec 2014, at 17:39, Peter Maydell <address@hidden> wrote:
> 
> On 17 December 2014 at 16:29, Mark Burton <address@hidden> wrote:
>>> On 17 Dec 2014, at 17:27, Peter Maydell <address@hidden> wrote:
>>> I think a mutex is fine, personally -- I just don't want
>>> to see fifteen hand-hacked mutexes in the target-* code.
>>> 
>> 
>> Which would seem to favour the helper function approach?
>> Or am I missing something?
> 
> You need at least some support from QEMU core -- consider
> what happens with this patch if the ldrex takes a data
> abort, for instance.
> 
> And if you need the "stop all other CPUs while I do this"

It looks like a corner case, but working this through: the 'simple' 
put-a-mutex-around-the-atomic-instructions approach would indeed need to ensure 
that no other core was doing anything - which just happens to be true for QEMU 
today (otherwise we would have to put a mutex around all writes) - in order to 
handle the case where a store-exclusive must fail because a non-atomic 
instruction wrote (a different value) to the same address. This is currently 
guaranteed by the implementation in QEMU - how useful it is I don't know, but 
if we break it, we run the risk that something will fail (at the least, we 
could not claim to have kept things the same).
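To make that guarantee concrete, here is a rough sketch of the value-comparison
scheme (C-like and simplified; the cpu_exclusive_* fields are the ones named in
the patch subject, the function shape and accessor calls are illustrative):

/* Illustrative only: the store-exclusive succeeds by re-reading memory
 * and comparing against the value remembered at ldrex time, so an
 * intervening plain store of a *different* value makes it fail. */
uint32_t do_ldrex32(CPUARMState *env, target_ulong addr)
{
    env->exclusive_addr = addr;
    env->exclusive_val  = cpu_ldl_data(env, addr);  /* remember the value */
    return env->exclusive_val;
}

int do_strex32(CPUARMState *env, target_ulong addr, uint32_t newval)
{
    if (env->exclusive_addr != addr ||
        cpu_ldl_data(env, addr) != env->exclusive_val) {
        return 1;                     /* fail: monitor lost or value changed */
    }
    cpu_stl_data(env, addr, newval);  /* check-then-store is NOT atomic
                                         against another vCPU's plain store */
    env->exclusive_addr = -1;
    return 0;                         /* success */
}

With a single round-robin TCG thread the check-then-store above cannot be
interleaved with another vCPU's store; that is exactly the property that
disappears once cores run in parallel.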

This also has implications for the idea of adding TCG ops, I think...
The ideal scenario is that we could 'fall back' on the same semantics that are 
there today - allowing specific target/host combinations to be optimised (and 
to improve their functionality). 
But that means that, from within the TCG op, we would need a mechanism to 
cause the other TCG threads to take an exit… etc. etc. In the end, I'm sure 
it's possible, but it feels so awkward.

To recap where we are (for my own benefit if nobody else's):
We have several propositions for implementing atomic instructions:

1/ We wrap the atomic instructions in a mutex using helper functions (this is 
the approach others have taken; it's simple, but it is not clean, as stated 
above - a rough sketch follows).
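As a strawman, a minimal sketch of option 1, assuming one global pthread mutex
and hypothetical helper names (the real patch differs in detail):

/* Option 1 sketch: one global lock serialises the exclusive helpers.
 * NB: if the guest access faults, QEMU longjmps out of the helper and
 * the mutex is never released - the "needs core support" problem Peter
 * points out above regarding data aborts. */
static pthread_mutex_t atomic_lock = PTHREAD_MUTEX_INITIALIZER;

uint32_t helper_ldrex32(CPUARMState *env, uint32_t addr)
{
    pthread_mutex_lock(&atomic_lock);
    uint32_t val = do_ldrex32(env, addr);     /* as in the sketch above */
    pthread_mutex_unlock(&atomic_lock);
    return val;
}

uint32_t helper_strex32(CPUARMState *env, uint32_t addr, uint32_t newval)
{
    pthread_mutex_lock(&atomic_lock);
    uint32_t fail = do_strex32(env, addr, newval);
    pthread_mutex_unlock(&atomic_lock);
    return fail;
}

The dirtiness is visible: the lock only serialises the helpers against each
other; a plain TCG store from another vCPU never takes it, which is why option
1 alone does not preserve the guarantee described above.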

1.5/ We add a mechanism to ensure that when the mutex is taken, all other cores 
are ‘stopped’.
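Very roughly, something like this (CPU_FOREACH and cpu_exit() exist in QEMU;
the wait step is precisely the missing core-support piece):

/* Option 1.5 sketch: before entering the critical section, kick every
 * other vCPU out of its translation block, then wait for it to halt.
 * The "wait until actually stopped" mechanism is what QEMU core would
 * have to provide. */
static void stop_all_other_cpus(CPUState *self)
{
    CPUState *cpu;

    CPU_FOREACH(cpu) {
        if (cpu != self) {
            cpu_exit(cpu);            /* request an exit from the TB */
        }
    }
    /* ...block here until every other vCPU has really stopped... */
}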

2/ We add some TCG ops to do effectively the same thing, but this would give us 
the benefit of being able to provide better implementations. This is 
attractive, but we would end up needing ops to cover at least exclusive 
load/store and atomic compare-and-exchange. To me this looks less than elegant 
(being pulled closer to the target, rather than being able to generalise), but 
it's not clear how we would implement the operations as we would like - with a 
host machine instruction - unless we did split them out along these lines. This 
approach also (probably) requires the 1.5 mechanism above.
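For concreteness, the front end might emit something like the following; none
of these ops exist today, the names are purely illustrative, and val/addr/etc.
are assumed to be TCGv temporaries:

/* Hypothetical TCG ops for option 2.  A host with ll/sc or cmpxchg
 * instructions could lower these directly; other hosts would fall back
 * to the mutex-protected slow path. */
tcg_gen_excl_ld_i32(val, addr, mem_idx);             /* ldrex-style load  */
tcg_gen_excl_st_i32(fail, newval, addr, mem_idx);    /* strex-style store */
tcg_gen_cmpxchg_i32(old, addr, cmpv, newv, mem_idx); /* atomic cmpxchg    */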

3/ We have discussed a 'h/w' approach to the problem. In this case, all atomic 
instructions are forced to take the slow path, and additional flags are added 
to the memory API. We then deal with the issue closer to the memory, where we 
can record who has a lock on a memory address. For this to work, we would 
also either
a) need to add an mprotect-type approach to ensure no 'non-atomic' writes occur 
- or
b) need to force all cores to mark the page containing the exclusive memory as 
IO or similar, to ensure that all write accesses follow the slow path (sketch 
below).
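A sketch of the bookkeeping option 3 implies, with all names hypothetical
(hwaddr is QEMU's physical-address type; MAX_CPUS is a stand-in):

/* Option 3 sketch: the memory subsystem records which CPU holds an
 * exclusive monitor on which address.  Exclusive accesses always take
 * the slow path and consult this table; any write that reaches the slow
 * path and hits a monitored address clears the monitor, so the owner's
 * next store-exclusive fails. */
typedef struct ExclusiveMonitor {
    hwaddr addr;          /* guest physical address being monitored */
    int    owner;         /* cpu_index of the holder, or -1 if free */
} ExclusiveMonitor;

static ExclusiveMonitor monitors[MAX_CPUS];

static void memory_notify_write(hwaddr addr, unsigned len)
{
    for (int i = 0; i < MAX_CPUS; i++) {
        if (monitors[i].owner >= 0 &&
            monitors[i].addr >= addr &&
            monitors[i].addr < addr + len) {
            monitors[i].owner = -1;   /* break the exclusivity */
        }
    }
}

Variants a) and b) then differ only in how plain writes are forced to reach
memory_notify_write(): a) via mprotect traps, b) by making the page take the
slow (IO) path.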

4/ There is an option to implement exclusive operations within the TCG using 
mprotect (and signal handlers). I have some concerns about this: would we need 
to have support for each host OS?… I also think we might end up with a lot of 
protected regions causing a lot of SIGSEGVs because an errant guest doesn't 
behave well - basically we will need to see the impact on performance. Finally, 
this will be really painful to deal with for cases where the exclusive memory 
is held in what QEMU considers IO space! (Sketch below.)
        In other words, putting the mprotect inside TCG looks to me like it's 
mutually exclusive with supporting a memory-based scheme like (3).
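For reference, the host-side machinery option 4 needs looks roughly like this
(plain POSIX mprotect/sigaction; page size and the monitor bookkeeping are
simplified and hypothetical):

#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

#define PAGE 4096                     /* assume 4K host pages */

static void *excl_page;               /* host page backing the monitor */

static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
    char *fault = si->si_addr;
    if (fault >= (char *)excl_page && fault < (char *)excl_page + PAGE) {
        /* A plain store hit the monitored page: clear the (hypothetical)
         * monitor bookkeeping, unprotect, and let the store retry. */
        mprotect(excl_page, PAGE, PROT_READ | PROT_WRITE);
        return;
    }
    abort();                          /* a genuine crash, not ours */
}

static void arm_monitor(void *host_addr)
{
    struct sigaction sa = { 0 };
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    excl_page = (void *)((uintptr_t)host_addr & ~(uintptr_t)(PAGE - 1));
    mprotect(excl_page, PAGE, PROT_READ);   /* writes now trap */
}

The IO-space concern above is visible here: if the exclusive address lives in
what QEMU treats as IO space, there is no host page to mprotect, so this
scheme has nothing to hook.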


My personal preference is for 3b: it is "safe" - it's where the hardware is.
3a is an optimisation of that.
To me, (2) is an optimisation again. We are effectively saying: if you are able 
to do this directly, then you don't need to pass via the slow path. Otherwise, 
you always have the option of reverting to the slow path.

Frankly, 1 and 1.5 are hacks - they are not optimisations, they are just dirty 
hacks. However, their saving grace is that they are hacks that exist and 
"work". I dislike patching the hack, but it did seem to offer the fastest 
solution to get around this problem - at least for now. I am no longer 
convinced.

4/ is something I'd like other people's views on too… Is it a better approach? 
What about the slow path?

I increasingly begin to feel that we should really approach this from the other 
end, and provide a 'correct' solution using the memory-based scheme - then 
worry about making that faster…

Cheers

Mark.
> semantics linux-user currently uses then that definitely needs
> core code support. (Maybe linux-user is being over-zealous
> there; I haven't thought about it.)
> 
> -- PMM


