Re: [Qemu-devel] [RFC 0/5] Slow-path for atomic instruction translation
From: Mark Burton
Subject: Re: [Qemu-devel] [RFC 0/5] Slow-path for atomic instruction translation
Date: Wed, 6 May 2015 18:20:25 +0200
> On 6 May 2015, at 18:19, alvise rigo <address@hidden> wrote:
>
> Hi Mark,
>
> Firstly, thank you for your feedback.
>
> On Wed, May 6, 2015 at 5:55 PM, Mark Burton <address@hidden> wrote:
>> A massive thank you for doing this work Alvise,
>>
>> On our side, the patch we suggested is only applicable for ARM, though the
>> mechanism would work for any CPU,
>> - BUT
>> It doesn’t force atomic instructions out through the slow path. This is
>> either a very good thing (it’s much faster), or a very bad thing (it doesn’t
>> allow you to treat them in the IO space), depending on your point of view.
>
> Indeed, this is for sure a more invasive approach, but it's made on
> purpose to have control over those non-atomic stores that might modify
> the 'linked' memory.
>
exactly
:-)
Cheers
Mark.
>>
>> Depending on what the rest of the community thinks, it seems to me we should
>> apply both patches so that e.g. ARM’s existing atomic instructions run much
>> faster and above all more ‘accurately’ - (with the patch we’ve provided),
>> and the same mechanism can be applied to all other architectures - but we
>> can - somehow - swap for this more ‘controllable’ implementation when e.g.
>> the mutex is located in IO space….
>
> Yes, this makes sense.
>
> Thank you,
> alvise
>
>>
>> Cheers
>>
>> Mark.
>>
>>> On 6 May 2015, at 17:38, Alvise Rigo <address@hidden> wrote:
>>>
>>> This patch series provides an infrastructure for atomic
>>> instruction implementation in QEMU, paving the way for TCG multi-threading.
>>> The adopted design does not rely on host atomic
>>> instructions and is intended to propose a 'legacy' solution for
>>> translating guest atomic instructions.
>>>
>>> The underlying idea is to provide new TCG instructions that guarantee
>>> atomicity to some memory accesses or, more generally, a way to define
>>> memory transactions. More specifically, a new pair of TCG instructions is
>>> implemented, qemu_ldlink_i32 and qemu_stcond_i32, that behave as
>>> LoadLink and StoreConditional primitives (only the 32-bit variant is
>>> implemented). To achieve this, a new bitmap is added to the
>>> ram_list structure (always unique) which flags all memory pages that
>>> cannot be accessed directly through the fast-path, due to previous
>>> exclusive operations. This new bitmap is coupled with a new TLB flag
>>> which forces slow-path execution. Any store by another vCPU to the
>>> same memory page that takes place between a LoadLink and its
>>> StoreConditional will cause that StoreConditional to fail.
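
As a rough illustration of how a per-page bitmap and a TLB flag could push stores onto the slow path, here is a minimal C sketch. It is not code from the series: every identifier (SK_TLB_EXCL, sk_dirty_bitmap, sk_tlb_fill, ...) is hypothetical, and it assumes the flag lives in the low bits of the cached page address so that the fast-path address comparison fails whenever the flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_BITS 12u
#define SK_PAGE_MASK (~((1u << SK_PAGE_BITS) - 1u))
#define SK_NUM_PAGES 16u
#define SK_TLB_EXCL  (1u << 0)   /* hypothetical flag stored in the low address bits */

/* One bit per guest page: 1 = dirty (no exclusive access pending),
 * 0 = clean (page is under an exclusive access). All pages start dirty. */
static uint32_t sk_dirty_bitmap = 0xFFFFu;

typedef struct {
    uint32_t addr_write;   /* page-aligned address, possibly OR-ed with flags */
} SkTlbEntry;

static bool sk_page_is_dirty(uint32_t addr)
{
    uint32_t page = addr >> SK_PAGE_BITS;
    assert(page < SK_NUM_PAGES);
    return (sk_dirty_bitmap >> page) & 1u;
}

/* Called on behalf of a LoadLink: clear the dirty bit of the page. */
static void sk_mark_exclusive(uint32_t addr)
{
    uint32_t page = addr >> SK_PAGE_BITS;
    assert(page < SK_NUM_PAGES);
    sk_dirty_bitmap &= ~(1u << page);
}

/* Build a TLB entry; pages under exclusive access get the flag. */
static SkTlbEntry sk_tlb_fill(uint32_t addr)
{
    SkTlbEntry e = { .addr_write = addr & SK_PAGE_MASK };
    if (!sk_page_is_dirty(addr)) {
        e.addr_write |= SK_TLB_EXCL;
    }
    return e;
}

/* Fast-path check: a plain compare, which fails if any flag bit is set. */
static bool sk_store_hits_fast_path(const SkTlbEntry *e, uint32_t addr)
{
    return e->addr_write == (addr & SK_PAGE_MASK);
}
```

Note that in this sketch an entry filled before sk_mark_exclusive() still passes the fast-path check, so stale entries have to be dropped for the flag to take effect.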
>>>
>>> In theory, the provided implementation of TCG LoadLink/StoreConditional
>>> can be used to properly handle atomic instructions on any architecture.
>>>
>>> The new slow-path is implemented such that:
>>> - the LoadLink behaves as a normal load slow-path, except for clearing
>>> the dirty flag in the bitmap. The TLB entries created from now on will
>>> force the slow-path. To ensure this, we flush the TLB cache for the
>>> other vCPUs
>>> - the StoreConditional behaves as a normal store slow-path, except for
>>> checking the state of the dirty bitmap and returning 0 or 1 depending on
>>> whether the StoreConditional succeeded (0 when no vCPU has touched the
>>> same memory in the meantime).
>>>
>>> All write accesses that are forced to follow the 'legacy'
>>> slow-path will mark the accessed memory page as dirty.
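
The semantics described above can be modelled in a few lines of C. This is only a single-threaded sketch under stated assumptions, not the helpers from softmmu_llsc_template.h: all names (ll_ldlink, ll_stcond, ll_slow_store, ll_page_dirty) are made up for illustration, the TLB flush is left out, and marking the page dirty again after a successful StoreConditional is an assumption of this sketch rather than something the cover letter states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LL_PAGE_BITS 12u
#define LL_NUM_PAGES 4u
#define LL_RAM_BYTES (LL_NUM_PAGES << LL_PAGE_BITS)

static uint32_t ll_ram[LL_RAM_BYTES / 4u];
/* Per-page dirty flag: true = dirty, false = under exclusive access. */
static bool ll_page_dirty[LL_NUM_PAGES] = { true, true, true, true };

static uint32_t ll_page(uint32_t addr)
{
    assert(addr % 4u == 0u);        /* only aligned 32-bit accesses are modelled */
    assert(addr < LL_RAM_BYTES);
    return addr >> LL_PAGE_BITS;
}

/* LoadLink: a normal load that also clears the page's dirty flag.
 * (The real design additionally flushes the other vCPUs' TLBs.) */
static uint32_t ll_ldlink(uint32_t addr)
{
    ll_page_dirty[ll_page(addr)] = false;
    return ll_ram[addr / 4u];
}

/* Ordinary store forced onto the slow path: marks the page dirty. */
static void ll_slow_store(uint32_t addr, uint32_t val)
{
    ll_page_dirty[ll_page(addr)] = true;
    ll_ram[addr / 4u] = val;
}

/* StoreConditional: returns 0 on success, 1 on failure. */
static int ll_stcond(uint32_t addr, uint32_t val)
{
    uint32_t page = ll_page(addr);
    if (ll_page_dirty[page]) {
        return 1;                   /* page was written since the LoadLink */
    }
    ll_ram[addr / 4u] = val;
    ll_page_dirty[page] = true;     /* assumption: the exclusive window ends here */
    return 0;
}
```

Because tracking is per page, a store to any other word of the same page between ll_ldlink() and ll_stcond() is enough to make the StoreConditional fail in this model.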
>>>
>>> In this series only the ARM ldrex/strex instructions are implemented.
>>> The code was tested with bare-metal test cases and with Linux, using
>>> upstream QEMU.
>>>
>>> This work has been sponsored by Huawei Technologies Dusseldorf GmbH.
>>>
>>> Alvise Rigo (5):
>>> exec: Add new exclusive bitmap to ram_list
>>> Add new TLB_EXCL flag
>>> softmmu: Add helpers for a new slow-path
>>> tcg-op: create new TCG qemu_ldlink and qemu_stcond instructions
>>> target-arm: translate: implement qemu_ldlink and qemu_stcond ops
>>>
>>> cputlb.c | 11 ++-
>>> include/exec/cpu-all.h | 1 +
>>> include/exec/cpu-defs.h | 2 +
>>> include/exec/memory.h | 3 +-
>>> include/exec/ram_addr.h | 19 +++-
>>> softmmu_llsc_template.h | 233 ++++++++++++++++++++++++++++++++++++++++++++++++
>>> softmmu_template.h | 52 ++++++++++-
>>> target-arm/translate.c | 94 ++++++++++++++++++-
>>> tcg/arm/tcg-target.c | 105 ++++++++++++++++------
>>> tcg/tcg-be-ldst.h | 2 +
>>> tcg/tcg-op.c | 20 +++++
>>> tcg/tcg-op.h | 3 +
>>> tcg/tcg-opc.h | 4 +
>>> tcg/tcg.c | 2 +
>>> tcg/tcg.h | 20 +++++
>>> 15 files changed, 538 insertions(+), 33 deletions(-)
>>> create mode 100644 softmmu_llsc_template.h
>>>
>>> --
>>> 2.4.0
>>>
>>
>>
>> +44 (0)20 7100 3485 x 210
>> +33 (0)5 33 52 01 77x 210
>>
>> +33 (0)603762104
>> mark.burton
>>