Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue
Date: Fri, 4 Jan 2013 09:40:39 +0100
On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote:
>
>
>> -----Original Message-----
>> From: Marcelo Tosatti [mailto:address@hidden
>> Sent: Friday, January 04, 2013 7:08 AM
>> To: Alexander Graf
>> Cc: address@hidden; Christian Borntraeger; Anthony Liguori; qemu-
>> address@hidden qemu-devel; Bhushan Bharat-R65777
>> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
>> do_kvm_cpu_synchronize_state data integrity issue
>>
>> On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote:
>>>
>>> On 03.01.2013, at 19:48, Jason J. Herne wrote:
>>>
>>>> On 01/03/2013 08:56 AM, Alexander Graf wrote:
>>>>>> static void do_kvm_cpu_synchronize_state(void *_args)
>>>>>>> {
>>>>>>> struct kvm_cpu_syncstate_args *args = _args;
>>>>>>> + CPUArchState *env = args->env;
>>>>>>> + int register_level = args->register_level;
>>>>>>>
>>>>> This probably becomes more readable if we explicitly revert back to
>>>>> unsynced state first:
>>>>>
>>>>> /* Write back local modifications at our current level */
>>>>> if (register_level > env->kvm_vcpu_dirty) {
>>>>>     kvm_arch_put_registers(...);
>>>>>     env->kvm_vcpu_dirty = 0;
>>>>> }
>>>>>
>>>>> and then do the sync we are requested to do:
>>>>>
>>>>> if (!env->kvm_vcpu_dirty) {
>>>>> ...
>>>>> }
>>>>
>>>> I agree, but only if we add a second conditional to the 1st if
>>>> statement, as such:
>>>>
>>>> if (args->env->kvm_vcpu_dirty && register_level > env->kvm_vcpu_dirty)
>>>>
>>>> This is to cover the case where the caller is asking for register level
>>>> "1" and we're already dirty at level "2". In this case, nothing should
>>>> happen, and we'll need the "args->env->kvm_vcpu_dirty" check to ensure
>>>> that is the case.
>>>
>>> As before, I'd prefer to make this explicit:
>>>
>>>>
>>>> static void do_kvm_cpu_synchronize_state(void *_args) {
>>>> struct kvm_cpu_syncstate_args *args = _args;
>>>> CPUArchState *env = args->env;
>>>> int register_level = args->register_level;
>>>
>>> if (register_level <= env->kvm_vcpu_dirty) {
>>>     /* We are more dirty than we need to be - all is well */
>>>     return;
>>> }
>>>
>>>>
>>>> /* Write back local modifications at our current level */
>>>> if (args->env->kvm_vcpu_dirty && register_level > env->kvm_vcpu_dirty) {
>>>> kvm_arch_put_registers(env, env->kvm_vcpu_dirty);
>>>> env->kvm_vcpu_dirty = 0;
>>>> }
>>>>
>>>> if (!args->env->kvm_vcpu_dirty) {
>>>> kvm_arch_get_registers(env, register_level);
>>>> env->kvm_vcpu_dirty = register_level;
>>>> }
>>>> }
>>>>
>>>> Do you agree? Thanks for your time. :)
>>>
>>> Please also check out the discussions I've had with Bharat about his
>>> watchdog patches. There we need a mechanism to synchronize registers
>>> only when we actually need to, in order to avoid potential race
>>> conditions with a kernel timer.
>>>
>>> That particular case doesn't work well with levels. We can have multiple
>>> different potential race producers in the kernel that we need to avoid
>>> individually, so we can't always synchronize all of them when only one
>>> of them needs to be synchronized.
>>>
>>> The big question is what we should be doing about this. We basically
>>> have 3 options:
>>>
>>> * implement levels, treat racy registers as "manually synchronized", as
>>>   Bharat's latest patch set does
>>> * implement levels, add a bitmap for additional special synchronization
>>>   bits
>>> * replace levels by a bitmap
>>>
>>> I'm quite frankly not sure which one of the 3 would be the best way forward.
>>>
>>>
>>> Alex
>>
>> Implement read/write accessors for individual registers, similarly to how
>> it's done in the kernel. See kvm_cache_regs.h.
>
> Read/write accessors for individual registers can be heavy for cases where
> multiple registers need to be updated, can't they?
It depends. We can still have classes of registers to synchronize that we could
even synchronize using MANY_REG. For writes, things become even faster with
individual accessors since we can easily coalesce all register writes until we
hit the vcpu again into a MANY_REG array and write them in one go.
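The coalescing idea can be sketched roughly like this; the `reg_cache` structure and its functions are hypothetical names for illustration (the batch-flush ioctl itself is the MANY_REG mechanism under discussion, not an existing interface):

```c
#include <stdint.h>

/* Hypothetical pending-write buffer: individual register accessors queue
 * writes here, and everything is flushed in one batch right before we
 * re-enter the vcpu, e.g. via a MANY_REG-style ioctl. */
#define MAX_PENDING 16

struct pending_reg {
    uint64_t id;
    uint64_t value;
};

struct reg_cache {
    struct pending_reg pending[MAX_PENDING];
    int npending;
};

/* Queue a register write; a second write to the same register just
 * overwrites the queued value, so only the last one hits the kernel. */
static void cache_set_reg(struct reg_cache *c, uint64_t id, uint64_t value)
{
    for (int i = 0; i < c->npending; i++) {
        if (c->pending[i].id == id) {
            c->pending[i].value = value;
            return;
        }
    }
    if (c->npending < MAX_PENDING) {
        c->pending[c->npending].id = id;
        c->pending[c->npending].value = value;
        c->npending++;
    }
}

/* Flush all queued writes in one go; returns how many registers would be
 * handed to the (hypothetical) many-reg ioctl. */
static int cache_flush(struct reg_cache *c)
{
    int n = c->npending;
    /* ioctl(vcpu_fd, KVM_SET_MANY_REGS, c->pending) would go here */
    c->npending = 0;
    return n;
}
```

With this shape, three QEMU-side writes to two registers collapse into a single two-entry batch at vcpu entry.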
>
> For cases where multiple registers need to be synchronized, I would like
> to elaborate the options as follows:
>
> Option 1:
>
> int timer_func(CPUArchState *env)
> {
>     cpu_synchronize_state(env);
>     /* update env->timer_registers */
>     env->updated_regs_type |= TIMER_REGS_TYPE_BIT;
To scale this one, it's probably also more clever to swap the logic:

env->kvm_sync_extra |= SYNC_TIMER_REGS;
cpu_synchronize_state(env); /* gets timer regs */
/* update env->timer_registers */
/* timer regs will be synchronized later because kvm_sync_extra has
   SYNC_TIMER_REGS set */

Your case right now is special in that we don't need to get any register
state, but only write it. However, if we do

cpu_synchronize_state(env); /* will not get timer regs */
env->kvm_sync_extra |= SYNC_TIMER_REGS;

then the next

cpu_synchronize_state(env);

would overwrite the timer values again. So we would also need to indicate
that the bits are dirty:

cpu_synchronize_state(env); /* will not get timer regs */
env->kvm_sync_extra |= SYNC_TIMER_REGS;
env->kvm_sync_dirty |= SYNC_TIMER_REGS;

which cpu_synchronize_state() could use to make sure it doesn't read back
any timer regs again. It would obviously also have to set it in its normal
operation when it executes first ;).
> }
>
> Change kvm_arch_put_registers() to add the following:
>
> int kvm_arch_put_registers(CPUArchState *env, int level)
> {
>     if (env->updated_regs_type) {
>         struct kvm_sregs sregs;
>         if (env->updated_regs_type & TIMER_REGS_TYPE_BIT) {
>             /* update sregs->timer_registers; maybe we can have a bitmap
>              * to tell the kernel what was actually updated */
>         }
>         if (env->updated_regs_type & XX_CPU_REGS_TYPE) {
>             /* update the respective registers in sregs */
>         }
>
>         ioctl(env, KVM_SET_SREGS, &sregs);
>     }
> }
>
> This mechanism will get all register state, but that is required only once
> per entry into QEMU.
I don't understand this bit.
>
>
> Option 2:
> Define read_regs_type() and write_regs_type() for cases that require
> multiple register updates:
>
> /* reg_type is a bitmap: TIMER_REGS, DEBUG_REGS, PERFMON_REGS etc. */
> int read_regs_type(CPUArchState *env, void *regs_ptr, int reg_type)
> {
>     switch (reg_type) {
>     case TIMER_REGS: {
>         struct timer_regs *timer_regs = regs_ptr;
>         ioctl(KVM_GET_SREGS, timer_regs, CPU_reg_type);
>         break;
>     }
>     default:
>         printf(error);
>     }
> }
>
> Similarly the write function:
>
> int write_regs_type(CPUArchState *env, int reg_type)
> {
>     switch (reg_type) {
>     case TIMER_REGS: {
>         struct timer_regs timer_regs;
>         timer_regs.reg1 = env->timer_regs->reg1;
>         timer_regs.reg2 = env->timer_regs->reg2;
>         ioctl(env, KVM_SET_SREGS, &timer_regs, CPU_reg_type);
>         break;
>     }
>     default:
>         printf(error);
>     }
> }
>
> int timer_func(CPUArchState *env)
> {
>     struct timer_regs timer_regs;
>     read_regs_type(env, &timer_regs, TIMER_REGS);
>     /* update env->timer_registers */
>     write_regs_type(env, TIMER_REGS);
> }
>
> This will get one type of registers at a time and can cause multiple
> ioctls per entry into QEMU.
This is still too explicit. How about

static inline void ppc_set_tsr(CPUState *env, target_ulong val)
{
    env->kvm_sync_extra |= SYNC_TIMER_REGS;
    cpu_synchronize_registers(env);
    env->spr[SPR_TSR] = val;
}

static inline void ppc_set_tcr(CPUState *env, target_ulong val)
{
    env->kvm_sync_extra |= SYNC_TIMER_REGS;
    cpu_synchronize_registers(env);
    env->spr[SPR_TCR] = val;
}

int timer_func(CPUState *env)
{
    ppc_set_tsr(env, 0);
    ppc_set_tcr(env, 0);
}
The main benefit from this method is that we can do a simple search/replace to
get rid of essentially all synchronization pitfalls where we might miss out on
some code path that then doesn't synchronize its register set.
> -------
>
> Keep on using GET/SET_ONE_REG when only one register needs to be updated.
We can always use ONE_REG for any of the variants.
The main question I have is whether it makes sense to move from
/* state subset only touched by the VCPU itself during runtime */
#define KVM_PUT_RUNTIME_STATE 1
/* state subset modified during VCPU reset */
#define KVM_PUT_RESET_STATE 2
/* full state set, modified during initialization or on vmload */
#define KVM_PUT_FULL_STATE 3
to a bitmap. So we would always only synchronize KVM_PUT_RUNTIME_STATE, but
keep a bitmap as explained above around for any state that is more elaborate.
For easy conversion, we could even add bits for SYNC_RESET and SYNC_FULL =
(SYNC_RESET | SYNC_FULL_REAL).
That should get rid of all the level based logic, giving us a lot more
flexibility on what we want to synchronize.
Alex