qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_sta


From: Bhushan Bharat-R65777
Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue
Date: Fri, 4 Jan 2013 10:23:48 +0000


> -----Original Message-----
> From: Alexander Graf [mailto:address@hidden
> Sent: Friday, January 04, 2013 2:11 PM
> To: Bhushan Bharat-R65777
> Cc: Marcelo Tosatti; address@hidden; Christian Borntraeger; Anthony
> Liguori; address@hidden qemu-devel
> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
> do_kvm_cpu_synchronize_state data integrity issue
> 
> 
> On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote:
> 
> >
> >
> >> -----Original Message-----
> >> From: Marcelo Tosatti [mailto:address@hidden
> >> Sent: Friday, January 04, 2013 7:08 AM
> >> To: Alexander Graf
> >> Cc: address@hidden; Christian Borntraeger; Anthony
> >> Liguori; qemu- address@hidden qemu-devel; Bhushan Bharat-R65777
> >> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
> >> do_kvm_cpu_synchronize_state data integrity issue
> >>
> >> On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote:
> >>>
> >>> On 03.01.2013, at 19:48, Jason J. Herne wrote:
> >>>
> >>>> On 01/03/2013 08:56 AM, Alexander Graf wrote:
> >>>>>> static void do_kvm_cpu_synchronize_state(void *_args)
> >>>>>>> {
> >>>>>>>    struct kvm_cpu_syncstate_args *args = _args;
> >>>>>>> +    CPUArchState *env = args->env;
> >>>>>>> +    int register_level = args->register_level;
> >>>>>>>
> >>>>> This probably becomes more readable if we explicitly revert back
> >>>>> to
> >> unsynced state first:
> >>>>>
> >>>>> /* Write back local modifications at our current level */ if
> >>>>> (register_level > env->kvm_vcpu_dirty) {
> >>>>>    kvm_arch_put_registers(...);
> >>>>>    env->kvm_vcpu_dirty = 0;
> >>>>> }
> >>>>>
> >>>>> and then do the sync we are requested to do:
> >>>>>
> >>>>> if (!env->kvm_vcpu_dirty) {
> >>>>>    ...
> >>>>> }
> >>>>
> >>>> I agree, but only if we add a second conditional to the if 1st
> >>>> statement  as
> >> such:
> >>>>
> >>>> if (args->env->kvm_vcpu_dirty && register_level >
> >>>> env->kvm_vcpu_dirty)
> >>>>
> >>>> This is to cover the case where the caller is asking for register level 
> >>>> "1"
> >> and we're already dirty at level "2". In this case, nothing should
> >> happen and we'll need the "args->env->kvm_vcpu_dirty" to ensure that is the
> case.
> >>>
> >>> As before, I'd prefer to make this explicit:
> >>>
> >>>>
> >>>> static void do_kvm_cpu_synchronize_state(void *_args) {
> >>>>   struct kvm_cpu_syncstate_args *args = _args;
> >>>>   CPUArchState *env = args->env;
> >>>>   int register_level = args->register_level;
> >>>
> >>> if (register_level > env->kvm_vcpu_dirty) {
> >>>    /* We are more dirty than we need to - all is well */
> >>>    return;
> >>> }
> >>>
> >>>>
> >>>>   /* Write back local modifications at our current level */
> >>>>   if (args->env->kvm_vcpu_dirty && register_level > env->kvm_vcpu_dirty) 
> >>>> {
> >>>>       kvm_arch_put_registers(env, env->kvm_vcpu_dirty);
> >>>>       env->kvm_vcpu_dirty = 0;
> >>>>   }
> >>>>
> >>>>   if (!args->env->kvm_vcpu_dirty) {
> >>>>       kvm_arch_get_registers(env, register_level);
> >>>>       env->kvm_vcpu_dirty = register_level;
> >>>>   }
> >>>> }
> >>>>
> >>>> Do you agree?  Thanks for your time. :)
> >>>
> >>> Please also check out the discussions I've had with Bharat about his
> >>> watchdog
> >> patches. There we need a mechanism to synchronize registers only when
> >> we actually need to, in order to avoid potential race conditions with
> >> a kernel timer.
> >>>
> >>> That particular case doesn't work well with levels. We can have
> >>> multiple
> >> different potential race producers in the kernel that we need to
> >> avoid individually, so we can't always synchronize all of them when
> >> only one of them needs to be synchronized.
> >>>
> >>> The big question is what we should be doing about this. We basically
> >>> have 3
> >> options:
> >>>
> >>>  * implement levels, treat racy registers as "manually
> >>> synchronized", as
> >> Bharat's latest patch set does
> >>>  * implement levels, add a bitmap for additional special
> >>> synchronization bits
> >>>  * replace levels by bitmap
> >>>
> >>> I'm quite frankly not sure which one of the 3 would be the best way 
> >>> forward.
> >>>
> >>>
> >>> Alex
> >>
> >> Implement read/write accessors for individual registers, similarly to
> >> how its done in the kernel. See kvm_cache_regs.h.
> >
> > Read/write for individual registers can be heavy for cases where multiple
> register needed to be updated. Is not it?
> 
> It depends. We can still have classes of registers to synchronize that we 
> could
> even synchronize using MANY_REG. For writes, things become even faster with
> individual accessors since we can easily coalesce all register writes until we
> hit the vcpu again into a MANY_REG array and write them in one go.
> 
> >
> > For cases where multiple register needed to be synchronized then I would 
> > like
> to elaborate the options as:
> >
> > Option 1:
> > int timer_func(CPUArchState env)
> > {
> >     cpu_synchronize_state(env);
> >     //update env->timer_registers
> >     env->updated_regs_type |= TIMER_REGS_TYPE_BIT;
> 
> To scale this one, it's probably also more clever to swap the logic:
> 
>     env->kvm_sync_extra |= SYNC_TIMER_REGS;
>     cpu_synchronize_state(env); /* gets timer regs */
>     /* update env->timer_registers */
>     /* timer regs will be synchronized later because kvm_sync_extra has
> SYNC_TIMER_REGS set */
> 
> Your case right now is special in that we don't need to get any register 
> state,
> but only write it. However, if we do
> 
>     cpu_synchronize_state(env); /* will not get timer regs */
>     env->kvm_sync_extra |= SYNC_TIMER_REGS;
> 
> then the next
> 
>     cpu_synchronize_state(env);
> 
> would overwrite the timer values again. So we would also need to indicate that
> the bits are dirty:
> 
>     cpu_synchronize_state(env); /* will not get timer regs */
>     env->kvm_sync_extra |= SYNC_TIMER_REGS;
>     env->kvm_sync_dirty |= SYNC_TIMER_REGS;
> 
> which cpu_synchronize_state() could use to make sure it doesn't read back any
> timer regs again.

yes

> It would obviously also have to set it in its normal operation
> when it executed first ;).

We also need get all registers like for debugging and we should have flag like 
ALL_REGS etc.

But I general I agree with the idea. 

> 
> > }
> >
> > Change the kvm_arch_put_registers() to add followings:
> >
> > int kvm_arch_put_registers(CPUArchState *env, int level) {
> >
> >     If (env->updated_regs_type) {
> >             struct kvm_sregs sregs;
> >             If (env->updated_regs_type & TIMER_REGS_TYPE_BIT) {
> >                     // update sregs->timer_registers;
> >                     // may be we can have a bitmap to tell kernel for what
> actually updated
> >             }
> >             If (env->updated_regs_type & XX_CPU_REGS_TYPE) {
> >                     // update respective registers in sregs
> >             }
> >
> >             ioctl(env, KVM_GET_SREGS, &sregs);
> >     }
> > }
> >
> > This mechanism will get all registers state but this is required only once 
> > per
> entry in QEMU.
> 
> I don't understand this bit
> 
> >
> >
> > Option 2:
> > Define read_regs_type() and write_regs_type() for cases where it requires
> multiple registers updates:
> >
> > int read_regs_type((CPUArchState env, void *regs_ptr, int reg_type) {
> >     -> reg_type are bitmap like: TIMER_REGS, DEBUG_REGS, PERFMON_REGS etc.
> >
> >     Switch(reg_type) {
> >     Case TIMER_REGS:
> >             Struct *timer_regs = regs_ptr;
> >             Ioctl(KVM_GET_SREGS, timer_regs, CPU_reg_type);
> >             Break;
> >
> >     Default:
> >             Printf(error);
> > }
> >
> > Similarly will be the write function:
> > int write_regs_type((CPUArchState env, int reg_type) {
> >     -> reg_type are bitmap like: TIMER_REGS, DEBUG_REGS, PERFMON_REGS etc.
> >
> >     Switch(reg_type) {
> >     Case TIMER_REGS:
> >             Struct timer_regs;
> >             Timer_regs.reg1 = env->timer_regs->reg1;
> >             Timer_regs.reg2 = env->timer_regs->reg2;
> >             Ioctl(env, KVM_SET_SREGS, &timer_regs, CPU_reg_type);
> >             Break;
> >
> >     Default:
> >             Printf(error);
> > }
> >
> > Int timer_func(CPUxxState env)
> > {
> >    Struct timer_regs;
> >    read_regs_type((env, &timer_regs,TIMER_REGS);
> >    //update env->timer_registers
> >    Write_regs_type(env, TIMER_REGS)
> > }
> >
> > This will get one type of register_types and can cause multiple IOCTL per
> entry to QEMU.
> 
> This is still too explicit. How about
> 
> static inline void ppc_set_tsr(CPUState *env, target_ulong val) {
>     env->kvm_sync_extra |= SYNC_TIMER_REGS;
>     cpu_synchronize_registers(env);
>     env->spr[SPR_TSR] = val;

You also want env->kvm_sync_dirty also, right?

> }
> 
> static inline void ppc_set_tcr(CPUState *env, target_ulong val) {
>     env->kvm_sync_extra |= SYNC_TIMER_REGS;
>     cpu_synchronize_registers(env);
>     env->spr[SPR_TCR] = val;

Same as above 

> }
> 
> int timer_func(CPUState *env) {
>     ppc_set_tsr(env, 0);

SYNC_TIMERS_REGS are get only once on first call.

>     ppc_set_tcr(env, 0);
> }
> 
> The main benefit from this method is that we can do a simple search/replace to
> get rid of essentially all synchronization pitfalls where we might miss out on
> some code path that then doesn't synchronize its register set.
> 
> > -------
> >
> > Keep on using GET/SET_ONE_REG when only one registers needed to be updated.
> 
> We can always use ONE_REG for any of the variants.
> 
> The main question I have is whether it makes sense to move from
> 
>   /* state subset only touched by the VCPU itself during runtime */
>   #define KVM_PUT_RUNTIME_STATE   1
>   /* state subset modified during VCPU reset */
>   #define KVM_PUT_RESET_STATE     2
>   /* full state set, modified during initialization or on vmload */
>   #define KVM_PUT_FULL_STATE      3
> 
> to a bitmap. So we would always only synchronize KVM_PUT_RUNTIME_STATE, but 
> keep
> a bitmap as explained above around for any state that is more elaborate. For
> easy conversion, we could even add bits for SYNC_RESET and SYNC_FULL =
> (SYNC_RESET | SYNC_FULL_REAL).

Yes, that looks better to me .

-Bharat

> 
> That should get rid of all the level based logic, giving us a lot more
> flexibility on what we want to synchronize.
> 
> 
> Alex
> 





reply via email to

[Prev in Thread] Current Thread [Next in Thread]