Re: [Qemu-devel] [Qemu-ppc] Migrating decrementer


From: Mark Cave-Ayland
Subject: Re: [Qemu-devel] [Qemu-ppc] Migrating decrementer
Date: Mon, 29 Feb 2016 20:21:39 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.6.0

On 29/02/16 03:57, David Gibson wrote:

> On Fri, Feb 26, 2016 at 12:29:51PM +0000, Mark Cave-Ayland wrote:
>> On 26/02/16 04:35, David Gibson wrote:
>>
>>>> Sigh. And let me try that again, this time after caffeine:
>>>>
>>>> cpu_start/resume():
>>>>     cpu->tb_env->tb_offset =
>>>>         muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
>>>>                  cpu->tb_env->tb_freq, NANOSECONDS_PER_SECOND) +
>>>>             cpu->tb_env->tb_offset -
>>>>         cpu_get_host_ticks();
>>>>
>>>> This should translate to: at CPU start, calculate the difference between
>>>> the current guest virtual timebase and the host timebase, storing the
>>>> difference in cpu->tb_env->tb_offset.
>>>
>>> Ummm... I think that's right.  Except that you need to make sure you
>>> calculate the tb_offset just once, and set the same value to all guest
>>> CPUs.  Otherwise the guest TBs may be slightly out of sync with each
>>> other, which is bad (the host should have already ensured that all host
>>> TBs are in sync with each other).
>>
>> Nods. The reason I really like this solution is because it correctly
>> handles both paused/live machines and simplifies the migration logic
>> significantly. In fact, with this solution the only information you
>> would need in vmstate_ppc_timebase for migration would be the current
>> tb_offset so the receiving host can calculate the guest timebase from
>> the virtual clock accordingly.
> 
>>> We really should make helper routines that each Power machine type can
>>> use for this.  Unfortunately we can't put it directly into the common
>>> ppc cpu migration code because of the requirement to keep the TBs
>>> synced across the machine.
>>
>> Effectively I believe there are 2 cases here: TCG and KVM. For TCG all
>> of this is a no-op since tb/decr are already derived from the virtual
>> clock + tb_offset, so that really just leaves KVM.
>>
>> From what you're saying we would need 2 hooks for KVM here: one on
>> machine start/resume which updates tb_offset for all vCPUs and one for
>> vCPU resume which updates just that one particular vCPU.
>>
>> Just curious as to your comment regarding helper routines for each Power
>> machine type - is there any reason not to enable this globally for all
>> Power machine types? If tb_offset isn't supported by the guest then the
>> problem with not being able to handle a jump in timebase post-migration
>> still remains exactly the same.
> 
> Well, I can't see a place to put it globally.  We can't put it in the
> general vCPU stuff, because that would migrate each CPU's timebase
> independently, but we want to migrate as a system wide operation, to
> ensure the TBs are all synchronized in the destination guest.
> 
> To do the platform wide stuff, it pretty much has to be in the machine
> type.
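
To illustrate the system-wide requirement, here is a minimal sketch in
plain C (not actual QEMU code): the offset is computed once and the same
value is written to every vCPU. SketchTbEnv, sketch_muldiv64() and the
other sketch_* names are hypothetical stand-ins, and the clock readings
are passed in as parameters instead of being read through
qemu_clock_get_ns() or cpu_get_host_ticks().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000U

/* Hypothetical stand-in for the per-CPU timebase state. */
typedef struct SketchTbEnv {
    uint32_t tb_freq;   /* timebase frequency in Hz */
    int64_t tb_offset;  /* guest TB minus reference clock, in ticks */
} SketchTbEnv;

/* Compute a * b / c with a 96-bit intermediate product
 * (stand-in for QEMU's muldiv64(), written out so the sketch compiles alone). */
static uint64_t sketch_muldiv64(uint64_t a, uint32_t b, uint32_t c)
{
    assert(c != 0);
    uint64_t lo = (a & 0xffffffffULL) * b;
    uint64_t hi = (a >> 32) * b + (lo >> 32);
    uint64_t res_hi = hi / c;
    uint64_t res_lo = (((hi % c) << 32) + (lo & 0xffffffffULL)) / c;
    return (res_hi << 32) + res_lo;
}

/* Difference between the current guest timebase and the host timebase. */
static int64_t sketch_calc_tb_offset(int64_t virt_ns, uint32_t tb_freq,
                                     int64_t guest_tb_offset,
                                     uint64_t host_ticks)
{
    uint64_t guest_tb = sketch_muldiv64((uint64_t)virt_ns, tb_freq,
                                        NANOSECONDS_PER_SECOND)
                        + (uint64_t)guest_tb_offset;
    return (int64_t)(guest_tb - host_ticks);
}

/* Calculate the offset exactly once, then give every vCPU the same value
 * so that the guest TBs cannot drift apart from each other. */
static void sketch_sync_tb_offset(SketchTbEnv *envs, size_t n,
                                  int64_t virt_ns, uint64_t host_ticks)
{
    if (n == 0) {
        return;
    }
    int64_t off = sketch_calc_tb_offset(virt_ns, envs[0].tb_freq,
                                        envs[0].tb_offset, host_ticks);
    for (size_t i = 0; i < n; i++) {
        envs[i].tb_offset = off;
    }
}
```

Sampling the clocks once outside the loop is the point of the sketch:
reading them per vCPU would give each one a slightly different offset.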

(goes and looks)

It strikes me that a good solution here would be to introduce a new
PPCMachineClass from which all of the PPC machines could derive, instead
of each machine being a direct subclass of MachineClass (this is not
dissimilar to the existing PCMachineClass), and to move the timebase and
decrementer information into it. All of the PPC machine types would then
pick up the changes automatically.

>> The other question of course relates to the reason this thread was
>> started, which is: will this approach still support migrating the
>> decrementer? My feeling is that this would still work in that we would
>> encode the number of ticks before the decrementer reaches its interrupt
>> value into vmstate_ppc_timebase, whether high or low. For TCG it is easy
>> to ensure that the decrementer will still fire in n ticks time
>> post-migration (which solves my particular use case), but I don't have a
>> feeling as to whether this is possible, or indeed desirable for KVM.
> 
> Yes, for TCG it should be fairly straightforward.  The DECR should be
> calculated from the timebase.  We do need to check it on incoming
> migration though, and check when we need to refire the decrementer
> interrupt.

So just to confirm: while reads from the timebase are not privileged
(and so cannot be intercepted between host and guest), do we still have
individual control of the per-guest decrementer interrupts?
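
For the TCG side, the idea of migrating the number of ticks remaining
rather than an absolute deadline could look roughly like the sketch
below. It is plain C with hypothetical names (SketchDecrState and the
sketch_decr_* functions are not existing QEMU symbols), and it assumes
the decrementer deadline is tracked as a guest timebase value.

```c
#include <stdint.h>

/* Hypothetical decrementer state: the guest timebase value at which the
 * decrementer reaches zero. */
typedef struct SketchDecrState {
    uint64_t decr_next;
} SketchDecrState;

/* Source side: ticks left until the deadline. A negative result means
 * the deadline has already passed. */
static int64_t sketch_decr_save(const SketchDecrState *s, uint64_t guest_tb)
{
    return (int64_t)(s->decr_next - guest_tb);
}

/* Destination side: rebuild the deadline relative to the timebase that
 * the destination has reconstructed for the guest. */
static void sketch_decr_load(SketchDecrState *s, uint64_t guest_tb,
                             int64_t remaining)
{
    s->decr_next = guest_tb + (uint64_t)remaining;
}
```

Because only the difference is transferred, the decrementer still fires
n ticks after the guest resumes, whatever timebase value the destination
ends up with. Whether a negative value means the interrupt has to be
refired on load depends on the trigger variant of the CPU model.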

> For KVM we'll need to load an appropriate value into the real
> decrementer.  We probably want to migrate a difference between the TB
> and the decrementer.  What could get hairy here is that there are a
> number of different variants between ppc models on how exactly the
> decrementer interrupt triggers: is it edge-triggered on 1->0
> transition, edge-triggered on 0->-1 transition, or level triggered on
> the DECR's sign bit.  

I don't think that is too much of a problem, since for TCG the logic is
already encapsulated in hw/ppc/ppc.c's __cpu_ppc_store_decr(). It should
be possible to move this logic into a shared helper function to keep
everything in one place.
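
As a rough illustration of what such a shared helper might need to
decide, here is a simplified sketch of the three trigger variants
mentioned above. It models a single decrement step only and is not a
description of what __cpu_ppc_store_decr() does today; the enum and
function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the three trigger variants. */
typedef enum SketchDecrTrigger {
    SKETCH_DECR_EDGE_1_TO_0,   /* edge-triggered on the 1 -> 0 transition */
    SKETCH_DECR_EDGE_0_TO_M1,  /* edge-triggered on the 0 -> -1 transition */
    SKETCH_DECR_LEVEL_SIGN     /* level-triggered on the DECR sign bit */
} SketchDecrTrigger;

/* Should the decrementer interrupt be raised when the 32-bit DECR
 * changes from old_val to new_val? */
static bool sketch_decr_fires(SketchDecrTrigger mode,
                              uint32_t old_val, uint32_t new_val)
{
    switch (mode) {
    case SKETCH_DECR_EDGE_1_TO_0:
        return old_val == 1 && new_val == 0;
    case SKETCH_DECR_EDGE_0_TO_M1:
        return old_val == 0 && new_val == 0xffffffffu;
    case SKETCH_DECR_LEVEL_SIGN:
        return (new_val & 0x80000000u) != 0;
    }
    return false;
}
```

On incoming migration the same predicate could be consulted with the
restored value to decide whether the interrupt needs to be refired.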

Finally, just to reiterate: while I can write and compile-test a
potential patchset, I have no way to test the KVM parts. If I were to
dedicate some time to this, would you, Alex or Alexey be willing to
help test and debug these changes?


ATB,

Mark.



