qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 4/4] spapr: Add support for time base offset mig


From: Alexey Kardashevskiy
Subject: Re: [Qemu-devel] [PATCH 4/4] spapr: Add support for time base offset migration
Date: Sun, 13 Apr 2014 01:51:24 +1000
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

On 04/11/2014 07:40 PM, Alexander Graf wrote:
> 
> On 10.04.14 16:31, Alexey Kardashevskiy wrote:
>> On 04/10/2014 10:34 PM, Alexander Graf wrote:
>>> On 03.04.14 15:14, Alexey Kardashevskiy wrote:
>>>> This allows guests to have a different timebase origin from the host.
>>>>
>>>> This is needed for migration, where a guest can migrate from one host
>>>> to another and the two hosts might have a different timebase origin.
>>>> However, the timebase seen by the guest must not go backwards, and
>>>> should go forwards only by a small amount corresponding to the time
>>>> taken for the migration.
>>>>
>>>> This is only supported for recent POWER hardware which has the TBU40
>>>> (timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
>>>> 970.
>>>>
>>>> This adds kvm_access_one_reg() to access a special register which is not
>>>> in env->spr.
>>>>
>>>> The feature must be present in the host kernel.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>> ---
>>>> Changes:
>>>> v4:
>>>> * made it per machine timebase offser rather than per CPU
>>>>
>>>> v3:
>>>> * kvm_access_one_reg moved out to a separate patch
>>>> * tb_offset and host_timebase were replaced with guest_timebase as
>>>> the destionation does not really care of offset on the source
>>>>
>>>> v2:
>>>> * bumped the vmstate_ppc_cpu version
>>>> * defined version for the env.tb_env field
>>>> ---
>>>>    hw/ppc/ppc.c           | 120
>>>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    hw/ppc/spapr.c         |   3 +-
>>>>    include/hw/ppc/spapr.h |   2 +
>>>>    target-ppc/cpu-qom.h   |  16 +++++++
>>>>    target-ppc/kvm.c       |   5 +++
>>>>    target-ppc/machine.c   |   4 +-
>>>>    trace-events           |   3 ++
>>>>    7 files changed, 151 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
>>>> index 9c2a132..b51db1b 100644
>>>> --- a/hw/ppc/ppc.c
>>>> +++ b/hw/ppc/ppc.c
>>>> @@ -29,9 +29,11 @@
>>>>    #include "sysemu/cpus.h"
>>>>    #include "hw/timer/m48t59.h"
>>>>    #include "qemu/log.h"
>>>> +#include "qemu/error-report.h"
>>>>    #include "hw/loader.h"
>>>>    #include "sysemu/kvm.h"
>>>>    #include "kvm_ppc.h"
>>>> +#include "trace.h"
>>>>      //#define PPC_DEBUG_IRQ
>>>>    //#define PPC_DEBUG_TB
>>>> @@ -797,6 +799,124 @@ static void cpu_ppc_set_tb_clk (void *opaque,
>>>> uint32_t freq)
>>>>        cpu_ppc_store_purr(cpu, 0x0000000000000000ULL);
>>>>    }
>>>>    +/*
>>>> + * Calculate timebase on the destination side of migration
>>>> + *
>>>> + * We calculate new timebase offset as shown below:
>>>> + * 1) Gtb2 = Gtb1 + max(tod2 - tod1, 0)
>>>> + *    Gtb2 = tb2 + off2
>>>> + * 2) tb2 + off2 = Gtb1 + max(tod2 - tod1, 0)
>>>> + * 3) off2 = Gtb1 - tb2 + max(tod2 - tod1, 0)
>>>> + *
>>>> + * where:
>>>> + * Gtb2 - destination guest timebase
>>>> + * tb2 - destination host timebase
>>>> + * off2 - destination timebase offset
>>>> + * tod2 - destination time of the day
>>>> + * Gtb1 - source guest timebase
>>>> + * tod1 - source time of the day
>>>> + *
>>>> + * The result we want is in @off2
>>>> + *
>>>> + * Two conditions must be met for @off2:
>>>> + * 1) off2 must be multiple of 2^24 ticks as it will be set via TBU40 SPR
>>>> + * 2) Gtb2 >= Gtb1
>>>> + */
>>>> +static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb)
>>>> +{
>>>> +    uint64_t tb2, tod2;
>>>> +    int64_t off2;
>>>> +    int ratio = tb->freq / 1000000;
>>>> +    struct timeval tv;
>>>> +
>>>> +    tb2 = cpu_get_real_ticks();
>>>> +    gettimeofday(&tv, NULL);
>>>> +    tod2 = tv.tv_sec * 1000000 + tv.tv_usec;
>>>> +
>>>> +    off2 = tb->guest_timebase - tb2;
>>>> +    if ((tod2 > tb->time_of_the_day) &&
>>>> +        (tod2 - tb->time_of_the_day < 1000000)) {
>>>> +        off2 += (tod2 - tb->time_of_the_day) * ratio;
>>>> +    }
>>>> +    off2 = ROUND_UP(off2, 1 << 24);
>>>> +
>>>> +    return off2;
>>>> +}
>>> I *think* what you're trying to say here is that you want
>>>
>>> assert(source_timebase_freq == timebase_freq);
>>>
>>> migration_duration_ns = host_ns - source_host_ns;
>>> guest_tb = source_guest_tb + ns_scaled_to_tb(min(0, migration_duration_ns);
>>> kvm_set_guest_tb(guest_tb);
>>>    -> kvm_set_one_reg(KVM_REG_PPC_TB_OFFSET, guest_tb - mftb());
>>>
>>> But I honestly have not managed to read that from the code. Either this
>>> really is what you're trying to do and the code is just very hard to read
>>> (which means it needs to be written more easily) or you're doing something
>>> different which I don't understand.
>>
>> Is this any better?
>>
>> static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb)
>> {
>>     struct timeval tv;
>>     int64_t migration_duration_ns, migration_duration_tb;
> 
> If I read the code correctly you're operating in us, not ns, no?
> 
>>     int64_t guest_tb, host_ns;
>>     int ratio = tb->freq / 1000000;
> 
> #define USEC_PER_SEC 1000000
>
> You're also losing quite a bit of precision here, no?

Am I? For tb_freq==512MHz and the way I use it below?


> 
>>     int64_t off;
>>
>>     gettimeofday(&tv, NULL);
>>     host_ns = tv.tv_sec * 1000000 + tv.tv_usec;
> 
> host_us = get_clock_realtime() / 1000; ?
>
>>     migration_duration_ns = MIN(1000000,
> 
> Why is it MIN(1000000)? Is a migration supposed to last at least 1sec? Why?

It should probably be max_downtime (30ms which I am trying to change to
300ms in another patch), something like that. And if it is bigger than 1
seconds, it is either veeeery slow connection or time is not synchronized
between hosts, and we try to be precise only if time is in sync.


>>         host_ns - tb->time_of_the_day);
>>     migration_duration_tb = migration_duration_ns * ratio;
>>
>>     guest_tb = tb->guest_timebase + MIN(0, migration_duration_tb);
>>
>>     off = guest_tb - cpu_get_real_ticks();
> 
> It's probably easier to read when you create one function that just returns
> a guest TB value adjusted by the time the last measurement happened. The
> fact that the KVM register wants an offset is a KVM implementation detail.
> The TB adjustment should happen generically.
>
>>
>>     return off;
>> }
>>
>>
>>> We also designed the PPC_TB_OFFSET ONE_REG in a way that it always rounds
>>> up to its 40 bit granularity, so no need to do this in QEMU. In fact, we
>>> don't want to do it in QEMU in case there will be a more fine-grained SPR
>>> in the future.
>> I believe rounding was not in the kernel when I started making this...
>>
>>
>>> And from all I understand the timebase frequency is now architecturally
>>> specified, so it won't change for newer cores, no?
>> I asked people in our lab. Everyone says that it should not change but
>> noone would bet on it too much.
> 
> When it changes and you want to live migrate, you'll need to implement a
> guest TB scale register and the whole idea of a "TB offset" ONE_REG is absurd.
> 
> The more I think about this the more I realize we should have created a
> "guest TB value", not a "guest TB offset" ONE_REG.
> 
>>
>>
>>> And if we migrate TCG
>>> guests it will be the same between two hosts.
>> And G5 uses 33333333. I really do not understand why it is bad to
>> send-and-check timer frequency. Why?
> 
> Because the guest will continue to run at a different TB frequency on the
> new host and break.
>
>> Is the rest ok? Thanks for review!
> 
> Not sure. Please rework everything according to the comments, make the code
> readable enough that your wife understands it and then resend it :).

I am about to post v5, please take a look. Thanks!



-- 
Alexey



reply via email to

[Prev in Thread] Current Thread [Next in Thread]