[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [PATCH 4/4] spapr: Add support for time base offset migra
From: |
Alexey Kardashevskiy |
Subject: |
Re: [Qemu-ppc] [PATCH 4/4] spapr: Add support for time base offset migration |
Date: |
Sun, 13 Apr 2014 01:51:24 +1000 |
User-agent: |
Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 |
On 04/11/2014 07:40 PM, Alexander Graf wrote:
>
> On 10.04.14 16:31, Alexey Kardashevskiy wrote:
>> On 04/10/2014 10:34 PM, Alexander Graf wrote:
>>> On 03.04.14 15:14, Alexey Kardashevskiy wrote:
>>>> This allows guests to have a different timebase origin from the host.
>>>>
>>>> This is needed for migration, where a guest can migrate from one host
>>>> to another and the two hosts might have a different timebase origin.
>>>> However, the timebase seen by the guest must not go backwards, and
>>>> should go forwards only by a small amount corresponding to the time
>>>> taken for the migration.
>>>>
>>>> This is only supported for recent POWER hardware which has the TBU40
>>>> (timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
>>>> 970.
>>>>
>>>> This adds kvm_access_one_reg() to access a special register which is not
>>>> in env->spr.
>>>>
>>>> The feature must be present in the host kernel.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>> ---
>>>> Changes:
>>>> v4:
>>>> * made it per machine timebase offser rather than per CPU
>>>>
>>>> v3:
>>>> * kvm_access_one_reg moved out to a separate patch
>>>> * tb_offset and host_timebase were replaced with guest_timebase as
>>>> the destionation does not really care of offset on the source
>>>>
>>>> v2:
>>>> * bumped the vmstate_ppc_cpu version
>>>> * defined version for the env.tb_env field
>>>> ---
>>>> hw/ppc/ppc.c | 120
>>>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>>> hw/ppc/spapr.c | 3 +-
>>>> include/hw/ppc/spapr.h | 2 +
>>>> target-ppc/cpu-qom.h | 16 +++++++
>>>> target-ppc/kvm.c | 5 +++
>>>> target-ppc/machine.c | 4 +-
>>>> trace-events | 3 ++
>>>> 7 files changed, 151 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
>>>> index 9c2a132..b51db1b 100644
>>>> --- a/hw/ppc/ppc.c
>>>> +++ b/hw/ppc/ppc.c
>>>> @@ -29,9 +29,11 @@
>>>> #include "sysemu/cpus.h"
>>>> #include "hw/timer/m48t59.h"
>>>> #include "qemu/log.h"
>>>> +#include "qemu/error-report.h"
>>>> #include "hw/loader.h"
>>>> #include "sysemu/kvm.h"
>>>> #include "kvm_ppc.h"
>>>> +#include "trace.h"
>>>> //#define PPC_DEBUG_IRQ
>>>> //#define PPC_DEBUG_TB
>>>> @@ -797,6 +799,124 @@ static void cpu_ppc_set_tb_clk (void *opaque,
>>>> uint32_t freq)
>>>> cpu_ppc_store_purr(cpu, 0x0000000000000000ULL);
>>>> }
>>>> +/*
>>>> + * Calculate timebase on the destination side of migration
>>>> + *
>>>> + * We calculate new timebase offset as shown below:
>>>> + * 1) Gtb2 = Gtb1 + max(tod2 - tod1, 0)
>>>> + * Gtb2 = tb2 + off2
>>>> + * 2) tb2 + off2 = Gtb1 + max(tod2 - tod1, 0)
>>>> + * 3) off2 = Gtb1 - tb2 + max(tod2 - tod1, 0)
>>>> + *
>>>> + * where:
>>>> + * Gtb2 - destination guest timebase
>>>> + * tb2 - destination host timebase
>>>> + * off2 - destination timebase offset
>>>> + * tod2 - destination time of the day
>>>> + * Gtb1 - source guest timebase
>>>> + * tod1 - source time of the day
>>>> + *
>>>> + * The result we want is in @off2
>>>> + *
>>>> + * Two conditions must be met for @off2:
>>>> + * 1) off2 must be multiple of 2^24 ticks as it will be set via TBU40 SPR
>>>> + * 2) Gtb2 >= Gtb1
>>>> + */
>>>> +static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb)
>>>> +{
>>>> + uint64_t tb2, tod2;
>>>> + int64_t off2;
>>>> + int ratio = tb->freq / 1000000;
>>>> + struct timeval tv;
>>>> +
>>>> + tb2 = cpu_get_real_ticks();
>>>> + gettimeofday(&tv, NULL);
>>>> + tod2 = tv.tv_sec * 1000000 + tv.tv_usec;
>>>> +
>>>> + off2 = tb->guest_timebase - tb2;
>>>> + if ((tod2 > tb->time_of_the_day) &&
>>>> + (tod2 - tb->time_of_the_day < 1000000)) {
>>>> + off2 += (tod2 - tb->time_of_the_day) * ratio;
>>>> + }
>>>> + off2 = ROUND_UP(off2, 1 << 24);
>>>> +
>>>> + return off2;
>>>> +}
>>> I *think* what you're trying to say here is that you want
>>>
>>> assert(source_timebase_freq == timebase_freq);
>>>
>>> migration_duration_ns = host_ns - source_host_ns;
>>> guest_tb = source_guest_tb + ns_scaled_to_tb(min(0, migration_duration_ns);
>>> kvm_set_guest_tb(guest_tb);
>>> -> kvm_set_one_reg(KVM_REG_PPC_TB_OFFSET, guest_tb - mftb());
>>>
>>> But I honestly have not managed to read that from the code. Either this
>>> really is what you're trying to do and the code is just very hard to read
>>> (which means it needs to be written more easily) or you're doing something
>>> different which I don't understand.
>>
>> Is this any better?
>>
>> static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb)
>> {
>> struct timeval tv;
>> int64_t migration_duration_ns, migration_duration_tb;
>
> If I read the code correctly you're operating in us, not ns, no?
>
>> int64_t guest_tb, host_ns;
>> int ratio = tb->freq / 1000000;
>
> #define USEC_PER_SEC 1000000
>
> You're also losing quite a bit of precision here, no?
Am I? For tb_freq==512MHz and the way I use it below?
>
>> int64_t off;
>>
>> gettimeofday(&tv, NULL);
>> host_ns = tv.tv_sec * 1000000 + tv.tv_usec;
>
> host_us = get_clock_realtime() / 1000; ?
>
>> migration_duration_ns = MIN(1000000,
>
> Why is it MIN(1000000)? Is a migration supposed to last at least 1sec? Why?
It should probably be max_downtime (30ms which I am trying to change to
300ms in another patch), something like that. And if it is bigger than 1
seconds, it is either veeeery slow connection or time is not synchronized
between hosts, and we try to be precise only if time is in sync.
>> host_ns - tb->time_of_the_day);
>> migration_duration_tb = migration_duration_ns * ratio;
>>
>> guest_tb = tb->guest_timebase + MIN(0, migration_duration_tb);
>>
>> off = guest_tb - cpu_get_real_ticks();
>
> It's probably easier to read when you create one function that just returns
> a guest TB value adjusted by the time the last measurement happened. The
> fact that the KVM register wants an offset is a KVM implementation detail.
> The TB adjustment should happen generically.
>
>>
>> return off;
>> }
>>
>>
>>> We also designed the PPC_TB_OFFSET ONE_REG in a way that it always rounds
>>> up to its 40 bit granularity, so no need to do this in QEMU. In fact, we
>>> don't want to do it in QEMU in case there will be a more fine-grained SPR
>>> in the future.
>> I believe rounding was not in the kernel when I started making this...
>>
>>
>>> And from all I understand the timebase frequency is now architecturally
>>> specified, so it won't change for newer cores, no?
>> I asked people in our lab. Everyone says that it should not change but
>> noone would bet on it too much.
>
> When it changes and you want to live migrate, you'll need to implement a
> guest TB scale register and the whole idea of a "TB offset" ONE_REG is absurd.
>
> The more I think about this the more I realize we should have created a
> "guest TB value", not a "guest TB offset" ONE_REG.
>
>>
>>
>>> And if we migrate TCG
>>> guests it will be the same between two hosts.
>> And G5 uses 33333333. I really do not understand why it is bad to
>> send-and-check timer frequency. Why?
>
> Because the guest will continue to run at a different TB frequency on the
> new host and break.
>
>> Is the rest ok? Thanks for review!
>
> Not sure. Please rework everything according to the comments, make the code
> readable enough that your wife understands it and then resend it :).
I am about to post v5, please take a look. Thanks!
--
Alexey
- Re: [Qemu-ppc] [PATCH 4/4] spapr: Add support for time base offset migration, (continued)
- Re: [Qemu-ppc] [PATCH 4/4] spapr: Add support for time base offset migration, Alexander Graf, 2014/04/14
- Re: [Qemu-ppc] [PATCH 4/4] spapr: Add support for time base offset migration, Alexey Kardashevskiy, 2014/04/14
- Re: [Qemu-ppc] [PATCH 4/4] spapr: Add support for time base offset migration, Alexander Graf, 2014/04/14
- Re: [Qemu-ppc] [PATCH 4/4] spapr: Add support for time base offset migration, Benjamin Herrenschmidt, 2014/04/14
- Re: [Qemu-ppc] [PATCH 4/4] spapr: Add support for time base offset migration, Benjamin Herrenschmidt, 2014/04/14
- Re: [Qemu-ppc] [PATCH 4/4] spapr: Add support for time base offset migration, Benjamin Herrenschmidt, 2014/04/14
- Re: [Qemu-ppc] [Qemu-devel] [PATCH 4/4] spapr: Add support for time base offset migration, Peter Maydell, 2014/04/12
- Re: [Qemu-ppc] [Qemu-devel] [PATCH 4/4] spapr: Add support for time base offset migration, Benjamin Herrenschmidt, 2014/04/13
- Re: [Qemu-ppc] [Qemu-devel] [PATCH 4/4] spapr: Add support for time base offset migration, Alexander Graf, 2014/04/14
- Re: [Qemu-ppc] [Qemu-devel] [PATCH 4/4] spapr: Add support for time base offset migration, Benjamin Herrenschmidt, 2014/04/14
- Re: [Qemu-ppc] [PATCH 4/4] spapr: Add support for time base offset migration,
Alexey Kardashevskiy <=
- Re: [Qemu-ppc] [PATCH 4/4] spapr: Add support for time base offset migration, Alexander Graf, 2014/04/14
[Qemu-ppc] [PATCH 3/4] KVM: PPC: Support POWER8 registers, Alexey Kardashevskiy, 2014/04/03
[Qemu-ppc] [PATCH 2/4] spapr: Enable DABRX special register, Alexey Kardashevskiy, 2014/04/03