Re: [Qemu-devel] [RFC PATCH v5 3/3] Force auto-convegence of live migration

From: Anthony Liguori
Date: Fri, 10 May 2013 10:11:31 -0500
User-agent: Notmuch/0.15.2+77~g661dcf8 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu)
Chegu Vinod <address@hidden> writes:
> On 5/10/2013 6:07 AM, Anthony Liguori wrote:
>> Chegu Vinod <address@hidden> writes:
>>
>>> If a user chooses to turn on the auto-converge migration capability
>>> these changes detect the lack of convergence and throttle down the
>>> guest, i.e. force the VCPUs out of the guest for some duration
>>> and let the migration thread catch up and help converge.
>>>
>>> Verified the convergence using the following :
>>> - SpecJbb2005 workload running on a 20VCPU/256G guest(~80% busy)
>>> - OLTP like workload running on a 80VCPU/512G guest (~80% busy)
>>>
>>> Sample results with SpecJbb2005 workload : (migrate speed set to 20Gb and
>>> migrate downtime set to 4 seconds).
>> Would it make sense to separate out the "slow the VCPU down" part of
>> this?
>>
>> That would give a management tool more flexibility to create policies
>> around slowing the VCPU down to encourage migration.
>
> I believe one can always enhance libvirt tools to monitor the migration
> statistics and control the shares/entitlements of the vcpus via
> cgroups, thereby slowing the guest down to allow for convergence (I had
> that listed in my earlier versions of the patches as an option, and also
> noted that it requires external (i.e. tool driven) monitoring and
> triggers, and that this alternative was kind of automatic after the
> initial setting of the capability).
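
For concreteness, the cgroups route mentioned above might look roughly like the sketch below. Everything here is illustrative rather than taken from libvirt: the helper name is made up, and it only shows the quota arithmetic an external tool would need, assuming cgroup v1 CFS bandwidth control (i.e. the result would be written to the vcpu cgroup's cpu.cfs_quota_us, with cpu.cfs_period_us left at its default).

```python
# Hypothetical helper for the external-tool/cgroups alternative:
# map a throttle fraction onto a CFS bandwidth quota for the guest's
# vcpu threads.  Actually applying it would mean writing the result
# to the vcpu cgroup's cpu.cfs_quota_us file (not shown).

CFS_PERIOD_US = 100000  # kernel default CFS period: 100 ms


def cfs_quota_for(throttle, nvcpus, period_us=CFS_PERIOD_US):
    """Map a throttle fraction in (0, 1] to a CFS quota in microseconds.

    throttle=1.0 leaves the guest unthrottled (quota == nvcpus * period);
    smaller fractions shrink the CPU time all vcpus may use per period.
    """
    if not 0.0 < throttle <= 1.0:
        raise ValueError("throttle must be in (0, 1]")
    return int(throttle * nvcpus * period_us)
```

For example, halving a 20-vcpu guest (`cfs_quota_for(0.5, 20)`) yields a quota of 1000000 us per 100 ms period, i.e. 10 CPUs' worth of runtime.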
>
> Is that what you meant by your comment above, or are you talking about
> something outside the scope of cgroups and, from an implementation point
> of view, also outside the migration code path, i.e. a new knob that an
> external tool can use to just throttle down the vcpus of a guest?
I'm suggesting a knob to throttle the guest VCPUs within QEMU that could
be used by management tools to encourage convergence.
For instance, consider an imaginary "vcpu_throttle" command that took a
number between 0 and 1 that throttled VCPU performance accordingly.
Then migration would look like:
0) throttle = 1.0
1) call migrate command to start migration
2) query progress until you decide you aren't converging
3) throttle *= 0.75; call vcpu_throttle $throttle
4) goto (2)
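
A minimal sketch of that loop, with the QMP side stubbed out (the vcpu_throttle command is imaginary, as above, and is_converging stands in for querying migration progress; only the multiplicative back-off schedule is real):

```python
# Sketch of the management-tool policy loop from steps 0-4 above.
# Nothing here talks to a real QEMU; `is_converging` is a stand-in
# for step 2's progress query, and the place where a real tool would
# issue the (imaginary) vcpu_throttle command is marked in a comment.

def next_throttle(throttle, factor=0.75, floor=0.05):
    """One back-off step (step 3): throttle *= factor, clamped at a
    floor so the guest always keeps a little CPU time."""
    return max(throttle * factor, floor)


def run_policy(is_converging, max_steps=20):
    """Tighten the throttle until migration converges, or give up
    after max_steps iterations.  Returns the final throttle and the
    sequence of values that would have been passed to vcpu_throttle."""
    throttle = 1.0                           # step 0: unthrottled
    history = []
    for _ in range(max_steps):               # steps 2-4
        if is_converging(throttle):          # step 2: query progress
            break
        throttle = next_throttle(throttle)   # step 3a: back off
        history.append(throttle)             # step 3b: vcpu_throttle $throttle
    return throttle, history


if __name__ == "__main__":
    # Pretend migration starts converging once the guest runs below
    # half speed; the loop settles after three tightening steps.
    final, history = run_policy(lambda t: t < 0.5)
    print(final, history)
```

The multiplicative schedule is just one plausible policy; the point of exposing the knob is that a tool could equally use additive steps, dirty-rate feedback, or anything else, without QEMU changes.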
Now I'm not opposed to a series like this that adds this sort of policy
to QEMU itself as well, but I want to make sure the pieces are exposed
so that a management tool can implement its own policies.
Regards,
Anthony Liguori
>
> Thanks
> Vinod
>
>
>
>>
>> In fact, I wonder if we need anything in the migration path if we just
>> expose the "slow the VCPU down" bit as a feature.
>>
>> Slow the VCPU down is not quite the same as setting priority of the VCPU
>> thread largely because of the QBL so I recognize the need to have
>> something for this in QEMU.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>> (qemu) info migrate
>>> capabilities: xbzrle: off auto-converge: off <----
>>> Migration status: active
>>> total time: 1487503 milliseconds
>>> expected downtime: 519 milliseconds
>>> transferred ram: 383749347 kbytes
>>> remaining ram: 2753372 kbytes
>>> total ram: 268444224 kbytes
>>> duplicate: 65461532 pages
>>> skipped: 64901568 pages
>>> normal: 95750218 pages
>>> normal bytes: 383000872 kbytes
>>> dirty pages rate: 67551 pages
>>>
>>> ---
>>>
>>> (qemu) info migrate
>>> capabilities: xbzrle: off auto-converge: on <----
>>> Migration status: completed
>>> total time: 241161 milliseconds
>>> downtime: 6373 milliseconds
>>> transferred ram: 28235307 kbytes
>>> remaining ram: 0 kbytes
>>> total ram: 268444224 kbytes
>>> duplicate: 64946416 pages
>>> skipped: 64903523 pages
>>> normal: 7044971 pages
>>> normal bytes: 28179884 kbytes
>>>
>>> Signed-off-by: Chegu Vinod <address@hidden>
>>> ---
>>>  arch_init.c                   |   68 +++++++++++++++++++++++++++++++++++++++++
>>>  include/migration/migration.h |    4 ++
>>>  migration.c                   |    1 +
>>>  3 files changed, 73 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch_init.c b/arch_init.c
>>> index 49c5dc2..29788d6 100644
>>> --- a/arch_init.c
>>> +++ b/arch_init.c
>>> @@ -49,6 +49,7 @@
>>> #include "trace.h"
>>> #include "exec/cpu-all.h"
>>> #include "hw/acpi/acpi.h"
>>> +#include "sysemu/cpus.h"
>>>
>>> #ifdef DEBUG_ARCH_INIT
>>> #define DPRINTF(fmt, ...) \
>>> @@ -104,6 +105,8 @@ int graphic_depth = 15;
>>> #endif
>>>
>>> const uint32_t arch_type = QEMU_ARCH;
>>> +static bool mig_throttle_on;
>>> +
>>>
>>> /***********************************************************/
>>> /* ram save/restore */
>>> @@ -378,8 +381,15 @@ static void migration_bitmap_sync(void)
>>> uint64_t num_dirty_pages_init = migration_dirty_pages;
>>> MigrationState *s = migrate_get_current();
>>> static int64_t start_time;
>>> + static int64_t bytes_xfer_prev;
>>> static int64_t num_dirty_pages_period;
>>> int64_t end_time;
>>> + int64_t bytes_xfer_now;
>>> + static int dirty_rate_high_cnt;
>>> +
>>> + if (!bytes_xfer_prev) {
>>> + bytes_xfer_prev = ram_bytes_transferred();
>>> + }
>>>
>>> if (!start_time) {
>>> start_time = qemu_get_clock_ms(rt_clock);
>>> @@ -404,6 +414,23 @@ static void migration_bitmap_sync(void)
>>>
>>> /* more than 1 second = 1000 millisecons */
>>> if (end_time > start_time + 1000) {
>>> + if (migrate_auto_converge()) {
>>> +            /* The following detection logic can be refined later. For now:
>>> +               Check to see if the dirtied bytes is 50% more than the approx.
>>> +               amount of bytes that just got transferred since the last time we
>>> +               were in this routine. If that happens N times (for now N==5)
>>> +               we turn on the throttle down logic */
>>> + bytes_xfer_now = ram_bytes_transferred();
>>> + if (s->dirty_pages_rate &&
>>> + ((num_dirty_pages_period*TARGET_PAGE_SIZE) >
>>> + ((bytes_xfer_now - bytes_xfer_prev)/2))) {
>>> +                if (dirty_rate_high_cnt++ > 5) {
>>> +                    DPRINTF("Unable to converge. Throttling down guest\n");
>>> + mig_throttle_on = true;
>>> + }
>>> + }
>>> + bytes_xfer_prev = bytes_xfer_now;
>>> + }
>>> s->dirty_pages_rate = num_dirty_pages_period * 1000
>>> / (end_time - start_time);
>>> s->dirty_bytes_rate = s->dirty_pages_rate * TARGET_PAGE_SIZE;
>>> @@ -496,6 +523,15 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>>> return bytes_sent;
>>> }
>>>
>>> +bool throttling_needed(void)
>>> +{
>>> + if (!migrate_auto_converge()) {
>>> + return false;
>>> + }
>>> +
>>> + return mig_throttle_on;
>>> +}
>>> +
>>> static uint64_t bytes_transferred;
>>>
>>> static ram_addr_t ram_save_remaining(void)
>>> @@ -1098,3 +1134,35 @@ TargetInfo *qmp_query_target(Error **errp)
>>>
>>> return info;
>>> }
>>> +
>>> +static void mig_delay_vcpu(void)
>>> +{
>>> + qemu_mutex_unlock_iothread();
>>> + g_usleep(50*1000);
>>> + qemu_mutex_lock_iothread();
>>> +}
>>> +
>>> +/* Stub used for getting the vcpu out of VM and into qemu via
>>> + run_on_cpu()*/
>>> +static void mig_kick_cpu(void *opq)
>>> +{
>>> + mig_delay_vcpu();
>>> + return;
>>> +}
>>> +
>>> +/* To reduce the dirty rate explicitly disallow the VCPUs from spending
>>> + much time in the VM. The migration thread will try to catchup.
>>> + Workload will experience a performance drop.
>>> +*/
>>> +void migration_throttle_down(void)
>>> +{
>>> + if (throttling_needed()) {
>>> + CPUArchState *penv = first_cpu;
>>> + while (penv) {
>>> + qemu_mutex_lock_iothread();
>>> + async_run_on_cpu(ENV_GET_CPU(penv), mig_kick_cpu, NULL);
>>> + qemu_mutex_unlock_iothread();
>>> + penv = penv->next_cpu;
>>> + }
>>> + }
>>> +}
>>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>>> index ace91b0..68b65c6 100644
>>> --- a/include/migration/migration.h
>>> +++ b/include/migration/migration.h
>>> @@ -129,4 +129,8 @@ int64_t migrate_xbzrle_cache_size(void);
>>> int64_t xbzrle_cache_resize(int64_t new_size);
>>>
>>> bool migrate_auto_converge(void);
>>> +bool throttling_needed(void);
>>> +void stop_throttling(void);
>>> +void migration_throttle_down(void);
>>> +
>>> #endif
>>> diff --git a/migration.c b/migration.c
>>> index 570cee5..d3673a6 100644
>>> --- a/migration.c
>>> +++ b/migration.c
>>> @@ -526,6 +526,7 @@ static void *migration_thread(void *opaque)
>>> DPRINTF("pending size %lu max %lu\n", pending_size, max_size);
>>> if (pending_size && pending_size >= max_size) {
>>> qemu_savevm_state_iterate(s->file);
>>> + migration_throttle_down();
>>> } else {
>>> DPRINTF("done iterating\n");
>>> qemu_mutex_lock_iothread();
>>> --
>>> 1.7.1
>>