From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH V7 RESEND 12/17] savevm: split the process of different stages for loadvm/savevm
Date: Tue, 19 Jun 2018 20:00:47 +0100
User-agent: Mutt/1.10.0 (2018-05-17)
* Zhang Chen (address@hidden) wrote:
> On Wed, May 16, 2018 at 2:56 AM, Dr. David Alan Gilbert <address@hidden
> > wrote:
>
> > * Zhang Chen (address@hidden) wrote:
> > > From: zhanghailiang <address@hidden>
> > >
> > > There are several stages in the loadvm/savevm process. In different
> > > stages, the incoming migration side processes different types of
> > > sections. We want to control these stages more accurately; it will
> > > benefit COLO performance: we don't have to save QEMU_VM_SECTION_START
> > > type sections every time we do a checkpoint. Besides, we want to
> > > separate the process of saving/loading memory and device state.
> > >
> > > So we add two new helper functions, qemu_load_device_state() and
> > > qemu_savevm_live_state(), to achieve the different processing during
> > > migration.
> > >
> > > Besides, we make qemu_loadvm_state_main() and qemu_save_device_state()
> > > public, and simplify the code of qemu_save_device_state() by calling
> > > the wrapper qemu_savevm_state_header().
> > >
> > > Signed-off-by: zhanghailiang <address@hidden>
> > > Signed-off-by: Li Zhijian <address@hidden>
> > > Signed-off-by: Zhang Chen <address@hidden>
> > > Reviewed-by: Dr. David Alan Gilbert <address@hidden>
> > > ---
> > > migration/colo.c | 36 ++++++++++++++++++++++++++++--------
> > > migration/savevm.c | 35 ++++++++++++++++++++++++++++-------
> > > migration/savevm.h | 4 ++++
> > > 3 files changed, 60 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/migration/colo.c b/migration/colo.c
> > > index cdff0a2490..5b055f79f1 100644
> > > --- a/migration/colo.c
> > > +++ b/migration/colo.c
> > > @@ -30,6 +30,7 @@
> > > #include "block/block.h"
> > > #include "qapi/qapi-events-migration.h"
> > > #include "qapi/qmp/qerror.h"
> > > +#include "sysemu/cpus.h"
> > >
> > > static bool vmstate_loading;
> > > static Notifier packets_compare_notifier;
> > > @@ -414,23 +415,30 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
> > >
> > > /* Disable block migration */
> > > migrate_set_block_enabled(false, &local_err);
> > > - qemu_savevm_state_header(fb);
> > > - qemu_savevm_state_setup(fb);
> > > qemu_mutex_lock_iothread();
> > > replication_do_checkpoint_all(&local_err);
> > > if (local_err) {
> > > qemu_mutex_unlock_iothread();
> > > goto out;
> > > }
> > > - qemu_savevm_state_complete_precopy(fb, false, false);
> > > - qemu_mutex_unlock_iothread();
> > > -
> > > - qemu_fflush(fb);
> > >
> > > colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND,
> > &local_err);
> > > if (local_err) {
> > > goto out;
> > > }
> > > + /*
> > > + * Only save VM's live state, which not including device state.
> > > + * TODO: We may need a timeout mechanism to prevent COLO process
> > > + * to be blocked here.
> > > + */
> >
> > I guess that's the downside to transmitting it directly rather than
> > into the buffer; Peter Xu's OOB command system would let you kill the
> > connection - and that's something I think COLO should use.
> > Still, the change saves you having that huge outgoing buffer on the
> > source side and lets you start sending the checkpoint sooner, which
> > means the pause time should be smaller.
> >
>
> Yes, you are right.
> But I think this is a performance optimization; this series focuses on
> enabling the feature.
> I will do this job in the future.
>
>
> >
> > > + qemu_savevm_live_state(s->to_dst_file);
> >
> > Does this actually need to be inside of the qemu_mutex_lock_iothread?
> > I'm pretty sure the device_state needs to be, but I'm not sure the
> > live_state needs to.
> >
>
> I have checked the code; qemu_savevm_live_state() needn't be inside
> qemu_mutex_lock_iothread().
> I will move it out of the lock area in the next version.
>
>
>
> >
> > > + /* Note: device state is saved into buffer */
> > > + ret = qemu_save_device_state(fb);
> > > +
> > > + qemu_mutex_unlock_iothread();
> > > +
> > > + qemu_fflush(fb);
> > > +
> > > /*
> > > * We need the size of the VMstate data in Secondary side,
> > > * With which we can decide how much data should be read.
> > > @@ -643,6 +651,7 @@ void *colo_process_incoming_thread(void *opaque)
> > > uint64_t total_size;
> > > uint64_t value;
> > > Error *local_err = NULL;
> > > + int ret;
> > >
> > > qemu_sem_init(&mis->colo_incoming_sem, 0);
> > >
> > > @@ -715,6 +724,16 @@ void *colo_process_incoming_thread(void *opaque)
> > > goto out;
> > > }
> > >
> > > + qemu_mutex_lock_iothread();
> > > + cpu_synchronize_all_pre_loadvm();
> > > + ret = qemu_loadvm_state_main(mis->from_src_file, mis);
> > > + qemu_mutex_unlock_iothread();
> > > +
> > > + if (ret < 0) {
> > > + error_report("Load VM's live state (ram) error");
> > > + goto out;
> > > + }
> > > +
> > > value = colo_receive_message_value(mis->from_src_file,
> > > COLO_MESSAGE_VMSTATE_SIZE, &local_err);
> > > if (local_err) {
> > > @@ -748,8 +767,9 @@ void *colo_process_incoming_thread(void *opaque)
> > > qemu_mutex_lock_iothread();
> > > qemu_system_reset(SHUTDOWN_CAUSE_NONE);
> >
> > Is the reset safe? Are you sure it doesn't change the ram you've just
> > loaded?
> >
>
> Yes, it is safe. In my tests the secondary node runs well.
The fact it's working doesn't mean it's safe; it just means you've
not hit a problem! A qemu_system_reset calls a 'reset' on all of the
devices, and calls cpu_synchronize_all_states() - I'd worry that any of
those might touch RAM.
> >
> > > vmstate_loading = true;
> > > - if (qemu_loadvm_state(fb) < 0) {
> > > - error_report("COLO: loadvm failed");
> > > + ret = qemu_load_device_state(fb);
> > > + if (ret < 0) {
> > > + error_report("COLO: load device state failed");
> > > qemu_mutex_unlock_iothread();
> > > goto out;
> > > }
> > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > index ec0bff09ce..0f61239429 100644
> > > --- a/migration/savevm.c
> > > +++ b/migration/savevm.c
> > > @@ -1332,13 +1332,20 @@ done:
> > > return ret;
> > > }
> > >
> > > -static int qemu_save_device_state(QEMUFile *f)
> > > +void qemu_savevm_live_state(QEMUFile *f)
> > > {
> > > - SaveStateEntry *se;
> > > + /* save QEMU_VM_SECTION_END section */
> > > + qemu_savevm_state_complete_precopy(f, true, false);
> > > + qemu_put_byte(f, QEMU_VM_EOF);
> > > +}
> > >
> > > - qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
> > > - qemu_put_be32(f, QEMU_VM_FILE_VERSION);
> > > +int qemu_save_device_state(QEMUFile *f)
> > > +{
> > > + SaveStateEntry *se;
> > >
> > > + if (!migration_in_colo_state()) {
> > > + qemu_savevm_state_header(f);
> > > + }
> > > cpu_synchronize_all_states();
> >
> > So this changes qemu_save_device_state to use savevm_state_header
> > which feels reasonable, but that includes the 'configuration'
> > section; do we want that? Is that OK for Xen's use in
> > qmp_xen_save_devices_state?
> >
>
> If I understand correctly, COLO Xen doesn't use the QEMU migration code
> to do this job currently; COLO Xen has an independent COLO framework
> doing the same job (using the Xen migration code).
> So I think it is OK for Xen.
It's important not to break non-COLO Xen though; so you need to check
with the Xen people that a change that impacts
qmp_xen_save_devices_state is OK.
Dave
>
>
> Thanks your review and continued support.
> Zhang Chen
>
>
> >
> > > QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> > > @@ -1394,8 +1401,6 @@ enum LoadVMExitCodes {
> > > LOADVM_QUIT = 1,
> > > };
> > >
> > > -static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState
> > *mis);
> > > -
> > > /* ------ incoming postcopy messages ------ */
> > > /* 'advise' arrives before any transfers just to tell us that a postcopy
> > > * *might* happen - it might be skipped if precopy transferred
> > everything
> > > @@ -2075,7 +2080,7 @@ void qemu_loadvm_state_cleanup(void)
> > > }
> > > }
> > >
> > > -static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState
> > *mis)
> > > +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > > {
> > > uint8_t section_type;
> > > int ret = 0;
> > > @@ -2229,6 +2234,22 @@ int qemu_loadvm_state(QEMUFile *f)
> > > return ret;
> > > }
> > >
> > > +int qemu_load_device_state(QEMUFile *f)
> > > +{
> > > + MigrationIncomingState *mis = migration_incoming_get_current();
> > > + int ret;
> > > +
> > > + /* Load QEMU_VM_SECTION_FULL section */
> > > + ret = qemu_loadvm_state_main(f, mis);
> > > + if (ret < 0) {
> > > + error_report("Failed to load device state: %d", ret);
> > > + return ret;
> > > + }
> > > +
> > > + cpu_synchronize_all_post_init();
> > > + return 0;
> > > +}
> > > +
> > > int save_snapshot(const char *name, Error **errp)
> > > {
> > > BlockDriverState *bs, *bs1;
> > > diff --git a/migration/savevm.h b/migration/savevm.h
> > > index c6d46b37a2..cf7935dd68 100644
> > > --- a/migration/savevm.h
> > > +++ b/migration/savevm.h
> > > @@ -53,8 +53,12 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile
> > *f, const char *name,
> > > uint64_t *start_list,
> > > uint64_t *length_list);
> > > void qemu_savevm_send_colo_enable(QEMUFile *f);
> > > +void qemu_savevm_live_state(QEMUFile *f);
> > > +int qemu_save_device_state(QEMUFile *f);
> > >
> > > int qemu_loadvm_state(QEMUFile *f);
> > > void qemu_loadvm_state_cleanup(void);
> > > +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
> > > +int qemu_load_device_state(QEMUFile *f);
> > >
> > > #endif
> > > --
> > > 2.17.0
> >
> > Dave
> >
> > >
> > --
> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
> >
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK