From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v11 25/39] COLO: implement default failover treatment
Date: Thu, 10 Dec 2015 19:01:14 +0000
User-agent: Mutt/1.5.24 (2015-08-30)
* zhanghailiang (address@hidden) wrote:
> If we detect an error in COLO, we wait for some time, hoping that
> users also detect it. If users don't issue a failover command, we
> go into the default failover procedure, in which the PVM takes over
> work and the SVM exits by default.
I'm not sure this is needed, especially on the SVM. I don't see any harm
in the SVM waiting forever to be told what to do; it could be told to
fail over or to quit. I don't see any benefit in it exiting automatically.
On the primary, I can understand it if you don't have an automated error
detection system (but I think that's rare); but you would really want
to make that failover delay configurable so that you could turn it off
on a system that does have failure detection, because automatically
restarting the primary after it had caused a failover to the secondary
would be very bad.
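A minimal sketch of the configurable delay suggested above, written as a single decision function. It assumes a hypothetical convention in which a negative delay (for example, one set through 'migrate-set-parameters', as the patch's own "Fix me" comment proposes) turns automatic failover off entirely, leaving the decision to an external failure detector. `failover_decide`, `failover_decision` and `FAILOVER_DELAY_DISABLED` are illustrative names, not QEMU API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of one check during the post-error wait. */
typedef enum {
    FAILOVER_KEEP_WAITING, /* keep polling for a user/management command */
    FAILOVER_BY_REQUEST,   /* an explicit failover command has arrived */
    FAILOVER_BY_DEFAULT    /* delay expired: take the default action */
} failover_decision;

/* Hypothetical convention: a negative delay disables automatic failover. */
#define FAILOVER_DELAY_DISABLED (-1)

static failover_decision failover_decide(int64_t delay_ms, int64_t elapsed_ms,
                                         bool request_active)
{
    if (request_active) {
        return FAILOVER_BY_REQUEST;
    }
    if (delay_ms < 0) {
        /* External failure detection is in charge: never act on our own. */
        return FAILOVER_KEEP_WAITING;
    }
    /* Same boundary as the patch's loop: give up once elapsed > delay. */
    return elapsed_ms > delay_ms ? FAILOVER_BY_DEFAULT : FAILOVER_KEEP_WAITING;
}
```

With the patch's value of 2000 ms, `failover_decide(2000, 2001, false)` yields `FAILOVER_BY_DEFAULT`, while with `FAILOVER_DELAY_DISABLED` the function only ever returns `FAILOVER_KEEP_WAITING` or `FAILOVER_BY_REQUEST`, so a primary behind a real failure detector is never failed over on a timer.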
Dave
>
> Signed-off-by: zhanghailiang <address@hidden>
> Signed-off-by: Li Zhijian <address@hidden>
> ---
> migration/colo.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 46 insertions(+)
>
> diff --git a/migration/colo.c b/migration/colo.c
> index f31e957..1e6d3dd 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -19,6 +19,14 @@
> #include "qemu/sockets.h"
> #include "migration/failover.h"
>
> +/*
> + * The delay time before QEMU begins the default failover treatment.
> + * Unit: ms
> + * Fix me: This value should be changeable via the
> + * 'migrate-set-parameters' command
> + */
> +#define DEFAULT_FAILOVER_DELAY 2000
> +
> /* colo buffer */
> #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>
> @@ -264,6 +272,7 @@ static void colo_process_checkpoint(MigrationState *s)
> {
> QEMUSizedBuffer *buffer = NULL;
> int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> + int64_t error_time;
> int ret = 0;
> uint64_t value;
>
> @@ -322,8 +331,25 @@ static void colo_process_checkpoint(MigrationState *s)
> }
>
> out:
> + current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> if (ret < 0) {
> error_report("%s: %s", __func__, strerror(-ret));
> + /* Give users time to get involved in this verdict */
> + while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
> + if (failover_request_is_active()) {
> + error_report("Primary VM will take over work");
> + break;
> + }
> + usleep(100 * 1000);
> + current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> + }
> +
> + qemu_mutex_lock_iothread();
> + if (!failover_request_is_active()) {
> + error_report("Primary VM will take over work in default");
> + failover_request_active(NULL);
> + }
> + qemu_mutex_unlock_iothread();
> }
>
> qsb_free(buffer);
> @@ -384,6 +410,7 @@ void *colo_process_incoming_thread(void *opaque)
> QEMUFile *fb = NULL;
> QEMUSizedBuffer *buffer = NULL; /* Cache incoming device state */
> uint64_t total_size;
> + int64_t error_time, current_time;
> int ret = 0;
> uint64_t value;
>
> @@ -499,9 +526,28 @@ void *colo_process_incoming_thread(void *opaque)
> }
>
> out:
> + current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> if (ret < 0) {
> error_report("colo incoming thread will exit, detect error: %s",
> strerror(-ret));
> + /* Give users time to get involved in this verdict */
> + while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
> + if (failover_request_is_active()) {
> + error_report("Secondary VM will take over work");
> + break;
> + }
> + usleep(100 * 1000);
> + current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> + }
> + /* check flag again*/
> + if (!failover_request_is_active()) {
> + /*
> + * We assume that Primary VM is still alive according to
> + * heartbeat, just kill Secondary VM
> + */
> + error_report("SVM is going to exit in default!");
> + exit(1);
> + }
> }
>
> if (fb) {
> --
> 1.8.3.1
>
>
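The two `out:` paths in the patch repeat the same wait-and-poll loop, so it could plausibly be factored into one helper. Below is a minimal standalone sketch under stated assumptions: a C11 atomic flag (`failover_requested`, hypothetical) stands in for `failover_request_is_active()`, and the delay is measured with POSIX `CLOCK_MONOTONIC`. The patch uses `QEMU_CLOCK_HOST`, which follows host system time and can jump if that time is adjusted; a monotonic source avoids a shortened or stretched grace period. `wait_for_failover_request` is a hypothetical name, and a negative delay means "wait indefinitely", matching the convention assumed earlier.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for failover_request_is_active(); set by whichever
 * thread handles the user's failover command. */
static atomic_bool failover_requested;

/* Milliseconds from CLOCK_MONOTONIC, unaffected by wall-clock changes. */
static int64_t monotonic_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll every poll_ms until a failover request arrives or delay_ms elapses.
 * Returns true if a request arrived, false if the delay ran out.
 * A negative delay_ms waits indefinitely (hypothetical convention).
 */
static bool wait_for_failover_request(int64_t delay_ms, int64_t poll_ms)
{
    const int64_t start = monotonic_ms();
    const struct timespec interval = {
        .tv_sec = poll_ms / 1000,
        .tv_nsec = (long)((poll_ms % 1000) * 1000000),
    };

    for (;;) {
        if (atomic_load(&failover_requested)) {
            return true;
        }
        if (delay_ms >= 0 && monotonic_ms() - start > delay_ms) {
            return false;
        }
        nanosleep(&interval, NULL);
    }
}
```

Each caller would then only decide what a `false` return means for its side (the patch's default failover on the primary, exit on the secondary), which is exactly the policy being questioned above.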
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK