Re: [Qemu-devel] [RFC] COLO HA Project proposal
From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [RFC] COLO HA Project proposal
Date: Fri, 4 Jul 2014 09:35:46 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

* Dong, Eddie (address@hidden) wrote:
> > >
> > > I didn't quite understand a couple of things though, perhaps you can
> > > explain:
> > > 1) If we ignore the TCP sequence number problem, in an SMP machine
> > > don't we get other randomnesses - e.g. which core completes something
> > > first, or who wins a lock contention, so the output stream might not
> > > be identical - so do those normal bits of randomness cause the
> > > machines to flag as out-of-sync?
> >
> > It's about COLO agent, CCing Congyang, he can give the detailed
> > explanation.
> >
>
> Let me clarify this issue. COLO doesn't ignore the TCP sequence number, but
> uses a new implementation to make the sequence number identical, on a
> best-effort basis, between the primary VM (PVM) and the secondary VM (SVM).
> Likely, the VMM has to synchronize the emulation of the random number
> generation mechanism between the PVM and SVM, as the lock-stepping
> mechanism does.
>
> Furthermore, for long TCP connections, we can rely on the (on-demand) VM
> checkpoint to get an identical sequence number in both the PVM and SVM.

That wasn't really my question; I was worrying about other forms of randomness,
such as winners of lock contention, and other SMP non-determinisms,
and I'm also worried by what proportion of time the system can't recover
from a failure due to being unable to distinguish an SVM failure from
a randomness issue.
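
As a toy illustration of that concern (a sketch only, not COLO code: `run_vm`, `in_sync` and the seeded "scheduler" are made up for the example, and the seed simply stands in for whatever decides who wins a lock on real SMP hardware), two VMs given the same work can emit the same packets in a different order:

```python
import random


def run_vm(schedule_seed, vcpus=2, packets_per_vcpu=3):
    """Simulate one VM whose vCPUs contend for a lock guarding the output queue.

    schedule_seed is a stand-in for timing effects the hypervisor does not
    control (which core finishes first, who wins the lock, and so on).
    """
    rng = random.Random(schedule_seed)
    remaining = {cpu: list(range(packets_per_vcpu)) for cpu in range(vcpus)}
    output = []
    while any(remaining.values()):
        # Only vCPUs that still have packets queued compete for the lock.
        contenders = [cpu for cpu, queue in remaining.items() if queue]
        winner = rng.choice(contenders)
        # The winner emits its next packet, tagged (vcpu, sequence-in-vcpu).
        output.append((winner, remaining[winner].pop(0)))
    return output


def in_sync(pvm_out, svm_out):
    """Naive check: the two output streams must match element for element."""
    return pvm_out == svm_out


pvm = run_vm(schedule_seed=1)
svm = run_vm(schedule_seed=2)
print("same packets:", sorted(pvm) == sorted(svm))
print("same order:  ", in_sync(pvm, svm))
```

Both VMs are healthy here, so a comparison like `in_sync` that reports a mismatch cannot, on its own, tell this case apart from a genuinely broken SVM.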
Dave
>
>
> Thanks, Eddie
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK