Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1


From: Jamie Lokier
Subject: Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1
Date: Fri, 23 Apr 2010 16:07:41 +0100
User-agent: Mutt/1.5.13 (2006-08-11)

Yoshiaki Tamura wrote:
> Jamie Lokier wrote:
> >Yoshiaki Tamura wrote:
> >>Dor Laor wrote:
> >>>On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:
> >>>>Event tapping is the core component of Kemari, and it decides on which
> >>>>event the primary should synchronize with the secondary. The basic
> >>>>assumption here is that outgoing I/O operations are idempotent, which
> >>>>is usually true for disk I/O and reliable network protocols such as TCP.
> >>>
> >>>IMO any type of network event should be stalled too. What if the VM runs
> >>>a non-TCP protocol and the packet that the master node sent reached some
> >>>remote client, and before the sync to the slave the master failed?
> >>
> >>In the current implementation, it is actually stalling any type of
> >>network traffic that goes through virtio-net.
> >>
> >>However, if the application was using unreliable protocols, it should have
> >>its own recovery mechanism, or it should be completely stateless.
> >
> >Even with unreliable protocols, if slave takeover causes the receiver
> >to have received a packet that the sender _does not think it has ever
> >sent_, expect some protocols to break.
> >
> >If the slave replaying master's behaviour since the last sync means it
> >will definitely get into the same state of having sent the packet,
> >that works out.
> 
> That's something we're expecting now.
> 
> >But you still have to be careful that the other end's responses to
> >that packet are not seen by the slave too early during that replay.
> >Otherwise, for example, the slave may observe a TCP ACK to a packet
> >that it hasn't yet sent, which is an error.
> 
> Even though the current implementation syncs just before network output,
> what you pointed out could happen.  In that case, would the connection be
> lost, or would the client/server recover from it?  If the latter, it would
> be fine; otherwise I wonder how people doing similar things are handling
> this situation.

In the case of TCP in a "synchronised state", I think it will recover
according to the rules in RFC793.  In an "unsynchronised state"
(during connection establishment), I'm not sure if it recovers or if
it looks like a "Connection reset" error.  I suspect it does recover
but I'm not certain.

But that's TCP.  Other protocols, such as those running over UDP, may
behave differently, because a reply arriving before its request was
sent is not behaviour they expect from a network.
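
For the TCP case, here is a minimal sketch of how I read the
reset-generation rules in RFC793 for a segment whose ACK covers data
not yet sent (SEG.ACK > SND.NXT).  The names State, seq_gt and
react_to_early_ack are made up for illustration, LISTEN and the
closing states are left out, and real stacks do much more than this.

```python
from enum import Enum


class State(Enum):
    SYN_SENT = "SYN-SENT"
    SYN_RECEIVED = "SYN-RECEIVED"
    ESTABLISHED = "ESTABLISHED"


# States in which the two ends have not yet agreed on sequence numbers.
UNSYNCHRONISED = {State.SYN_SENT, State.SYN_RECEIVED}


def seq_gt(a, b):
    """True if sequence number a is after b in 32-bit sequence space."""
    return a != b and ((a - b) & 0xFFFFFFFF) < 0x80000000


def react_to_early_ack(state, snd_nxt, seg_ack):
    """Classify the reaction to an incoming ACK number.

    Returns "rst" when a reset is sent, "ack" when only an empty ACK is
    sent and the state is kept, or "normal" when the ACK is not ahead of
    SND.NXT and ordinary processing applies (not modelled here).
    """
    if not seq_gt(seg_ack, snd_nxt):
        return "normal"
    if state in UNSYNCHRONISED:
        return "rst"
    return "ack"
```

So a replaying slave in ESTABLISHED that sees an ACK for bytes it has
not resent yet would answer with an empty ACK and carry on, while one
still in the handshake would send a reset.  Whether the application
then sees an error depends on how the other end reacts to that reset.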

> >However there is one respect in which they're not idempotent:
> >
> >The TTL field should be decreased if packets are delayed.  Packets
> >should not appear to live in the network for longer than TTL seconds.
> >If they do, some protocols (like TCP) can react to the delayed ones
> >differently, such as sending a RST packet and breaking a connection.
> >
> >It is acceptable to reduce TTL faster than the minimum.  After all, it
> >is reduced by 1 on every forwarding hop, in addition to time delays.
> 
> So the problem is, when the slave takes over, it sends a packet with the
> same TTL as one which the client may already have received.

Yes.  I guess this is a general problem with time-based protocols and
virtual machines getting stopped for 1 minute (say), without knowing
that real time has moved on for the other nodes.
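
To make the TTL rule quoted above concrete, here is a small sketch of
what a sender could do when it re-emits a packet after a stall: take
off at least 1, and at least the number of seconds the packet was
held.  ttl_after_hold is a hypothetical helper, not anything in QEMU
or Kemari, and rounding the hold time up is my assumption.

```python
import math


def ttl_after_hold(ttl, held_seconds):
    """TTL to use for a packet that was held for held_seconds.

    Decrements by at least 1 (as for a forwarding hop) and by at least
    the hold time rounded up to whole seconds.  A result of 0 means the
    packet should be discarded rather than sent.
    """
    decrement = max(1, math.ceil(held_seconds))
    return max(0, ttl - decrement)
```

With a typical initial TTL of 64, a one-minute stall leaves
ttl_after_hold(64, 60) == 4, and anything held for 64 seconds or more
comes out as 0 and is dropped.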

Some application transaction, caching and locking protocols will give
wrong results when their time assumptions are discontinuous to such a
large degree.  It's a bit nasty to impose that on them after they
worked so hard on their reliability :-)

However, I think such implementations _could_ be made safe if those
programs can arrange to definitely be interrupted with a signal when
the discontinuity happens.  Of course, only if they're aware they may
be running on a Kemari system...
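
As an illustration only, this is roughly what I mean by a program
being interrupted on a discontinuity: a time-limited lease that stops
trusting itself as soon as it is told time may have jumped.  No such
notification exists in Kemari as far as I know; the signal number is
left to the caller, and Lease and install_discontinuity_handler are
invented names.

```python
import signal
import time


class Lease:
    """A time-limited lock that must not be trusted across a time jump."""

    def __init__(self, duration, clock=time.monotonic):
        self._clock = clock
        self._expires = clock() + duration
        self._valid = True

    def invalidate(self):
        """Give up the lease; it has to be re-acquired from its grantor."""
        self._valid = False

    def is_held(self):
        """True only if never invalidated and not yet expired locally."""
        return self._valid and self._clock() < self._expires


def install_discontinuity_handler(lease, signum):
    """Invalidate the lease when signum arrives (hypothetical notification)."""

    def handler(_signum, _frame):
        lease.invalidate()

    signal.signal(signum, handler)
```

A program would call something like
install_discontinuity_handler(lease, signal.SIGUSR1) at startup and
check lease.is_held() before acting on the lock.  The hard part, which
this does not address, is guaranteeing that the signal is delivered
before the program does anything else after the takeover.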

I have an intuitive idea that there is a solution to that, but each
time I try to write the next paragraph explaining it, some little
complication crops up and it needs more thought.  Something about
concurrent, asynchronous transactions to keep the master running while
recording the minimum states that replay needs to be safe, while
slewing the replaying slave's virtual clock back to real time quickly
during recovery mode.

-- Jamie



