Re: [Qemu-devel] When does live migration give up?

From: Paolo Bonzini
Subject: Re: [Qemu-devel] When does live migration give up?
Date: Wed, 04 Sep 2013 20:34:27 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8

On 04/09/2013 20:05, Alex Bligh wrote:
> Paolo,
> --On 4 September 2013 19:07:53 +0200 Paolo Bonzini <address@hidden>
> wrote:
>> On 04/09/2013 17:24, Alex Bligh wrote:
>>> We have seen a situation when migrating about 50 VMs at once where some
>>> of them fail. I think this is because they are dirtying pages faster
>>> than they can be transmitted.
>> No, migration never "gives up".  It may never converge, but it keeps
>> trying until cancelled.
>> Could it be that you are choosing migration server ports from a small
>> range, and some of them are failing because two migrations pick the same
>> random port for the destination (which is where the server socket lies)?
> Should not be that. We create FDs (which are sockets) and pass them in at
> both ends.

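Do you mean something like this?

   On the destination:

      bind() to { sin_port = 0, sin_addr.s_addr = INADDR_ANY }
      listen(), then getsockname() to learn the port the kernel assigned
      send address to source
      start QEMU with file descriptor returned by accept()

   On the source:

      read address
      connect() to it
      pass socket file descriptor to QEMU and migrate to it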

Anything that doesn't use sin_port = 0 and getsockname() is prone to
race conditions.
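
As a rough illustration of the sin_port = 0 plus getsockname() pattern, here is a minimal Python sketch. The function names and the loopback demo are assumptions made for the example, and handing the resulting descriptors to QEMU is deliberately left out. The point is only that the kernel picks a free port at bind() time, so two concurrent migrations cannot collide on a port chosen by the caller.

```python
import socket


def listen_on_ephemeral_port(host="0.0.0.0"):
    """Destination side: let the kernel choose a free port (hypothetical helper)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))            # port 0 asks the kernel to assign a free port
    srv.listen(1)
    port = srv.getsockname()[1]    # read back the port that was actually assigned
    return srv, port


def connect_to(host, port):
    """Source side: connect to the address reported by the destination."""
    return socket.create_connection((host, port))


if __name__ == "__main__":
    # Loopback demo only; a real setup would send (host, port) to the source
    # host over some management channel before the source connects.
    srv, port = listen_on_ephemeral_port("127.0.0.1")
    cli = connect_to("127.0.0.1", port)
    conn, peer = srv.accept()
    print("kernel-assigned port:", port)
    print("destination fd:", conn.fileno(), "source fd:", cli.fileno())
    for s in (conn, cli, srv):
        s.close()
```

The descriptors that would be passed to QEMU are the one returned by accept() on the destination and the connected socket on the source.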

> Approx 10% of migrations die after many minutes on the
> customer's platform. This does not appear to happen if migrations are
> not carried out 50 at a time.

Dying after many minutes usually means that the destination is not set
up the same as the source, as you said below.


> We appear to be getting something other than 'ms' returned through the
> monitoring system. Unhelpfully what that is is not logged.
> Is there anything (apart from the socket closing prematurely) which can
> cause a failed migration after many minutes? We've seen problems where
> the destination is not set up the same as the source (e.g. different
> numbers of NICs) but IIRC that fails much earlier.
> To make things easier (cough), this is qemu 1.0 (as shipped with Ubuntu
> Precise).
