qemu-devel

Re: [Qemu-devel] broken incoming migration


From: Wenchao Xia
Subject: Re: [Qemu-devel] broken incoming migration
Date: Sun, 09 Jun 2013 11:01:14 +0800
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20130509 Thunderbird/17.0.6

On 2013-6-9 10:34, Alexey Kardashevskiy wrote:
On 06/09/2013 12:16 PM, Wenchao Xia wrote:
On 2013-6-8 16:30, Alexey Kardashevskiy wrote:
On 06/08/2013 06:27 PM, Wenchao Xia wrote:
On 04.06.2013 16:40, Paolo Bonzini wrote:
On 04/06/2013 16:38, Peter Lieven wrote:
On 04.06.2013 16:14, Paolo Bonzini wrote:
On 04/06/2013 15:52, Peter Lieven wrote:
On 30.05.2013 16:41, Paolo Bonzini wrote:
On 30/05/2013 16:38, Peter Lieven wrote:
You could also scan the page for nonzero values before writing it.
I had this in mind, but then chose the other approach... which turned
out to be a bad idea.

Alexey: I will prepare a patch later today; could you then please
verify that it fixes your problem?

Paolo: would we still need the madvise, or is it enough not to write
the zeroes?
It should be enough to not write them.
Problem: checking the pages for zero allocates them, even at the
source.
It doesn't look like it.  I tried this program and top doesn't show an
increasing amount of reserved memory:

#include <stdio.h>
#include <stdlib.h>
int main()
{
        char *x = malloc(500 << 20);
        int i, j;
        for (i = 0; i < 500; i += 10) {
            for (j = 0; j < 10 << 20; j += 4096) {
                 *(volatile char*) (x + (i << 20) + j);
            }
            getchar();
        }
}
strange. we are talking about RSS size, right?
None of the three top values change, and only VIRT is >500 MB.

is the malloc above using mmapped memory?
Yes.

which kernel version do you use?
3.9.
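
A minimal sketch of how the read-versus-write question above could be checked from inside the process, rather than by watching top. It is not code from this thread: the helper names (peak_rss, map_anon, touch_read, touch_write) and the fixed 4096-byte page size are assumptions made for illustration. It relies on getrusage(), whose ru_maxrss field is a peak value (reported in kilobytes on Linux), so it can only show growth, never shrinkage.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>

#define PAGE_SZ 4096             /* assumed page size for this sketch */

/* Peak resident set size as reported by getrusage(). */
static long peak_rss(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);

    assert(rc == 0);
    (void)rc;
    return ru.ru_maxrss;
}

/* Map len bytes of zero-filled anonymous memory; NULL on failure. */
static char *map_anon(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Read one byte per page; returns the sum of the bytes read. */
static unsigned long touch_read(const char *buf, size_t len)
{
    unsigned long sum = 0;

    for (size_t off = 0; off < len; off += PAGE_SZ) {
        sum += *(const volatile unsigned char *)(buf + off);
    }
    return sum;
}

/* Write one byte per page, which forces a private copy of each page. */
static void touch_write(char *buf, size_t len)
{
    for (size_t off = 0; off < len; off += PAGE_SZ) {
        *(volatile char *)(buf + off) = 1;
    }
}
```

Calling peak_rss() before and after touch_read(), and again after touch_write(), on the same mapping shows which of the two passes actually makes the resident set grow. Going by the observation reported above, the read pass would be expected to leave the value roughly unchanged and the write pass to raise it by about the size of the mapping, but that depends on the kernel and its configuration.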

what avoids allocating the memory for me is the following (with
whatever side effects it has ;-))
This would also fail to migrate any page that is swapped out, breaking
overcommit in a more subtle way. :)

Paolo
The following also does not allocate memory, but qemu does...

Hi, Peter
    As the patch says:

"not sending zero pages breaks migration if a page is zero
at the source but not at the destination."

    I don't understand why that would be a problem: shouldn't all pages
not received at the destination be treated as zero pages?


How would the destination guest know if some page must be cleared? The
previous patch (which Peter reverted) did not send anything for the pages
which were zero on the source side.


   If a page was not received and the destination knows that the page
should exist according to the total size, would filling it with zero at
the destination solve the problem?

It is _live_ migration: the source sends changes, and the same pages can
change and be sent several times. So we would need to turn on tracking on
the destination to know whether some page was received from the source or
changed by the destination itself (by writing BIOS/firmware images there,
etc.) and then clear the pages which were touched by the destination and
were not sent by the source.
  OK, I understand the problem now. For example:
The destination boots up with 0x0000-0xFFFF filled with the BIOS image.
The source does not send the zero pages in 0x0000-0xFFFF.
After migration, the destination has 0x0000-0xFFFF dirty (different from
the source).
  Thanks for explaining.

  This seems to come down to the migration protocol: how should the guest
treat unsent pages? The patch causing the problem actually treats zero
pages as "not to be sent" at the source, but the other half is missing:
treating "not received" pages as zero pages at the destination. I guess
that if the second half is added, the problem is gone:
after the page transfer has completed and before the destination resumes,
fill the "not received" pages with zeroes.
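
A rough sketch of what this "second half" might look like on the destination, assuming one "received" bit per page. It is not QEMU code and it is not what was merged: struct incoming_ram, incoming_ram_init, page_received and zero_unreceived are hypothetical names, and the page size is fixed at 4096 for simplicity. The sketch is only correct if a page is never skipped after it has already been sent once; otherwise stale data from the earlier transfer would survive.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ       4096
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical destination-side state: guest RAM plus a "received" bitmap. */
struct incoming_ram {
    unsigned char *ram;
    size_t npages;
    unsigned long *received;     /* one bit per page */
};

/* Allocate RAM and an all-clear bitmap; returns 0 on success, -1 on error. */
static int incoming_ram_init(struct incoming_ram *s, size_t npages)
{
    size_t words = (npages + BITS_PER_WORD - 1) / BITS_PER_WORD;

    s->npages = npages;
    s->ram = calloc(npages, PAGE_SZ);
    s->received = calloc(words, sizeof(unsigned long));
    if (!s->ram || !s->received) {
        free(s->ram);
        free(s->received);
        return -1;
    }
    return 0;
}

/* Store a page sent by the source and remember that it arrived. */
static void page_received(struct incoming_ram *s, size_t page,
                          const void *data)
{
    assert(page < s->npages);
    memcpy(s->ram + page * PAGE_SZ, data, PAGE_SZ);
    s->received[page / BITS_PER_WORD] |= 1UL << (page % BITS_PER_WORD);
}

/*
 * Before the guest resumes: clear every page the source never sent.
 * Returns the number of pages that were zero-filled.
 */
static size_t zero_unreceived(struct incoming_ram *s)
{
    size_t cleared = 0;

    for (size_t page = 0; page < s->npages; page++) {
        unsigned long mask = 1UL << (page % BITS_PER_WORD);

        if (!(s->received[page / BITS_PER_WORD] & mask)) {
            memset(s->ram + page * PAGE_SZ, 0, PAGE_SZ);
            cleared++;
        }
    }
    return cleared;
}
```

Note that zero_unreceived() writes to every page that was not received, so it touches memory the destination might otherwise never have allocated.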


Or we do not make guesses: the source sends everything, and the destination
simply checks whether a page which is empty on the source is also empty on
the destination and, if so, avoids writing zeroes to it. This looks simpler
to me, and it is what the new patch does.
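
The destination-side check described here can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: page_is_zero and handle_zero_page are hypothetical names, and a plain byte loop stands in for QEMU's vectorised zero test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Plain byte-wise zero test (stand-in for a vectorised helper). */
static bool page_is_zero(const unsigned char *page, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Handle a "this page is zero" record from the source.
 * The destination page is written only if it currently holds
 * non-zero data. Returns true if memory was actually written.
 */
static bool handle_zero_page(unsigned char *dst, size_t len)
{
    if (page_is_zero(dst, len)) {
        return false;            /* already zero: leave it untouched */
    }
    memset(dst, 0, len);
    return true;
}
```

The point of the design is that a page holding firmware data on the destination is still cleared when the source reports it as zero, while a page that is already zero is only read, never written.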




Also, do you mean that the following code is from qemu and that it does
not allocate memory when built with your gcc? Maybe it is related to KVM;
how about turning off KVM and retrying the following code in qemu?

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>
#include <sys/resource.h>
#include <inttypes.h>
#include <string.h>
#include <sys/mman.h>
#include <errno.h>

#if defined __SSE2__
#include <emmintrin.h>
#define VECTYPE        __m128i
#define SPLAT(p)       _mm_set1_epi8(*(p))
#define ALL_EQ(v1, v2) \
    (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
#else
#define VECTYPE        unsigned long
#define SPLAT(p)       (*(p) * (~0UL / 255))
#define ALL_EQ(v1, v2) ((v1) == (v2))
#endif

#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8

/* Round number down to multiple */
#define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))

/* Round number up to multiple */
#define QEMU_ALIGN_UP(n, m) QEMU_ALIGN_DOWN((n) + (m) - 1, (m))

#define QEMU_VMALLOC_ALIGN (256 * 4096)

/* alloc shared memory pages */
void *qemu_anon_ram_alloc(size_t size)
{
       size_t align = QEMU_VMALLOC_ALIGN;
       size_t total = size + align - getpagesize();
       void *ptr = mmap(0, total, PROT_READ | PROT_WRITE,
                        MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
       size_t offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) -
                       (uintptr_t)ptr;

       if (ptr == MAP_FAILED) {
           fprintf(stderr, "Failed to allocate %zu B: %s\n",
                   size, strerror(errno));
           abort();
       }

       ptr += offset;
       total -= offset;

       if (offset > 0) {
           munmap(ptr - offset, offset);
       }
       if (total > size) {
           munmap(ptr + size, total - size);
       }

       return ptr;
}

static inline int
can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
{
       return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
                      * sizeof(VECTYPE)) == 0
               && ((uintptr_t) buf) % sizeof(VECTYPE) == 0);
}

size_t buffer_find_nonzero_offset(const void *buf, size_t len)
{
       const VECTYPE *p = buf;
       const VECTYPE zero = (VECTYPE){0};
       size_t i;

       if (!len) {
           return 0;
       }

       assert(can_use_buffer_find_nonzero_offset(buf, len));

       for (i = 0; i < BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR; i++) {
           if (!ALL_EQ(p[i], zero)) {
               return i * sizeof(VECTYPE);
           }
       }

       for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
            i < len / sizeof(VECTYPE);
            i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
           VECTYPE tmp0 = p[i + 0] | p[i + 1];
           VECTYPE tmp1 = p[i + 2] | p[i + 3];
           VECTYPE tmp2 = p[i + 4] | p[i + 5];
           VECTYPE tmp3 = p[i + 6] | p[i + 7];
           VECTYPE tmp01 = tmp0 | tmp1;
           VECTYPE tmp23 = tmp2 | tmp3;
           if (!ALL_EQ(tmp01 | tmp23, zero)) {
               break;
           }
       }

       return i * sizeof(VECTYPE);
}

int main()
{
        //char *x = malloc(1024 << 20);
        char *x = qemu_anon_ram_alloc(1024 << 20);

        int i, j;
        int ret = 0;
        struct rusage rusage;
        for (i = 0; i < 500; i ++) {
            for (j = 0; j < 10 << 20; j += 4096) {
                 /* count the page only if it is entirely zero */
                 ret += buffer_find_nonzero_offset(x + (i << 20) + j,
                                                   4096) == 4096;
            }
            getrusage( RUSAGE_SELF, &rusage );
            printf("read offset: %d kB, RSS size: %ld kB\n",
                   ((i+1) << 10), rusage.ru_maxrss);
            getchar();
        }
        printf("%d zero pages\n", ret);
}

--
Best Regards

Wenchao Xia



