From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] Abnormal observation during migration: too many "write-not-dirty" pages
Date: Wed, 15 Nov 2017 10:11:37 +0000
User-agent: Mutt/1.9.1 (2017-09-22)

* Chunguang Li (address@hidden) wrote:
> Hi all!
>
> I have made a very abnormal observation during VM migration. I found that
> many pages marked as dirty during migration are "not really dirty"; that is,
> their content is the same as the old version.
>
> I did the migration experiment like this:
>
> During the setup phase of migration, I first suspended the VM. Then I copied
> all the pages within the guest physical address space to a memory buffer as
> large as the guest memory. After that, dirty tracking began and I resumed
> the VM. In addition, at the end of each iteration, I suspended the VM
> temporarily. During the suspension, I compared the content of every page
> marked as dirty in this iteration byte-by-byte with its former copy in the
> buffer. If the content of a page was the same as its former copy, I recorded
> it as a "write-not-dirty" page (the page was written with exactly the same
> content as the old version). Otherwise, I replaced this page in the buffer
> with the new content, for possible comparison in the future. After resetting
> the dirty bitmap, I resumed the VM. In this way, I obtained the proportion of
> write-not-dirty pages among all the pages marked as dirty for each pre-copy
> iteration.
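
The comparison step described above could be sketched roughly as follows. This is only an illustration in Python, not the code used in the experiment: `classify_dirty_pages`, `write_not_dirty_ratio`, `PAGE_SIZE`, and the flat `memory`/`snapshot` byte buffers are assumed names, and the list of dirty page frame numbers is taken as given rather than read from KVM. Zero pages are skipped, as the message notes further down.

```python
PAGE_SIZE = 4096  # assumed guest page size


def classify_dirty_pages(memory, snapshot, dirty_pfns, page_size=PAGE_SIZE):
    """Compare every page marked dirty with its saved copy.

    memory     -- bytes-like view of current guest memory (hypothetical)
    snapshot   -- bytearray holding the previous copy; updated in place
    dirty_pfns -- page frame numbers marked dirty in this iteration
    Returns (write_not_dirty, really_dirty) as lists of page frame numbers.
    """
    zero_page = bytes(page_size)
    write_not_dirty, really_dirty = [], []
    for pfn in dirty_pfns:
        start, end = pfn * page_size, (pfn + 1) * page_size
        current = bytes(memory[start:end])
        if current == zero_page:
            # Zero pages are not counted, but the buffer is kept current.
            snapshot[start:end] = current
            continue
        if current == snapshot[start:end]:
            # Marked dirty, yet byte-for-byte identical to the old copy.
            write_not_dirty.append(pfn)
        else:
            really_dirty.append(pfn)
            snapshot[start:end] = current  # keep for later comparisons
    return write_not_dirty, really_dirty


def write_not_dirty_ratio(write_not_dirty, really_dirty):
    """Share of write-not-dirty pages among the counted dirty pages."""
    counted = len(write_not_dirty) + len(really_dirty)
    return len(write_not_dirty) / counted if counted else 0.0
```

Calling `classify_dirty_pages` once per iteration while the VM is suspended, and then `write_not_dirty_ratio` on its result, would give the per-iteration proportion discussed in the rest of the message.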
>
> I repeated this experiment with 15 workloads: 11 CPU2006 benchmarks, a
> Memcached server, kernel compilation, video playback, and an idle VM. The
> CPU2006 benchmarks and Memcached are write-intensive workloads, so almost
> none of them converged to stop-copy.
>
> Startlingly, the proportions of write-not-dirty pages are quite high.
> Memcached and three CPU2006 benchmarks (zeusmp, mcf and bzip2) have the
> highest proportions: 45%-80% of all their dirty pages are write-not-dirty.
> The proportions for the other workloads are about 5%-20%, which is also
> abnormal. Intuitively, the proportion of write-not-dirty pages should be far
> lower than these numbers, since it should be quite rare for a page to be
> written with exactly the same content as it held before.
>
> Note that zero pages are not counted in any of these results, because I
> think code such as memset() may zero large ranges of pages that were
> already zero pages.
>
> I ruled out causes specific to the machine hardware, because I repeated the
> experiments on two different sets of machines. Then I guessed it might be
> related to the huge page feature. However, the result was the same when I
> turned the huge page feature off in the OS.
>
> Now I can think of only two possible reasons.
>
> First, there may be bugs in the KVM kernel dirty tracking mechanism: it may
> mark as dirty some pages that did not receive any write request.
>
> Second, there may be bugs in the OS running inside the VM: it may issue some
> unnecessary write requests.
>
> What do you think about this abnormal phenomenon? Any advice, possible
> reasons, or even guesses? I would appreciate any response, because this has
> confused me for a long time. Thank you.
Wasn't it you who pointed out last year the other possibility? - The
problem of false positives due to sync'ing the whole of memory and then
writing the data out, but some of the dirty pages were already written?
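
A toy model of that effect, in case it helps. It assumes, as a simplification for illustration and not as a description of QEMU's actual code, that the dirty log for all of memory is collected in one sync at the start of a pass and that pages are then sent one at a time while the guest keeps running. `run_pass`, `split_next_dirty`, `kernel_log`, and the `writes` schedule are made-up names.

```python
def run_pass(guest, dest, kernel_log, writes):
    """Simulate one pre-copy pass.

    guest      -- list of page contents on the source (modified by writes)
    dest       -- list of page contents received by the destination
    kernel_log -- set of pages dirtied since the last sync (modified in place)
    writes     -- {step: (pfn, value)}: a guest write that happens just
                  before the page at that step of the pass is sent
    """
    # Sync: take the dirty log for the whole of memory at once, then clear it.
    to_send = sorted(kernel_log)
    kernel_log.clear()
    for step, pfn in enumerate(to_send):
        if step in writes:
            written_pfn, value = writes[step]
            guest[written_pfn] = value
            kernel_log.add(written_pfn)  # logged for the *next* sync
        dest[pfn] = guest[pfn]  # the page goes out with its latest content


def split_next_dirty(guest, dest, kernel_log):
    """Split the pages the next sync would report into false and real dirties."""
    false_positives = sorted(p for p in kernel_log if dest[p] == guest[p])
    really_dirty = sorted(p for p in kernel_log if dest[p] != guest[p])
    return false_positives, really_dirty


guest = [10, 20, 30]
dest = [None, None, None]
kernel_log = {0, 1, 2}
# Page 2 is written before it is sent; page 0 is written after it was sent.
run_pass(guest, dest, kernel_log, {0: (2, 31), 2: (0, 11)})
false_positives, really_dirty = split_next_dirty(guest, dest, kernel_log)
```

In this run page 2 is written after the sync but before it is sent, so the destination already holds its latest data, yet the next sync still reports it as dirty. Page 0 is written after it was sent and is dirty in the ordinary sense.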
Dave
>
> --
> Chunguang Li, Ph.D. Candidate
> Wuhan National Laboratory for Optoelectronics (WNLO)
> Huazhong University of Science & Technology (HUST)
> Wuhan, Hubei Prov., China
>
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK