qemu-devel


From: Anthony Liguori
Subject: Re: [Qemu-devel] Re: [Qemu-commits] [COMMIT 3086844] Instead of writing a zero page, madvise it away
Date: Mon, 22 Jun 2009 12:03:25 -0500
User-agent: Thunderbird 2.0.0.21 (X11/20090320)

Avi Kivity wrote:
On 06/22/2009 07:25 PM, Anthony Liguori wrote:
Avi Kivity wrote:
On 06/22/2009 06:51 PM, Anthony Liguori wrote:
From: Anthony Liguori<address@hidden>

Otherwise, after migration, we end up with a much larger RSS than we
ought to have.
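
As a rough illustration of what the commit subject describes (this is not the actual QEMU patch), a receive-side helper might look like the sketch below. The names `load_page`, `buf_is_zero`, and `host_page_size` are hypothetical. It assumes Linux and a private anonymous mapping for guest RAM, where `madvise(MADV_DONTNEED)` drops the backing page and later reads see zeros.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Host page size as reported by the OS. */
static size_t host_page_size(void)
{
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* Return 1 if every byte in buf[0..len) is zero, else 0. */
static int buf_is_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/*
 * Hypothetical helper: place one incoming page at dst.
 * dst must be page-aligned and inside a private anonymous mapping.
 * Returns 0 on success, -1 if madvise() fails.
 */
static int load_page(uint8_t *dst, const uint8_t *src, size_t page_size)
{
    assert(((uintptr_t)dst % page_size) == 0);

    if (buf_is_zero(src, page_size)) {
        /* Do not touch dst: drop any backing page so it costs no RSS. */
        return madvise(dst, page_size, MADV_DONTNEED);
    }

    /* Non-zero data has to be written, which instantiates the page. */
    memcpy(dst, src, page_size);
    return 0;
}
```

The point of the design is that a zero page is never written on the destination, so it never becomes resident there.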


We have the same issue on the migration source node. I don't see a simple way to solve it, though.

I don't follow.  In this case, the issue is:

1) Start a guest with 1024MB, balloon down to 128MB. RSS is now ~128MB
2) Live migrate to a different node
3) RSS on the destination node jumps to ~1GB

3.5) RSS on source node jumps to ~1GB, since reading the page instantiates the pte

Surely we can do better here...

For TCG, we always know when memory is dirty and we can check it atomically. So we know whether a page has changed since we last knew it was zero. We basically need a ZERO_DIRTY bit. All memory initially carries this bit and ballooning also sets the bit. During live migration, we can check the dirty bit first.
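
One way to read the ZERO_DIRTY idea is as a per-page bitmap, sketched below. This is only an interpretation: `struct zero_map` and the `zero_map_*` functions are hypothetical names, and "bit set" is taken to mean "not written since the page was last known to be zero". Hooking `zero_map_note_write` into the real dirty tracking is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct zero_map {
    unsigned long *bits;   /* bit set: page still known to be zero */
    size_t nr_pages;
};

/* All memory starts out carrying the bit. Returns 0, or -1 on ENOMEM. */
static int zero_map_init(struct zero_map *m, size_t nr_pages)
{
    size_t words = (nr_pages + BITS_PER_WORD - 1) / BITS_PER_WORD;

    assert(nr_pages > 0);
    m->bits = malloc(words * sizeof(unsigned long));
    if (m->bits == NULL) {
        return -1;
    }
    memset(m->bits, 0xff, words * sizeof(unsigned long));
    m->nr_pages = nr_pages;
    return 0;
}

/* Ballooning a page out sets the bit again. */
static void zero_map_note_balloon(struct zero_map *m, size_t page)
{
    assert(page < m->nr_pages);
    m->bits[page / BITS_PER_WORD] |= 1UL << (page % BITS_PER_WORD);
}

/* Any guest write clears the bit. */
static void zero_map_note_write(struct zero_map *m, size_t page)
{
    assert(page < m->nr_pages);
    m->bits[page / BITS_PER_WORD] &= ~(1UL << (page % BITS_PER_WORD));
}

/* Migration checks this first and skips reading pages that return 1. */
static int zero_map_is_zero(const struct zero_map *m, size_t page)
{
    assert(page < m->nr_pages);
    return (int)((m->bits[page / BITS_PER_WORD] >> (page % BITS_PER_WORD)) & 1UL);
}
```

With something like this, the migration loop could send a "zero page" marker for any page whose bit is still set without ever reading the page, so the source would not instantiate it.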

For KVM, we would have to enable dirty tracking always to keep ZERO_DIRTY up to date. Since write faults are going to happen anyway at start up, perhaps the cost of doing this wouldn't be so bad?


Right. I'd love to do madvise() on the source node as well if we fault in a page and find out it's zero, but the guest (and aio) is still running and we might drop live data. We need a madvise(MADV_DONTNEED_IFZERO), or a mincore() flag that tells us if the page exists (vs. swapped). ksm would also do this, but it is overkill for some applications.
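
For reference, this is roughly what the existing `mincore()` interface offers today, sketched for Linux with a hypothetical helper name `page_is_resident`. It reports only whether a page is resident in memory. A result of 0 does not say whether the page was never instantiated or has been swapped out, which is the gap described above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: query residency of the single page at `page`.
 * `page` must be page-aligned and mapped.
 * Returns 1 if resident, 0 if not resident, -1 if mincore() fails.
 */
static int page_is_resident(void *page)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char vec = 0;

    assert(((uintptr_t)page % ps) == 0);
    if (mincore(page, ps, &vec) != 0) {
        return -1;
    }
    /* Only the least significant bit of each vector entry is defined. */
    return vec & 1;
}
```

Even when this returns 0, a subsequent read by the migration code would still fault the page in, so it can only serve as a pre-check and not as a replacement for the proposed flag.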

For KVM, we could just have a KVM_IOCTL_MADVISE_IF_NOT_DIRTY, but that's a bad solution. That's more or less the desired semantics though.

--
Regards,

Anthony Liguori




