[Qemu-devel] dirty page count problem
From: Dr. David Alan Gilbert
Subject: [Qemu-devel] dirty page count problem
Date: Fri, 21 Jul 2017 18:28:33 +0100
User-agent: Mutt/1.8.3 (2017-05-23)
Hi,
Git bisect is pointing to your patch 084140bd49
("exec: fix access to ram_list.dirty_memory when sync dirty bitmap")
for a bug I'm trying to diagnose; it looks like the dirty page count
is wrong for some reason.
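For context, here is a minimal sketch of the general pattern a dirty-bitmap sync follows. It assumes (this is not taken from the patch) that the sync merges a source bitmap into the migration bitmap and returns only the number of *newly* set bits, which the caller adds to its dirty-page counter. The name sync_and_count() is hypothetical and this is not QEMU's implementation. If a sync step ever counted a bit that was already set, the counter would drift above the real number of dirty bits, which is the kind of mismatch a wrong dirty page count points to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not QEMU code: OR a source dirty bitmap into a
 * destination (migration) bitmap and return how many bits were newly
 * set. The caller adds the result to its dirty-page counter, so the
 * counter only stays equal to the number of set bits in dest if bits
 * that were already set are never counted a second time.
 */
static uint64_t sync_and_count(uint64_t *dest, const uint64_t *src,
                               size_t nwords)
{
    uint64_t newly_dirty = 0;

    for (size_t i = 0; i < nwords; i++) {
        uint64_t fresh = src[i] & ~dest[i];  /* dirty in src, clean in dest */

        dest[i] |= src[i];
        while (fresh) {
            fresh &= fresh - 1;              /* clear the lowest set bit */
            newly_dirty++;
        }
    }
    return newly_dirty;
}
```

Calling it twice with the same source should return the new-bit count the first time and 0 the second time; a non-zero second result would mean double counting.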
Alex Bennée spotted that the postcopy test would occasionally
fail under very heavy load; attaching a debugger, it looks like
the problem is that the migration_dirty_pages count is stuck at 2.
In the normal migration tests we don't spot this, because 2 pages is
smaller than the threshold to end migration, so an extra 2 pages
doesn't stop it finishing. However, with a very small downtime
setting (like we use in the postcopy test) and very low bandwidth
(as when Alex ran the test on a very heavily loaded machine), we
never call the bitmap sync again and so never complete the iteration.
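To make the arithmetic concrete, here is a hypothetical model of the end-of-iteration check. It assumes a 4 KiB page and a simple budget of bandwidth times downtime. fits_in_downtime() and SKETCH_PAGE_SIZE are made-up names, not QEMU's. Two stuck pages are 8 KiB. That fits easily in a normal budget (e.g. 100000 bytes/ms with a 300 ms downtime), but with a 1 ms downtime and 100 bytes/ms the budget is only 100 bytes, so the check never passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4 KiB target page size (typical for x86_64 guests). */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Hypothetical model, not QEMU's code: the remaining dirty data may be
 * sent in the final stop-and-copy only if it fits in the byte budget
 * bandwidth * downtime. This model assumes the same comparison also
 * gates the next bitmap sync, so a counter stuck above the budget
 * means no further sync and no completion.
 */
static bool fits_in_downtime(uint64_t dirty_pages,
                             uint64_t bandwidth_bytes_per_ms,
                             uint64_t downtime_ms)
{
    uint64_t pending_bytes = dirty_pages * SKETCH_PAGE_SIZE;
    uint64_t budget_bytes = bandwidth_bytes_per_ms * downtime_ms;

    return pending_bytes < budget_bytes;
}
```

With dirty_pages stuck at 2, fits_in_downtime(2, 100000, 300) is true, while fits_in_downtime(2, 100, 1) stays false however many times it is re-evaluated.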
I'm using the following addition to spot the problem:
diff --git a/migration/ram.c b/migration/ram.c
index e75f1050e4..3ddf884952 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1350,6 +1350,13 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage)
         }
     } while (!pages && again);
 
+    if (!pages && !again && pss.complete_round && rs->migration_dirty_pages)
+    {
+        /* Should make this fail migration ? */
+        fprintf(stderr, "%s: no page found, yet dirty_pages=%"PRIu64"\n",
+                __func__, rs->migration_dirty_pages);
+    }
+
     rs->last_seen_block = pss.block;
     rs->last_page = pss.page;
(which I might turn into a check that fails the migration)
That check fires easily, even on an unloaded machine (the test itself
still reports OK):
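Roughly what the fail-the-migration variant of the check might look like: the same condition as the fprintf() in the diff, but returning a negative errno so a caller could abort instead of only logging. The helper name check_round_consistent() and the choice of -EINVAL are assumptions for illustration, not existing QEMU code.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of QEMU: after a complete round in
 * which no page was found and no retry is pending, the dirty-page
 * counter should be zero. If it is not, report it and return an
 * error the caller could use to fail the migration.
 */
static int check_round_consistent(int pages, bool again,
                                  bool complete_round,
                                  uint64_t migration_dirty_pages)
{
    if (!pages && !again && complete_round && migration_dirty_pages) {
        fprintf(stderr, "no page found, yet dirty_pages=%" PRIu64 "\n",
                migration_dirty_pages);
        return -EINVAL;
    }
    return 0;
}
```

Returning an error rather than printing keeps the policy decision (warn or abort) with the caller.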
tests/postcopy-test
/x86_64/postcopy: ram_find_and_save_block: no page found, yet dirty_pages=2
ram_find_and_save_block: no page found, yet dirty_pages=2
ram_find_and_save_block: no page found, yet dirty_pages=2
OK
I'll try and debug where our extra two pages are coming from.
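One hedged way to narrow down where the extra pages come from is to periodically recount the bits actually set in the bitmap and compare against the running counter, so the first step that makes them disagree stands out. dirty_count_drift() and popcount64() are hypothetical helpers working on a plain array of 64-bit words, not on QEMU's bitmap types. A drift of +2 would match the dirty_pages=2 reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count set bits in one word without relying on compiler builtins. */
static unsigned popcount64(uint64_t w)
{
    unsigned n = 0;

    while (w) {
        w &= w - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Hypothetical debugging aid, not QEMU code: return the running
 * counter minus the number of bits really set in the bitmap.
 * Calling it after each step that touches either one (sync, send,
 * clear) shows which step makes them disagree; positive means the
 * counter is too high.
 */
static int64_t dirty_count_drift(const uint64_t *bitmap, size_t nwords,
                                 uint64_t counter)
{
    uint64_t actual = 0;

    for (size_t i = 0; i < nwords; i++) {
        actual += popcount64(bitmap[i]);
    }
    return (int64_t)counter - (int64_t)actual;
}
```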
Dave
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK