[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] system/physmem: Fix migration dirty bitmap coherency with TC
From: |
Nicholas Piggin |
Subject: |
Re: [PATCH] system/physmem: Fix migration dirty bitmap coherency with TCG memory access |
Date: |
Fri, 23 Feb 2024 11:10:23 +1000 |
On Fri Feb 23, 2024 at 6:59 AM AEST, Thomas Huth wrote:
> On 20/02/2024 02.13, Nicholas Piggin wrote:
> > On Tue Feb 20, 2024 at 12:10 AM AEST, Thomas Huth wrote:
> >> On 19/02/2024 07.17, Nicholas Piggin wrote:
> >>> The fastpath in cpu_physical_memory_sync_dirty_bitmap() to test large
> >>> aligned ranges forgot to bring the TCG TLB up to date after clearing
> >>> some of the dirty memory bitmap bits. This can result in stores though
> >>> the TCG TLB not setting the dirty memory bitmap and ultimately causes
> >>> memory corruption / lost updates during migration from a TCG host.
> >>>
> >>> Fix this by exporting an abstracted function to call when dirty bits
> >>> have been cleared.
> >>>
> >>> Fixes: aa8dc044772 ("migration: synchronize memory bitmap 64bits at a
> >>> time")
> >>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >>> ---
> >>
> >> Sounds promising! ... but it doesn't seem to fix the migration-test qtest
> >> with s390x when it gets enabled again:
> >
> > Did it fix kvm-unit-tests for you?
>
> It does, indeed! With your QEMU patch here, your new selftest-migration test
> of the k-u-t is working reliably with TCG now, indeed. Thus feel free to add:
>
> Tested-by: Thomas Huth <thuth@redhat.com>
Great, thanks.
>
> >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> >> --- a/tests/qtest/migration-test.c
> >> +++ b/tests/qtest/migration-test.c
> >> @@ -3385,15 +3385,6 @@ int main(int argc, char **argv)
> >> return g_test_run();
> >> }
> >>
> >> - /*
> >> - * Similar to ppc64, s390x seems to be touchy with TCG, so disable it
> >> - * there until the problems are resolved
> >> - */
> >> - if (g_str_equal(arch, "s390x") && !has_kvm) {
> >> - g_test_message("Skipping test: s390x host with KVM is required");
> >> - return g_test_run();
> >> - }
> >> -
> >> tmpfs = g_dir_make_tmp("migration-test-XXXXXX", &err);
> >> if (!tmpfs) {
> >> g_test_message("Can't create temporary directory in %s: %s",
> >>
> >> I wonder whether there is more stuff like this necessary somewhere?
> >
> > Possibly. That's what the commit logs for the TCG disable indicate. I
> > have found another dirty bitmap TCG race too. I'll send it out after
> > some more testing.
> >
> >> Did you try to re-enable tests/qtest/migration-test.c for ppc64 with TCG to
> >> see whether that works fine now?
> >
> > Hmm, I did try and so far ppc64 is not failing even with upstream QEMU.
>
> Oh, indeed! Actually, now that you mentioned it, I remembered that I checked
> it a couple of weeks ago already:
>
> https://lore.kernel.org/qemu-devel/7d4f5624-83d2-4330-9315-b23869529e99@redhat.com/
Okay I'll look at re-enabling it then.
> > I'll try with s390x. Any additional build or runtime options to make it
> > break? How long does it take for breakage to be evident?
>
> For me, it normally breaks after running the migration test a couple of few
> times already, let's say one time out of ten runs?
Seems like a tricky one to debug.
It looks like the migration qtest is just migrating while incrementing each
char in 99MB of memory? Interesting if that breaks but k-u-t multi
migration on s390x does not. Could be worth looking at the differences
between them.
It is also odd the qtest didn't trigger this TCG bug. I have another
multi-migration test for kvm-unit-tests (not yet submitted) which does
similar dirtying of memory and that *does* break TCG.
Thanks,
Nick