qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: Question on memory commit during MR finalize()


From: Thanos Makatos
Subject: RE: Question on memory commit during MR finalize()
Date: Thu, 15 Jul 2021 14:27:48 +0000

Hi Peter,

> -----Original Message-----
> From: Qemu-devel <qemu-devel-
> bounces+thanos.makatos=nutanix.com@nongnu.org> On Behalf Of Peter
> Xu
> Sent: 21 April 2020 11:44
> To: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>; QEMU Devel Mailing List
> <qemu-devel@nongnu.org>
> Subject: Re: Question on memory commit during MR finalize()
> 
> On Tue, Apr 21, 2020 at 11:43:36AM +0200, Paolo Bonzini wrote:
> > On 21/04/20 01:31, Peter Xu wrote:
> > >>
> > >> However, instead of memory_region_transaction_commit,
> > >> memory_region_finalize probably should do something like
> > >>
> > >>     --memory_region_transaction_depth;
> > >>     assert (memory_region_transaction_depth ||
> > >>      (!memory_region_update_pending &&
> > >>              !ioeventfd_update_pending));
> > > Ah I see; this makes sense.
> > >
> > > And finally I found the problem, which is indeed the bug in my own
> > > tree - I forgot to remove the previous changes to flush the dirty
> > > ring during mem removal (basically that's run_on_cpu() called during
> > > a memory commit, that will wrongly release the BQL without being
> noticed).
> > >
> > > Besides above assert, I'm thinking maybe we can also assert on
> something like:
> > >
> > >   !(memory_region_transaction_depth ||
> memory_region_update_pending ||
> > >     ioeventfd_update_pending)
> > >
> > > When releasing BQL (unlock, or qemu_cond_wait() on BQL, which should
> > > cover run_on_cpu()), so that we can identify misuse of BQL easier like
> this.
> >
> > Asserting invariants around lock release are an interesting concept,
> > but I'm not sure where to insert them exactly.  But it would be great
> > if you would like to introduce an assert_empty_memory_transaction()
> > function with the assertion I quoted above.
> 
> Let me give it a shot later today. :)

We're hitting this issue using a QEMU branch where JJ is using vfio-user as the 
transport for multiprocess-qemu (https://github.com/oracle/qemu/issues/9). We 
can reproduce it fairly reliably by migrating a virtual SPDK NVMe controller 
(the NVMf/vfio-user target with experimental migration support, 
https://review.spdk.io/gerrit/c/spdk/spdk/+/7617/14). I can provide detailed 
repro instructions but first I want to make sure we're not missing any patches.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]