Re: [PATCH] multifd: Copy pages before compressing them with zlib
From: Dr. David Alan Gilbert
Subject: Re: [PATCH] multifd: Copy pages before compressing them with zlib
Date: Mon, 4 Apr 2022 18:11:44 +0100
User-agent: Mutt/2.1.5 (2021-12-30)
* Ilya Leoshkevich (iii@linux.ibm.com) wrote:
> On Mon, 2022-04-04 at 12:20 +0100, Dr. David Alan Gilbert wrote:
> > * Ilya Leoshkevich (iii@linux.ibm.com) wrote:
> > > zlib_send_prepare() compresses pages of a running VM. zlib does not
> > > make any thread-safety guarantees with respect to changing deflate()
> > > input concurrently with deflate() [1].
> > >
> > > One can observe problems due to this with the IBM zEnterprise Data
> > > Compression accelerator capable zlib [2]. When the hardware
> > > acceleration is enabled, migration/multifd/tcp/zlib test fails
> > > intermittently [3] due to sliding window corruption.
> > >
> > > At the moment this problem occurs only with this accelerator, since
> > > its architecture explicitly discourages concurrent accesses [4]:
> > >
> > > Page 26-57, "Other Conditions":
> > >
> > > As observed by this CPU, other CPUs, and channel
> > > programs, references to the parameter block, first,
> > > second, and third operands may be multiple-access
> > > references, accesses to these storage locations are
> > > not necessarily block-concurrent, and the sequence
> > > of these accesses or references is undefined.
> > >
> > > Still, it might affect other platforms due to a future zlib update.
> > > Therefore, copy the page being compressed into a private buffer
> > > before passing it to zlib.
> >
> > While this might work around the problem, your explanation doesn't
> > quite fit the symptoms; or if it does, then you have a separate
> > problem.
> >
> > The live migration code relies on the fact that the source is running
> > and changing its memory as the data is transmitted; however, it also
> > relies on the fact that if this happens, the 'dirty' flag is set
> > _after_ those changes, causing another round of migration and
> > retransmission of the (now stable) data.
> >
> > We don't expect the load of the data for the first page write to be
> > correct, consistent, etc. - we just rely on the retransmission being
> > correct once the page is stable.
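The ordering Dave describes (page data written first, dirty flag published after, so a later pass rereads stable data) can be sketched with C11 atomics. The names are illustrative; QEMU actually tracks dirtiness in per-RAMBlock bitmaps, not a per-page struct:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Illustrative per-page state, standing in for QEMU's dirty bitmap. */
typedef struct {
    unsigned char data[PAGE_SIZE];
    atomic_bool dirty;
} page_t;

/*
 * Guest side: modify the page, THEN publish the dirty flag. The
 * release store orders the data writes before the flag becomes
 * visible to the migration thread's acquire load.
 */
static void guest_write(page_t *pg, const unsigned char *src)
{
    memcpy(pg->data, src, PAGE_SIZE);
    atomic_store_explicit(&pg->dirty, true, memory_order_release);
}

/*
 * Migration side: atomically test-and-clear the flag. A guest write
 * racing with the transmission simply redirties the page, triggering
 * another pass that sends the now-stable contents.
 */
static bool need_retransmit(page_t *pg)
{
    return atomic_exchange_explicit(&pg->dirty, false,
                                    memory_order_acquire);
}
```

This is why a torn or inconsistent read on the first pass is harmless in software: correctness rests on the final pass over a clean page, not on any single read.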
> >
> > If your compressor hardware is doing something undefined during the
> > first case that's fine; as long as it works fine in the stable case
> > where the data isn't changing.
> >
> > Adding the extra copy is going to slow everyone else down; and since
> > there's plenty of pthread locking in that multifd code, I'm expecting
> > it to get reasonably defined ordering and thus be safe from
> > multithreading problems (please correct us if we've actually done
> > something wrong in the locking there).
> >
> > IMHO your accelerator, when called from a zlib call, needs to behave
> > the same as if it were the software implementation; i.e. if we've got
> > pthread calls in there that are enforcing ordering, then that should
> > be fine; your accelerator implementation needs to add a barrier of
> > some type or an internal copy, not penalise everyone else.
> >
> > Dave
>
> The problem with the accelerator is that during the first case the
> internal state might end up being corrupted (in particular: what goes
> into the deflate stream differs from what goes into the sliding
> window). This may affect the data integrity in the second case later
> on.
Hmm I hadn't expected the unpredictability to span multiple blocks.
> I've been trying to think about what to do with that, and of course
> doing an internal copy is one option (a barrier won't suffice). However,
> I realized that the zlib API as documented doesn't guarantee that it's
> safe to change input data concurrently with compression. On the other
> hand, today's zlib is implemented in a way that tolerates this.
>
> So the open question for me is, whether we should honor zlib
> documentation (in which case, I would argue, QEMU needs to be changed)
> or say that the behavior of today's zlib implementation is more
> important (in which case accelerator code needs to change). I went with
> the former for now, but the latter is of course doable as well.
Well, I think you're saying that the current docs don't specify this,
and thus one should assume that there's a constraint.
I think the right people to answer this are the zlib community; so
can you send a mail to zlib-devel and ask?
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK