[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] multifd: Copy pages before compressing them with zlib
From: |
Juan Quintela |
Subject: |
Re: [PATCH] multifd: Copy pages before compressing them with zlib |
Date: |
Mon, 04 Apr 2022 15:55:22 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) |
Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Mon, Apr 04, 2022 at 12:20:14PM +0100, Dr. David Alan Gilbert wrote:
>> * Ilya Leoshkevich (iii@linux.ibm.com) wrote:
>> > zlib_send_prepare() compresses pages of a running VM. zlib does not
>> > make any thread-safety guarantees with respect to changing deflate()
>> > input concurrently with deflate() [1].
>> >
>> > One can observe problems due to this with the IBM zEnterprise Data
>> > Compression accelerator capable zlib [2]. When the hardware
>> > acceleration is enabled, migration/multifd/tcp/zlib test fails
>> > intermittently [3] due to sliding window corruption.
>> >
>> > At the moment this problem occurs only with this accelerator, since
>> > its architecture explicitly discourages concurrent accesses [4]:
>> >
>> > Page 26-57, "Other Conditions":
>> >
>> > As observed by this CPU, other CPUs, and channel
>> > programs, references to the parameter block, first,
>> > second, and third operands may be multiple-access
>> > references, accesses to these storage locations are
>> > not necessarily block-concurrent, and the sequence
>> > of these accesses or references is undefined.
>> >
>> > Still, it might affect other platforms due to a future zlib update.
>> > Therefore, copy the page being compressed into a private buffer before
>> > passing it to zlib.
>>
>> While this might work around the problem; your explanation doesn't quite
>> fit with the symptoms; or if they do, then you have a separate problem.
>>
>> The live migration code relies on the fact that the source is running
>> and changing it's memory as the data is transmitted; however it also
>> relies on the fact that if this happens the 'dirty' flag is set _after_
>> those changes causing another round of migration and retransmission of
>> the (now stable) data.
>>
>> We don't expect the load of the data for the first page write to be
>> correct, consistent etc - we just rely on the retransmission to be
>> correct when the page is stable.
>>
>> If your compressor hardware is doing something undefined during the
>> first case that's fine; as long as it works fine in the stable case
>> where the data isn't changing.
>>
>> Adding the extra copy is going to slow everyone else dowmn; and since
>> there's plenty of pthread lockingin those multifd I'm expecting them
>> to get reasonably defined ordering and thus be safe from multi threading
>> problems (please correct us if we've actually done something wrong in
>> the locking there).
>>
>> IMHO your accelerator when called from a zlib call needs to behave
>> the same as if it was the software implementation; i.e. if we've got
>> pthread calls in there that are enforcing ordering then that should be
>> fine; your accelerator implementation needs to add a barrier of some
>> type or an internal copy, not penalise everyone else.
>
> It is reasonable to argue that QEMU is relying on undefined behaviour
> when invoking zlib in this case, so it isn't clear that the accelerator
> impl should be changed, rather than QEMU be changed to follow the zlib
> API requirements.
It works on all the other cases. My vote if need taht is that we add a
zlib-sync or similar method.
zlib already means doing a copy, doing an extra copy will cost too much
on my opinion.
Once that we are here, is there such a requirement for zstd? In my
testing, zstd was basically always better than zlib (no, I don't
remember the details).
Later, Juan.