Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration


From: Dr. David Alan Gilbert
Subject: Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)
Date: Tue, 15 Mar 2022 16:14:52 +0000
User-agent: Mutt/2.1.5 (2021-12-30)

* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Tue, 15 Mar 2022 at 14:39, Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > On Mon, 14 Mar 2022 at 19:44, Peter Maydell <peter.maydell@linaro.org> wrote:
> > > On Mon, 14 Mar 2022 at 18:58, Peter Maydell <peter.maydell@linaro.org> wrote:
> > > > I just hit the abort case, narrowing it down to the
> > > > /i386/migration/multifd/tcp/zlib case, which can hit this without
> > > > any other tests being run:
> > >
> > > > This test seems to fail fairly frequently. I'll try a bisect...
> > >
> > > On this s390 machine, this test has been intermittent since
> > > it was first added in commit 7ec2c2b3c1 ("multifd: Add zlib compression
> > > multifd support") in 2019.
> >
> > I have tried (on current master) runs of various of the other
> > migration tests, and:
> >  * /i386/migration/multifd/tcp/zstd completed 1170 iterations without
> >    failing
> >  * /i386/migration/precopy/tcp completed 4669 iterations without
> >    failing
> >  * /i386/migration/multifd/tcp/zlib fails usually within the first
> >    10 iterations (the most I ever saw it manage was 32)
> >
> > So whatever this is, it seems like it might be specific to the
> > zlib code somehow ?
> 
> Maybe we're running into this bug
> https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1961427
> ("zlib: compressBound() returns an incorrect result on z15") ?

The initial description, of compressBound() returning the wrong value,
doesn't feel like it would cause this abort: the report says it would
trigger an error (and I'm not sure how good we are at spotting that!).
Later in the description, though, it says:

'Mistakes in dfltcc_free_window OF and especially DEFLATE_BOUND_COMPLEN,
  (incl. the bit definitions), may cause various and unforseen defects'

Certainly looks like a 'various and unforseen defect'.
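
To make the "spotting that" point concrete, here is a rough sketch of a
round-trip check in Python. It is not QEMU's code, and roundtrip_ok is a
made-up helper name; it only shows the two ways a broken zlib could
surface: an explicit error (which Python's zlib module raises as
zlib.error) or data that decompresses to something different.

```python
import os
import zlib


def roundtrip_ok(data: bytes, level: int = 1) -> bool:
    """Return True if data survives compress + decompress unchanged.

    Hypothetical helper for illustration only; not part of QEMU.
    """
    try:
        packed = zlib.compress(data, level)
        unpacked = zlib.decompress(packed)
    except zlib.error:
        # zlib reported a failure explicitly.
        return False
    # Silent corruption would show up as a mismatch here.
    return unpacked == data


if __name__ == "__main__":
    # Page-sized buffers: one highly compressible, one incompressible.
    for name, buf in (("zeros", bytes(4096)), ("random", os.urandom(4096))):
        print(name, "ok" if roundtrip_ok(buf) else "FAILED")
```

Whether this exercises the DFLTCC path at all depends on which zlib the
Python build is linked against, so treat it as an illustration rather
than a reproducer.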

Dave

> That bug report claims it doesn't affect focal, though, which
> is what we're running on this box (specifically, the zlib1g
> package is version 1:1.2.11.dfsg-2ubuntu1.2).
> 
> A run with DFLTCC=0 has made it past 60 iterations so far, which
> suggests that that does serve as a workaround for the bug.
> 
> thanks
> -- PMM
> 
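
For anyone wanting to repeat the iteration counts above, a minimal
sketch of a loop that reruns a test command until it first fails,
optionally with extra environment variables such as DFLTCC=0. The
function name and the way the command is passed in are assumptions for
illustration; the actual test invocation is whatever you already use to
run the migration test.

```python
import os
import subprocess
import sys


def run_until_failure(cmd, max_iters, extra_env=None):
    """Run cmd up to max_iters times.

    Returns the number of successful runs before the first failure,
    or max_iters if no run failed. Hypothetical helper, not a QEMU tool.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    for done in range(max_iters):
        result = subprocess.run(
            cmd,
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # A non-zero exit status (including death by signal, which
        # subprocess reports as a negative code) counts as a failure.
        if result.returncode != 0:
            return done
    return max_iters


if __name__ == "__main__":
    # Usage (script name is arbitrary): python3 loop.py <test command ...>
    test_cmd = sys.argv[1:]
    if test_cmd:
        for label, extra in (("default", {}), ("DFLTCC=0", {"DFLTCC": "0"})):
            passed = run_until_failure(test_cmd, 100, extra)
            print(f"{label}: {passed} iterations before first failure")
```

Comparing the two counts on the affected box would show whether
disabling DFLTCC really changes the failure rate, or whether 60 clean
iterations was just luck.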
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



