Re: [PATCH 0/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug()
Thu, 16 Jun 2022 17:27:48 -0400
Thank you for finding this and fixing it. This issue has been giving us grief for months, and this patch appears to resolve the problem.
In our case, the problem seemed much more severe with the RHEL / CentOS 7.x Linux 3.10 kernel when combined with SolidFire iSCSI-based storage. Because of that, it escaped notice during our original soak period, and this likely contributed to others not encountering the problem. Still, I believe this is a serious problem that could affect any guest machine that does a large amount of I/O.
I suspect the SolidFire connection is that I/O can queue up more easily there than on the local NVMe storage we also use, and there could be something related to SolidFire QoS re-balancing, where the iSCSI connection may be re-negotiated from time to time. So I think this is a case of "happens in some environments more than others", and unfortunately it happened a lot in one of our environments. :-(
On Thu, Jun 09, 2022 at 05:47:10PM +0100, Stefan Hajnoczi wrote:
> An unlucky I/O pattern can result in stalled Linux AIO requests when the
> plugged counter becomes unbalanced. See Patch 1 for details.
> Patch 2 adds a comment to explain why the laio_io_unplug() even checks max
> batch in the first place.
> Stefan Hajnoczi (2):
> linux-aio: fix unbalanced plugged counter in laio_io_unplug()
> linux-aio: explain why max batch is checked in laio_io_unplug()
> block/linux-aio.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
Thanks, applied to my block tree: