On Fri, Jan 29, 2021 at 08:51:39AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 28.01.2021 23:14, Roman Kagan wrote:
> > During the final phase of migration the NBD reconnection logic may
> > encounter situations it doesn't expect during regular operation.
> >
> > This series addresses some of them that make qemu crash. They are
> > reproducible when a VM with a secondary drive attached via nbd with
> > non-zero "reconnect-delay" runs a stress load (fio with big queue depth)
> > in the guest on that drive and is migrated (e.g. to a file), while the
> > nbd server is SIGKILL-ed and restarted every second.
> >
> > See the individual patches for specific crash conditions and more
> > detailed explanations.
> >
> > Roman Kagan (3):
> >   block/nbd: only detach existing iochannel from aio_context
> >   block/nbd: only enter connection coroutine if it's present
> >   nbd: make nbd_read* return -EIO on error
> >
> >  include/block/nbd.h |  7 ++++---
> >  block/nbd.c         | 25 +++++++++++++++++--------
> >  2 files changed, 21 insertions(+), 11 deletions(-)
>
> Thanks a lot for fixing!
>
> Do you have some reproducer scripts? Could you post them or maybe add
> an iotest?
I don't have it scripted, just ad hoc command lines. I'll look into
making up a test. Can you perhaps suggest which existing test to base
it on?
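
The "SIGKILL-ed and restarted every second" part of the scenario quoted
above could be scripted roughly as follows. This is only a minimal
sketch, not the ad hoc commands mentioned: the helper name
kill_restart_loop and its parameters are hypothetical, and the server
command is whatever NBD server invocation is in use (it is passed in
rather than assumed here).

```python
import signal
import subprocess
import time


def kill_restart_loop(server_cmd, cycles, interval=1.0):
    """Start server_cmd, SIGKILL it after `interval` seconds, and repeat.

    server_cmd is an argv list for the NBD server (an assumption: supplied
    by the caller). Returns the exit code observed for each cycle; on POSIX
    a process killed by a signal reports the negative signal number.
    """
    exit_codes = []
    for _ in range(cycles):
        # Launch a fresh server instance for this cycle.
        proc = subprocess.Popen(server_cmd)
        # Let it serve requests for a while before killing it.
        time.sleep(interval)
        # Kill it without giving it a chance to shut down cleanly.
        proc.send_signal(signal.SIGKILL)
        exit_codes.append(proc.wait())
    return exit_codes
```

Such a loop would run alongside the guest stress load and the migration;
driving those two parts is not covered by this sketch.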