From: Liu Yuan
Subject: Re: [Qemu-devel] [sheepdog] [PATCH v2 0/9] sheepdog: reconnect server after connection failure
Date: Thu, 25 Jul 2013 11:36:51 +0800
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Jul 24, 2013 at 11:42:49PM +0800, Liu Yuan wrote:
> On Wed, Jul 24, 2013 at 06:07:21PM +0900, MORITA Kazutaka wrote:
> > At Wed, 24 Jul 2013 16:28:30 +0800,
> > Liu Yuan wrote:
> > >
> > > On Wed, Jul 24, 2013 at 04:56:24PM +0900, MORITA Kazutaka wrote:
> > > > Currently, if a sheepdog server exits, all the connecting VMs need to
> > > > be restarted. This series implements a feature to reconnect the
> > > > server, and enables us to do online sheepdog upgrade and avoid
> > > > restarting VMs when sheepdog servers crash unexpectedly.
> > > >
> > >
> > > It doesn't work on my test. I tried start linux-0.2.img stored in sheepdog
> > > cluster and then
> > >
> > > 1. did some buffered writes
> > > 2. restart sheep that this QEMU VM connected to.
> > > 3. $ sync
> > >
> > > I got following error:
> > >
> > > $ ../qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 1024 -hda sheepdog:test
> > > qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
> > > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > > ...repeat...
> > >
> > > QEMU version is master tip
> >
> > Your sheep daemon looks like unreachable from qemu. I tried the same
> > procedure, but couldn't reproduce it.
> >
> > Is the problem reproducible? Can you make sure that you can connect
> > to the sheep daemon from collie while the error message shows up?
> >
>
> Yesh. Well I try to repeat it with following process:
>
> 1. did some buffered write
> 2. kill the sheep
> 3. $ sync # at guest, now 'sync' hang for response
> 4. restart sheep
>
> After 4 'sync' still hangs until timeout with a message
> "hda:dma_timer_expiry: dma status == 0x21"
>
> Guest end up freeze.
>
> QEMU output is the same:
> qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
> qemu-system-x86_64: Failed to connect to socket: Connection refused
> qemu-system-x86_64: Failed to connect to socket: Connection refused
> qemu-system-x86_64: Failed to connect to socket: Connection refused
> qemu-system-x86_64: Failed to connect to socket: Connection refused
>
> But notice, if I did restart sheep with guest doing nothing, your patch set
> work like a charm.
I have debugged it a bit. The problem is that at step 3, 'sync' invokes
add_aio_request() in the sheepdog driver, and add_aio_request() *succeeds*, with
the aio put on the inflight_aio_head list, *not* on the failed_aio_head list. So
in reconnect_to_sdog() we have no way to resend the affected aio, and 'sync'
waits forever.
Thanks
Yuan
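
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the QEMU sheepdog driver: ToyDriver, its methods, and the move_on_send_failure switch are hypothetical names invented for illustration. Only the two-list idea (an in-flight list and a failed list, with reconnect resending the failed list) is taken from the message. The model assumes the request sent while the server is down never reaches it, and compares what happens when that request is tracked as in-flight versus queued on the failed list.

```python
class ToyDriver:
    """Toy model of a driver that tracks requests on two lists (hypothetical)."""

    def __init__(self, move_on_send_failure):
        self.inflight = []         # requests believed to be awaiting a reply
        self.failed = []           # requests queued for resend after reconnect
        self.completed = []        # requests the guest has seen finish
        self.server_received = []  # what actually reached the server
        self.server_up = True
        # Assumption: this flag selects between the reported behaviour (False)
        # and one possible remedy (True).
        self.move_on_send_failure = move_on_send_failure

    def _send(self, req):
        # Returns True only if the server really received the request.
        if self.server_up:
            self.server_received.append(req)
            return True
        return False

    def add_request(self, req):
        sent = self._send(req)
        if not sent and self.move_on_send_failure:
            self.failed.append(req)    # will be resent on reconnect
        else:
            self.inflight.append(req)  # tracked as in-flight, sent or not

    def reconnect(self):
        # Only the failed list is resent; in-flight requests are left alone.
        self.server_up = True
        pending, self.failed = self.failed, []
        for req in pending:
            self.add_request(req)

    def server_replies(self):
        # The server can only answer requests it actually received.
        for req in list(self.inflight):
            if req in self.server_received:
                self.inflight.remove(req)
                self.completed.append(req)


def run(move_on_send_failure):
    drv = ToyDriver(move_on_send_failure)
    drv.server_up = False      # kill the sheep
    drv.add_request("flush")   # guest runs 'sync'
    drv.reconnect()            # sheep restarted, driver reconnects
    drv.server_replies()
    return drv


stuck = run(move_on_send_failure=False)
resent = run(move_on_send_failure=True)
print("stuck: ", stuck.inflight, stuck.completed)
print("resent:", resent.inflight, resent.completed)
```

In the first run the request stays on the in-flight list indefinitely, which corresponds to the hanging 'sync'. In the second run it is resent after reconnect and completes.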