From: wangjie (P)
Subject: [Qemu-devel] I have two issues about it to discuss with you RE: ping RE: question: I found a qemu crash about migration
Date: Fri, 29 Sep 2017 03:20:53 +0000
Excuse me, I have two issues to discuss with you :>
//Question 1:
Can we fix it with the demo patch attached? (the patch is based on
qemu-2.8.1, just a demo for discussion)
I think we should not set the INACTIVE flag on the drive-mirror
BlockDriverState, because mirror I/O does not affect the memory migration. So
we can filter out the drive-mirror BDS in bdrv_inactivate_all; I tested this,
and my demo patch fixes the crash.
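The shape of the change is roughly the following (a minimal sketch against
the 2.8-era bdrv_inactivate_all() loop in block.c, not the demo patch itself;
bdrv_is_mirror_target() is a made-up predicate standing in for however we
detect the mirror BDS):

    int bdrv_inactivate_all(void)
    {
        BlockDriverState *bs = NULL;
        BdrvNextIterator it;
        int ret;

        for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
            AioContext *aio_context = bdrv_get_aio_context(bs);

            /* Proposed: leave the mirror target active, so in-flight
             * mirror I/O never sees BDRV_O_INACTIVE (0x0800). */
            if (bdrv_is_mirror_target(bs)) {    /* made-up helper */
                continue;
            }

            aio_context_acquire(aio_context);
            ret = bdrv_inactivate(bs);
            aio_context_release(aio_context);
            if (ret < 0) {
                return ret;
            }
        }

        return 0;
    }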
//Question 2:
I used git bisect and found that qemu-2.6.0 is OK: I can't trigger the bug on
qemu-2.6.0 even though I tried many times, but I can trigger and reproduce it
very easily since qemu-2.7.0-rc0. So I kept bisecting, and found that the bug
starts to occur at the following commit:
commit cbd614870fce00f46088be7054a7bf5eadcc77ac
Merge: 500acc9 0bc7a6f
Author: Peter Maydell <address@hidden>
Date: Thu Jun 2 13:42:52 2016 +0100
Merge remote-tracking branch 'remotes/famz/tags/pull-docker-20160601' into
staging
.....But when I reviewed the patches in the merge commit
(cbd614870fce00f46088be7054a7bf5eadcc77ac) above, I didn't find anything
related to this bug. Did I overlook some detail?
-----Original Message-----
From: Dr. David Alan Gilbert [mailto:address@hidden
Sent: Friday, September 29, 2017 1:01 AM
To: wangjie (P) <address@hidden>
Cc: address@hidden; address@hidden; address@hidden; address@hidden;
address@hidden; fuweiwei (C) <address@hidden>; address@hidden; address@hidden;
address@hidden <address@hidden>; address@hidden
Subject: Re: ping RE: question: I found a qemu crash about migration
* wangjie (P) (address@hidden) wrote:
> Ping?
>
> From: wangjie (P)
> Sent: Tuesday, September 26, 2017 9:10 PM
> To: address@hidden; address@hidden; address@hidden
> Cc: wangjie (P) <address@hidden>; fuweiwei (C)
> <address@hidden>; address@hidden; address@hidden;
> address@hidden; address@hidden; Wubin (H) <address@hidden>
> Subject: question: I found a qemu crash about migration
>
> Hi,
>
> When I use qemuMigrationRun to migrate both memory and storage, with some
> I/O pressure in the VM and iothreads configured, we triggered an error
> report (I am using the current qemu master branch): "bdrv_co_do_pwritev:
> Assertion `!(bs->open_flags & 0x0800)' failed".
>
> I reviewed the code and examined the core dump in gdb; I think the
> following case can trigger the assertion:
>
> Case:
>
> migration_thread()
>   migration_completion() ----------> last iteration of memory migration
>     vm_stop_force_state() ---------> stop the VM and call bdrv_drain_all;
>                                      but when I gdb the core file, the
>                                      dirty bitmap count of drive-mirror is
>                                      not 0 and 16 mirror I/Os are in flight
>     bdrv_inactivate_all() ---------> inactivate images and set the
>                                      INACTIVE flag
>       -> bdrv_co_do_pwritev() -----> mirror I/O handled after this trips
>                                      the assertion `!(bs->open_flags &
>                                      0x0800)' and qemu crashes
>
> As we can see above, migration_completion calls bdrv_inactivate_all to
> inactivate images, but mirror_run is not done (it still has dirty
> clusters), so the mirror I/O issued afterwards triggers the error report:
> "bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed".
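>
> (For reference: 0x0800 is BDRV_O_INACTIVE, which bdrv_inactivate_all sets,
> and the assertion sits at the top of the write path in block/io.c. A
> trimmed excerpt from the 2.8-era code, so details may differ on master:)
>
>     int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>         int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
>         BdrvRequestFlags flags)
>     {
>         BlockDriverState *bs = child->bs;
>         /* ... alignment and request-tracking setup ... */
>         assert(!(bs->open_flags & BDRV_O_INACTIVE));
>         /* ... */
>     }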
>
> It seems that memory migration and storage mirroring proceed independently,
> and the relative ordering of the two is quite random.
>
> How can I solve this problem? Should we not set the INACTIVE flag on the
> drive-mirror BlockDriverState?
Hi,
This is a 'fun' bug; I had a good chat to kwolf about it earlier.
A proper fix really needs to be done together with libvirt so that we can
sequence:
a) The stopping of the CPU on the source
b) The termination of the mirroring block job
c) The inactivation of the block devices on the source
(bdrv_inactivate_all)
d) The activation of the block devices on the destination
(bdrv_invalidate_cache_all)
e) The start of the CPU on the destination
It looks like you're hitting a race between b/c; we've had races between c/d
in the past and moved the bdrv_inactivate_all.
During the discussion we ended up with two proposed solutions; both of them
require one extra command and one extra migration capability.
The block way
-------------
1) Add a new migration capability pause-at-complete
2) Add a new migration state almost-complete
3) After saving devices, if pause-at-complete is set,
transition to almost-complete
4) Add a new command (migration-continue) that
causes the migration to inactivate the devices (c)
and send the final EOF to the destination.
You set pause-at-complete, wait until migrate hits almost-complete, clean up
the mirror job, and then do migration-continue. When it completes, do 'cont'
on the destination.
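Roughly where that pause would sit in migration_completion() (a sketch only;
migrate_pause_at_complete(), MIGRATION_STATUS_ALMOST_COMPLETE and pause_sem
are the names proposed above, nothing that exists today):

    /* ...in migration_completion(), after the device state is saved... */
    qemu_savevm_state_complete_precopy(s->to_dst_file, false);

    if (migrate_pause_at_complete()) {       /* proposed capability (1) */
        /* proposed state (2), entered per (3) */
        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                          MIGRATION_STATUS_ALMOST_COMPLETE);
        /* Block until the proposed migration-continue command (4) posts
         * the semaphore; management cleans up the mirror job (b) now. */
        qemu_sem_wait(&s->pause_sem);
    }

    bdrv_inactivate_all();                   /* (c), safely after (b) */
    /* ...then send the final EOF to the destination... */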
The migration way
-----------------
1) Stop doing (d) when the destination is started with -S
since it happens anyway when 'cont' is issued
2) Add a new migration capability ext-manage-storage
3) When 'ext-manage-storage' is set, we don't bother doing (c)
4) Add a new command 'block-inactivate' on the source
You set ext-manage-storage, do the migrate, and when it's finished clean up
the block job, block-inactivate on the source, and then cont on the
destination.
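And the source side of that, sketched (again with the proposed names;
migrate_ext_manage_storage() and qmp_block_inactivate() don't exist):

    /* In migration_completion(): skip (c) when management owns it (3). */
    if (!migrate_ext_manage_storage()) {
        ret = bdrv_inactivate_all();
    }

    /* The new source-side command (4), issued after the mirror job
     * has been cleaned up: */
    void qmp_block_inactivate(Error **errp)
    {
        if (bdrv_inactivate_all() < 0) {
            error_setg(errp, "Failed to inactivate block devices");
        }
    }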
My worry about the 'block way' is that the point at which we do the pause
seems pretty interesting; it is probably best done after the final device
save but before the inactivate, though it could be done earlier. But it
probably becomes API, and something might become dependent on where we did
it.
I think Kevin's worry about the 'migration way' is that it's a bit of a
block-specific fudge; which is probably right.
I've not really thought what happens when you have a mix of shared and
non-shared storage.
Could we do any hack that isn't libvirt-visible for existing versions?
I guess maybe hack drive-mirror so it interlocks with the migration code
somehow to hold off on that inactivate?
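(To make that hack concrete - one possible shape, with an invented helper:)

    /* In bdrv_inactivate_all(): instead of pulling the rug from under
     * the mirror job, poll until its in-flight requests drain.
     * block_job_has_inflight_io() is invented for this sketch. */
    while (block_job_has_inflight_io(bs)) {
        aio_poll(bdrv_get_aio_context(bs), true);
    }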
This code is visible probably from 2.9-ish with the new locking code; but
really that b/c race has been there forever - there's maybe always been the
chance that the last few blocks of mirroring happen too late?
Thoughts?
What is the libvirt view on the preferred solution?
Dave
> Qemu Crash bt:
> (gdb) bt
> #0 0x00007f6b6e2a71d7 in raise () from /usr/lib64/libc.so.6
> #1 0x00007f6b6e2a88c8 in abort () from /usr/lib64/libc.so.6
> #2 0x00007f6b6e2a0146 in __assert_fail_base () from
> /usr/lib64/libc.so.6
> #3 0x00007f6b6e2a01f2 in __assert_fail () from /usr/lib64/libc.so.6
> #4 0x00000000007b9211 in bdrv_co_pwritev (child=<optimized out>,
> address@hidden, address@hidden,
> address@hidden, flags=0) at block/io.c:1536
> #5 0x00000000007a6f02 in blk_co_pwritev (blk=0x2f92750, offset=7034896384,
> bytes=65536, qiov=0x7f69cc09b068,
> flags=<optimized out>) at block/block-backend.c:851
> #6 0x00000000007a6fc1 in blk_aio_write_entry (opaque=0x301dad0) at
> block/block-backend.c:1043
> #7 0x0000000000835e2a in coroutine_trampoline (i0=<optimized out>,
> i1=<optimized out>) at util/coroutine-ucontext.c:79
> #8 0x00007f6b6e2b8cf0 in ?? () from /usr/lib64/libc.so.6
> #9 0x00007f6a1bcfc780 in ?? ()
> #10 0x0000000000000000 in ?? ()
>
> And I can see that mirror_run is not done; gdb info as follows:
> [inline image: gdb output showing mirror_run not finished]
>
>
> Src VM qemu log:
>
> [inline image: source VM qemu log]
>
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK
demo.patch
Description: demo.patch