qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: recent flakiness (intermittent hangs) of migration-test


From: Philippe Mathieu-Daudé
Subject: Re: recent flakiness (intermittent hangs) of migration-test
Date: Mon, 2 Nov 2020 14:55:04 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.3.1

On 10/30/20 2:53 PM, Peter Xu wrote:
> On Fri, Oct 30, 2020 at 11:48:28AM +0000, Peter Maydell wrote:
>>> Peter, is it possible that you enable QTEST_LOG=1 in your future 
>>> migration-test
>>> testcase and try to capture the stderr?  With the help of commit a47295014d
>>> ("migration-test: Only hide error if !QTEST_LOG", 2020-10-26), the test 
>>> should
>>> be able to dump quite some helpful information to further identify the 
>>> issue.
>>
>> Here's the result of running just the migration test with
>> QTEST_LOG=1:
>> https://people.linaro.org/~peter.maydell/migration.log
>> It's 300MB because when the test hangs one of the processes
>> is apparently in a polling state and continues to send status
>> queries.
>>
>> My impression is that the test is OK on an unloaded machine but
>> more likely to fail if the box is doing other things at the
>> same time. Alternatively it might be a 'parallel make check' bug.
> 
> Thanks for collecting that, Peter.
> 
> I'm copy-pasting the important information out here (with some moves and
> indents to make things even clearer):
> 
> ...
> {"execute": "migrate-recover", "arguments": {"uri": 
> "unix:/tmp/migration-test-nGzu4q/migsocket-recover"}, "id": "recover-cmd"}
> {"timestamp": {"seconds": 1604056292, "microseconds": 177955}, "event": 
> "MIGRATION", "data": {"status": "setup"}}
> {"return": {}, "id": "recover-cmd"}
> {"execute": "query-migrate"}
> ...
> {"execute": "migrate", "arguments": {"resume": true, "uri": 
> "unix:/tmp/migration-test-nGzu4q/migsocket-recover"}}
> qemu-system-x86_64: ram_save_queue_pages no previous block
> qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
> {"return": {}}
> {"execute": "migrate-set-parameters", "arguments": {"max-postcopy-bandwidth": 
> 0}}
> ...
> 
> The problem is probably an misuse on last_rb on destination node.  When 
> looking
> at it, I also found a race.  So I guess I should fix both...
> 
> Peter, would it be easy to try apply the two patches I attached to see whether
> the test hang would be resolved?  Dave, feel free to give early comments too 
> on
> the two fixes before I post them on the list.

Per this comment:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg756235.html
You could add:
Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]