From: | Laurent Vivier |
Subject: | Re: netdev-socket test hang (s390 host, mips64el guest, backtrace) |
Date: | Mon, 17 Apr 2023 15:02:40 +0200 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0 |
On 4/17/23 12:16, Alex Bennée wrote:
> Laurent Vivier <lvivier@redhat.com> writes:
>
>> Hi Peter,
>>
>> On 4/13/23 14:05, Peter Maydell wrote:
>>> On Thu, 13 Apr 2023 at 11:50, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>> I just found a hung netdev-socket test on our s390 CI runner.
>>>> Looks like a deadlock, no processes using CPU.
>>>> Here's the backtrace; looks like both QEMU processes are sat
>>>> idle but the test process is sat waiting forever for something
>>>> in test_stream_inet_reconnect(). Any ideas?
>>>
>>> May well not be related, but I think there's a race condition
>>> in this test's inet_get_free_port() code. The code tries to find
>>> a free port number by creating a socket, looking at what port it
>>> is bound to, and then closing the socket. If there are several
>>> copies of this test running at once (as is plausible in a
>>> 'make -j8' setup), then you can get an interleaving:
>>>
>>>   test 1                         test 2
>>>     find a port number
>>>     close the socket
>>>                                  find a port number
>>>                                  (get the same number as test 1)
>>>                                  close the socket
>>>     use port number for test
>>>                                  use port number for test
>>>                                  (fail because of test 1)
>>
>> I don't see an easy way to avoid the race, but perhaps we can change
>> the test to use a unix socket rather than an inet one? In this case
>> we can use a unique name.
>
> We could use a lock file that would stop the test clashing with
> itself (although another process on the machine could still race for
> the socket). The unix socket would be easier but wouldn't we lose
> test coverage or do we not care about the exact details for this
> test?
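For reference, the probing pattern Peter describes can be sketched roughly as below. This is only an illustration, not the actual inet_get_free_port() code from the test: the name probe_free_port() and the choice of the loopback address are assumptions. It shows where the window opens: once the probe socket is closed, nothing reserves the port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the test's real code): ask the kernel for an
 * unused TCP port on loopback. Returns the port number, or -1 on error.
 */
static int probe_free_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int port;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* port 0: let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }

    port = ntohs(addr.sin_port);

    /*
     * After this close() the port is no longer held by anyone: another
     * process (or another copy of the test) may be handed the same port
     * before the caller binds to it again.
     */
    close(fd);
    return port;
}
```

The returned number is only a hint. The caller still has to cope with a bind failure when it later tries to use the port.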
According to the backtrace, the problem happens with the reconnect test because we launch the server side twice to test the client reconnection. I think this test can be moved to a unix socket without any issue.

For the other inet tests, there can be a race, but the window is much shorter, and we want to test inet, not unix. Even with a lock file, the port can still be taken by another process.
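As a rough sketch of the unix socket idea (assumptions: the helper name listen_unique_unix(), the /tmp template and the "stream.sock" file name are all made up for illustration, and the real test would more likely use its existing glib helpers), the socket can live in a private temporary directory so that no other process picks the same name:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a private temporary directory and bind a
 * listening unix stream socket inside it. On success, returns the
 * listening fd and fills 'dir' and 'addr'; returns -1 on error.
 */
static int listen_unique_unix(char *dir, size_t dir_len,
                              struct sockaddr_un *addr)
{
    int fd;
    int n;

    /* mkdtemp() replaces the XXXXXX suffix with a unique string. */
    n = snprintf(dir, dir_len, "/tmp/netdev-test-XXXXXX");
    if (n < 0 || (size_t)n >= dir_len || !mkdtemp(dir)) {
        return -1;
    }

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    n = snprintf(addr->sun_path, sizeof(addr->sun_path),
                 "%s/stream.sock", dir);
    if (n < 0 || (size_t)n >= sizeof(addr->sun_path)) {
        rmdir(dir);
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0 ||
        bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        listen(fd, 1) < 0) {
        if (fd >= 0) {
            close(fd);
        }
        unlink(addr->sun_path); /* harmless if the file was never created */
        rmdir(dir);
        return -1;
    }
    return fd;
}
```

Because the path belongs to the test, the second server launch in the reconnect case can reuse it. It only has to unlink() the stale socket file before binding again, and the test should remove the file and the directory when it is done.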
Thanks, Laurent