qemu-devel

Re: netdev-socket test hang (s390 host, mips64el guest, backtrace)


From: Laurent Vivier
Subject: Re: netdev-socket test hang (s390 host, mips64el guest, backtrace)
Date: Mon, 17 Apr 2023 15:02:40 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0

On 4/17/23 12:16, Alex Bennée wrote:

Laurent Vivier <lvivier@redhat.com> writes:

Hi Peter,

On 4/13/23 14:05, Peter Maydell wrote:
On Thu, 13 Apr 2023 at 11:50, Peter Maydell <peter.maydell@linaro.org> wrote:

I just found a hung netdev-socket test on our s390 CI runner.
Looks like a deadlock, no processes using CPU.
Here's the backtrace; looks like both QEMU processes are sat
idle but the test process is sat waiting forever for something
in test_stream_inet_reconnect(). Any ideas?
May well not be related, but I think there's a race condition
in this test's inet_get_free_port() code. The code tries
to find a free port number by creating a socket, looking
at what port it is bound to, and then closing the socket.
If there are several copies of this test running at once
(as is plausible in a 'make -j8' setup), then you can
get an interleaving:
   test 1                       test 2
     find a port number
     close the socket
                                find a port number
                                (get the same number as test 1)
                                close the socket
     use port number for test
                                use port number for test
                                (fail because of test 1)
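
For illustration, the probe-then-close pattern described above can be sketched roughly as follows. This is not the test's actual inet_get_free_port() code; bind_loopback(), reserve_port() and probe_free_port() are hypothetical names, and the sketch assumes a POSIX sockets environment. The point is only that the port stops being reserved at close(), so whatever happens between close() and the later bind is the race window.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to 127.0.0.1:port (0 lets the kernel choose).
 * Returns the socket fd, or -1 on failure. */
static int bind_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Bind to a kernel-chosen port and keep the socket open.
 * While the returned fd stays open, nobody else can bind that port. */
static int reserve_port(int *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = bind_loopback(0);

    if (fd < 0) {
        return -1;
    }
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* The racy variant: learn a port number, then give the port up again.
 * The number returned is only a hint; another process may bind it
 * before the caller does. */
static int probe_free_port(void)
{
    int port;
    int fd = reserve_port(&port);

    if (fd < 0) {
        return -1;
    }
    close(fd); /* the reservation ends here */
    return port;
}
```

As long as the fd from reserve_port() is held, a second bind to the same port fails; after probe_free_port() returns, nothing prevents "test 2" in the interleaving above from being handed the same number.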


I don't see an easy way to avoid the race, but perhaps we can change
the test to use a unix socket rather than an inet one? In that case we can
use a unique name.

We could use a lock file that would stop the test clashing with itself
(although another process on the machine could still race for the
socket). The unix socket would be easier, but wouldn't we lose test
coverage, or do we not care about the exact details for this test?
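
A rough sketch of the lock-file idea, assuming flock() is available and that every copy of the test agrees on the same lock path (the path and the names take_test_lock() and drop_test_lock() are made up for this example). It only serialises cooperating copies of the test; an unrelated process can still grab the port.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Try to take an exclusive, non-blocking lock on the file at path.
 * Returns an fd that holds the lock, or -1 if the file cannot be
 * opened or another holder already has the lock. */
static int take_test_lock(const char *path)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);

    if (fd < 0) {
        return -1;
    }
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Release the lock taken by take_test_lock(). */
static void drop_test_lock(int fd)
{
    flock(fd, LOCK_UN);
    close(fd);
}
```

A test would take the lock before picking a port and drop it once its server side is listening; a blocking variant would use LOCK_EX without LOCK_NB.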

According to the backtrace, the problem happens with the reconnect test because we launch the server side twice to test the client reconnection. I think this test can be moved to a unix socket without any issue. For the other inet tests, there can be a race but the window is much shorter, and we want to test inet, not unix. Even with a lock file, the port can be taken by another process.
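
To make the unix socket idea concrete, here is a minimal sketch, not a patch. It assumes POSIX mkdtemp() for the unique name; the struct, the function names and the /tmp/netdev-socket-XXXXXX template are invented for the example, and the real test would presumably use its existing temporary-directory helpers.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* A listening unix socket living in its own private directory. */
struct unix_listener {
    char dir[64];   /* unique directory created by mkdtemp() */
    char path[96];  /* socket path inside that directory */
    int fd;         /* listening socket, or -1 */
};

/* Close the socket and remove the path and directory. */
static void unix_listener_close(struct unix_listener *l)
{
    if (l->fd >= 0) {
        close(l->fd);
        l->fd = -1;
    }
    unlink(l->path);
    rmdir(l->dir);
}

/* Create a unique directory and listen on a socket inside it.
 * Returns 0 on success, -1 on failure. */
static int unix_listener_open(struct unix_listener *l)
{
    struct sockaddr_un addr;

    l->fd = -1;
    l->path[0] = '\0';
    strcpy(l->dir, "/tmp/netdev-socket-XXXXXX");
    if (mkdtemp(l->dir) == NULL) {
        return -1;
    }
    snprintf(l->path, sizeof(l->path), "%s/stream", l->dir);

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, l->path, sizeof(addr.sun_path) - 1);

    l->fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (l->fd < 0 ||
        bind(l->fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(l->fd, 1) < 0) {
        unix_listener_close(l);
        return -1;
    }
    return 0;
}
```

Because the directory name is created atomically and is unique per call, two copies of the test never share a path, and the name stays owned by the test between the first and second launch of the server side.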

Thanks,
Laurent




