[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3 05/10] slirp: switch to GPollFD
From: |
Laszlo Ersek |
Subject: |
Re: [Qemu-devel] [PATCH v3 05/10] slirp: switch to GPollFD |
Date: |
Wed, 06 Feb 2013 01:59:25 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130108 Thunderbird/10.0.12 |
comments in-line
On 02/04/13 13:12, Stefan Hajnoczi wrote:
> Slirp uses rfds/wfds/xfds more extensively than other QEMU components.
>
> The rarely-used out-of-band TCP data feature is used. That means we
> need the full table of select(2) to g_poll(3) events:
>
> rfds -> G_IO_IN | G_IO_HUP | G_IO_ERR
> wfds -> G_IO_OUT | G_IO_ERR
> xfds -> G_IO_PRI
>
> I came up with this table by looking at Linux fs/select.c which maps
> select(2) to poll(2) internally.
>
> Another detail to watch out for are the global variables that reference
> rfds/wfds/xfds during slirp_select_poll(). sofcantrcvmore() and
> sofcantsendmore() use these globals to clear fd_set bits. When
> sofcantrcvmore() is called, the wfds bit is cleared so that the write
> handler will no longer be run for this iteration of the event loop.
>
> This actually seems buggy to me since TCP connections can be half-closed
> and we'd still want to handle data in half-duplex fashion. I think the
> real intention is to avoid running the read/write handler when the
> socket has been fully closed. This is indicated with the SS_NOFDREF
> state bit so we now check for it before invoking the TCP write handler.
I agree with how you interpret SS_NOFDREF.
> Note that UDP/ICMP code paths don't care because they are
> connectionless.
I guess they rather don't care because they are unreliable -- it doesn't
matter if we lose an outgoing packet somewhere in the tubes or right at
our own full socket buffer, so we don't care about write readiness. (But
I may be misunderstanding it all.)
> if (so->so_state & SS_NOFDREF || so->s == -1) {
> continue;
> }
> @@ -475,15 +490,15 @@ void slirp_select_poll(fd_set *readfds, fd_set
> *writefds, fd_set *xfds,
> /*
> * Check for URG data
> * This will soread as well, so no need to
> - * test for readfds below if this succeeds
> + * test for G_IO_IN below if this succeeds
> */
> - if (FD_ISSET(so->s, xfds)) {
> + if (revents & G_IO_PRI) {
> sorecvoob(so);
> }
According to SUSv3+, a bit/fd set in xfds means "socket level error
pending, or socket-specific exceptional condition". In other words,
G_IO_ERR|G_IO_PRI. Therefore this change looks a bit suspicious, because
a pending socket-level error would trigger this branch before the patch,
but trickle down after the patch.
To test this, I've written the attached small test program. It hangs on
RHEL-6.3 (2.6.32-279.19.1.el6.x86_64), which means that
- RHEL-6.3 select() is not SUSv3/SUSv4-conformant, and
- the mapping you mention in the commit message, and implement here, is
correct in practice.
(As a sanity check for the test program, if you pass in the fdset as
"writefds", the pending error is signalled & fetched OK.)
> /*
> * Check sockets for reading
> */
> - else if (FD_ISSET(so->s, readfds)) {
> + else if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) {
> /*
> * Check for incoming connections
> */
> @@ -502,7 +517,8 @@ void slirp_select_poll(fd_set *readfds, fd_set *writefds,
> fd_set *xfds,
> /*
> * Check sockets for writing
> */
> - if (FD_ISSET(so->s, writefds)) {
> + if (!(so->so_state & SS_NOFDREF) &&
> + (revents & (G_IO_OUT | G_IO_ERR))) {
> /*
> * Check for non-blocking, still-connecting sockets
> */
Looking at nothing but the loop, SS_NOFDREF appears impossible here;
however soread() --> sofcantrcvmore() can set it after we pass the
initial check.
Ugh. This is where sofcantrcvmore() / sofcantsendmore() show their split
personalities.
The second part of each seems OK. They both try to set the
SS_FCANTxxxxMORE bit corresponding to their names (meaning, that channel
is shut down). If they find that the *other* direction has been already
shut down, they set SS_NOFDREF instead. Fine.
The shutdown() direction is also correct in each, SHUT_RD==0 in
sofcantrcvmore(), SHUT_WR==1 in the other. Fine again.
What is baffling is why the other direction's fd_set(s) is (are) pruned!
git-blame doesn't help, it dates back to slirp's initial commit in qemu.
I'm intrigued.
Anyway, if we'd like to remain bug-compatible, then
if (!(so->so_state & (SS_NOFDREF | SS_FCANTRCVMORE)) &&
(revents & (G_IO_OUT | G_IO_ERR))) {
would be necessary here. This drags the bug into sunlight (why shouldn't
we try to send if we read EOF?), but I believe this is what covers both
of the "so_state" branches in sofcantrcvmore().
... Except, of course, this would permanently kill our ability to send
back on the half-closed socket. sofcantrcvmore() clears the write-set
bit only for the current iteration; the next iteration will set it up
again. Whereas SS_FCANTRCVMORE is permanent, once set.
Seems like we can't copy the original behavior here and have no choice
but fixing this slirp bug. Shudder!
> @@ -588,9 +604,18 @@ void slirp_select_poll(fd_set *readfds, fd_set
> *writefds, fd_set *xfds,
> */
> for (so = slirp->udb.so_next; so != &slirp->udb;
> so = so_next) {
> + int revents;
> +
> so_next = so->so_next;
>
> - if (so->s != -1 && FD_ISSET(so->s, readfds)) {
> + revents = 0;
> + if (so->pollfds_idx != -1) {
> + revents = g_array_index(pollfds, GPollFD,
> + so->pollfds_idx).revents;
> + }
> +
> + if (so->s != -1 &&
> + (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR))) {
> sorecvfrom(so);
> }
> }
I think (so->s != -1) is constant true here *if* the "revents"
expression evals to non-zero.
- We know that "revents" has at least one interesting bit set,
- that means (pollfds_idx != -1)
- that means -- see slirp_select_fill() conversion -- that
(pfd.fd == so->s && so->s != -1)
But it doesn't hurt.
> @@ -600,9 +625,18 @@ void slirp_select_poll(fd_set *readfds, fd_set
> *writefds, fd_set *xfds,
> */
> for (so = slirp->icmp.so_next; so != &slirp->icmp;
> so = so_next) {
> - so_next = so->so_next;
> + int revents;
> +
> + so_next = so->so_next;
> +
> + revents = 0;
> + if (so->pollfds_idx != -1) {
> + revents = g_array_index(pollfds, GPollFD,
> + so->pollfds_idx).revents;
> + }
>
> - if (so->s != -1 && FD_ISSET(so->s, readfds)) {
> + if (so->s != -1 &&
> + (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR))) {
> icmp_receive(so);
> }
> }
Same here.
> @@ -610,15 +644,6 @@ void slirp_select_poll(fd_set *readfds, fd_set
> *writefds, fd_set *xfds,
>
> if_start(slirp);
> }
> -
> - /* clear global file descriptor sets.
> - * these reside on the stack in vl.c
> - * so they're unusable if we're not in
> - * slirp_select_fill or slirp_select_poll.
> - */
> - global_readfds = NULL;
> - global_writefds = NULL;
> - global_xfds = NULL;
> }
>
> static void arp_input(Slirp *slirp, const uint8_t *pkt, int pkt_len)
> diff --git a/slirp/socket.c b/slirp/socket.c
> index 77b0c98..a7ab933 100644
> --- a/slirp/socket.c
> +++ b/slirp/socket.c
> @@ -680,9 +680,6 @@ sofcantrcvmore(struct socket *so)
> {
> if ((so->so_state & SS_NOFDREF) == 0) {
> shutdown(so->s,0);
> - if(global_writefds) {
> - FD_CLR(so->s,global_writefds);
> - }
Yep.
> }
> so->so_state &= ~(SS_ISFCONNECTING);
> if (so->so_state & SS_FCANTSENDMORE) {
> @@ -698,12 +695,6 @@ sofcantsendmore(struct socket *so)
> {
> if ((so->so_state & SS_NOFDREF) == 0) {
> shutdown(so->s,1); /* send FIN to fhost */
> - if (global_readfds) {
> - FD_CLR(so->s,global_readfds);
> - }
> - if (global_xfds) {
> - FD_CLR(so->s,global_xfds);
> - }
Right.
> }
> so->so_state &= ~(SS_ISFCONNECTING);
> if (so->so_state & SS_FCANTRCVMORE) {
> diff --git a/slirp/socket.h b/slirp/socket.h
> index 857b0da..57e0407 100644
> --- a/slirp/socket.h
> +++ b/slirp/socket.h
> @@ -20,6 +20,8 @@ struct socket {
>
> int s; /* The actual socket */
>
> + int pollfds_idx; /* GPollFD GArray index */
> +
> Slirp *slirp; /* managing slirp instance */
>
> /* XXX union these with not-yet-used sbuf params */
(I generate hunks for header files before those for C files in my
patches, see the "-O<orderfile>" option of git-format-patch.)
> diff --git a/stubs/slirp.c b/stubs/slirp.c
> index 9a3309a..f1fc833 100644
> --- a/stubs/slirp.c
> +++ b/stubs/slirp.c
> @@ -5,13 +5,11 @@ void slirp_update_timeout(uint32_t *timeout)
> {
> }
>
> -void slirp_select_fill(int *pnfds, fd_set *readfds,
> - fd_set *writefds, fd_set *xfds)
> +void slirp_pollfds_fill(GArray *pollfds)
> {
> }
>
> -void slirp_select_poll(fd_set *readfds, fd_set *writefds,
> - fd_set *xfds, int select_error)
> +void slirp_pollfds_poll(GArray *pollfds, int select_error)
> {
> }
>
Reviewed-by: Laszlo Ersek <address@hidden>
xfds_test.c
Description: Text document
- [Qemu-devel] [PATCH v3 02/10] main-loop: switch to g_poll() on POSIX hosts, (continued)
- [Qemu-devel] [PATCH v3 02/10] main-loop: switch to g_poll() on POSIX hosts, Stefan Hajnoczi, 2013/02/04
- Re: [Qemu-devel] [PATCH v3 02/10] main-loop: switch to g_poll() on POSIX hosts, Laszlo Ersek, 2013/02/04
- Re: [Qemu-devel] [PATCH v3 02/10] main-loop: switch to g_poll() on POSIX hosts, Stefan Hajnoczi, 2013/02/04
- Re: [Qemu-devel] [PATCH v3 02/10] main-loop: switch to g_poll() on POSIX hosts, Laszlo Ersek, 2013/02/04
- Re: [Qemu-devel] [PATCH v3 02/10] main-loop: switch to g_poll() on POSIX hosts, Paolo Bonzini, 2013/02/04
- Re: [Qemu-devel] [PATCH v3 02/10] main-loop: switch to g_poll() on POSIX hosts, Fabien Chouteau, 2013/02/04
- Re: [Qemu-devel] [PATCH v3 02/10] main-loop: switch to g_poll() on POSIX hosts, Paolo Bonzini, 2013/02/04
[Qemu-devel] [PATCH v3 06/10] iohandler: switch to GPollFD, Stefan Hajnoczi, 2013/02/04
[Qemu-devel] [PATCH v3 05/10] slirp: switch to GPollFD, Stefan Hajnoczi, 2013/02/04
- Re: [Qemu-devel] [PATCH v3 05/10] slirp: switch to GPollFD,
Laszlo Ersek <=
[Qemu-devel] [PATCH v3 04/10] slirp: slirp/slirp.c coding style cleanup, Stefan Hajnoczi, 2013/02/04
[Qemu-devel] [PATCH v3 07/10] main-loop: drop rfds/wfds/xfds for good, Stefan Hajnoczi, 2013/02/04
[Qemu-devel] [PATCH v3 08/10] aio: extract aio_dispatch() from aio_poll(), Stefan Hajnoczi, 2013/02/04
[Qemu-devel] [PATCH v3 09/10] aio: convert aio_poll() to g_poll(3), Stefan Hajnoczi, 2013/02/04