qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_f


From: Leonardo Bras Soares Passos
Subject: Re: [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
Date: Tue, 23 Nov 2021 01:46:44 -0300

Hello Daniel,

On Fri, Nov 12, 2021 at 7:54 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
[...]
> > @@ -561,12 +577,15 @@ static ssize_t 
> > qio_channel_socket_writev_flags(QIOChannel *ioc,
> >   retry:
> >      ret = sendmsg(sioc->fd, &msg, flags);
> >      if (ret <= 0) {
> > -        if (errno == EAGAIN) {
> > +        switch (errno) {
> > +        case EAGAIN:
> >              return QIO_CHANNEL_ERR_BLOCK;
> > -        }
> > -        if (errno == EINTR) {
> > +        case EINTR:
> >              goto retry;
> > +        case ENOBUFS:
> > +            return QIO_CHANNEL_ERR_NOBUFS;
>
> Why does ENOBUFS need handling separately instead of letting
> the error_setg_errno below handle it ?
>
> The caller immediately invokes error_setg_errno() again,
> just with different error message.
>
> No code in this series ever looks at QIO_CHANNEL_ERR_NOBUFS
> either, so we don't even need that special error return code
> added AFAICT ?
>

The idea was to add a custom message for ENOBUFS return when sending
with MSG_ZEROCOPY.
I mean, having this message is important for the user to understand
why the migration is failing, but it would
not make any sense to have this message while a non-zerocopy sendmsg()
returns with ENOBUFS.

ENOBUFS : The output queue for a network interface was full.  This
generally indicates that the interface has stopped sending, but may be
caused by transient congestion.

As an alternative, I could add this message inside the switch, inside
an if (flags & MSG_ZEROCOPY) on qio_channel_socket_writev_flags()
instead of in it's caller.
But for me it looks bloated, I mean, dealing with an error for
ZEROCOPY only in the general function.

OTOH, if you think that it's a better idea to deal with every error in
qio_channel_socket_writev_flags() instead of in the caller, I will
change it for v6. Please let me know.

> >          }
> > +
> >          error_setg_errno(errp, errno,
> >                           "Unable to write to socket");
> >          return -1;
> > @@ -670,6 +689,127 @@ static ssize_t qio_channel_socket_writev(QIOChannel 
> > *ioc,
> >  }
> >  #endif /* WIN32 */
> >
> > +
> > +#ifdef CONFIG_LINUX
> > +
> > +static int qio_channel_socket_poll(QIOChannelSocket *sioc, bool zerocopy,
> > +                                   Error **errp)
>
> There's only one caller and it always passes zerocopy=true,
> so this parmeter looks pointless.

I did that for possible reuse of this function in the future:
- As of today, this is certainly compiled out, but if at some point
someone wants to use poll for something other
than the reading of an zerocopy errqueue, it could be reused.

But sure, if that's not desirable, I can remove the parameter (and the
if clause for !zerocopy).

>
> > +{
> > +    struct pollfd pfd;
> > +    int ret;
> > +
> > +    pfd.fd = sioc->fd;
> > +    pfd.events = 0;
> > +
> > + retry:
> > +    ret = poll(&pfd, 1, -1);
> > +    if (ret < 0) {
> > +        switch (errno) {
> > +        case EAGAIN:
> > +        case EINTR:
> > +            goto retry;
> > +        default:
> > +            error_setg_errno(errp, errno,
> > +                             "Poll error");
> > +            return ret;
>
>        return -1;
>
> > +        }
> > +    }
> > +
> > +    if (pfd.revents & (POLLHUP | POLLNVAL)) {
> > +        error_setg(errp, "Poll error: Invalid or disconnected fd");
> > +        return -1;
> > +    }
> > +
> > +    if (!zerocopy && (pfd.revents & POLLERR)) {
> > +        error_setg(errp, "Poll error: Errors present in errqueue");
> > +        return -1;
> > +    }
>
> > +
> > +    return ret;
>
>   return 0;

In the idea of future reuse I spoke above, returning zero here would
make this function always look like the poll timed out. Some future
users may want to repeat the waiting if poll() timed out, or if
(return > 0) stop polling.
(That was an earlier approach of this series)

>
> > +}
> > +
> > +static ssize_t qio_channel_socket_writev_zerocopy(QIOChannel *ioc,
> > +                                                  const struct iovec *iov,
> > +                                                  size_t niov,
> > +                                                  Error **errp)
> > +{
> > +    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
> > +    ssize_t ret;
> > +
> > +    ret = qio_channel_socket_writev_flags(ioc, iov, niov, NULL, 0,
> > +                                          MSG_ZEROCOPY, errp);
> > +    if (ret == QIO_CHANNEL_ERR_NOBUFS) {
> > +        error_setg_errno(errp, errno,
> > +                         "Process can't lock enough memory for using 
> > MSG_ZEROCOPY");
>
> This should not be touching errno - the method should be setting the
> errp directly, not leaving it to the caller.
>

Discussed above.
If you think that's a better approach I can change for v6.


> > +        return -1;
> > +    }
> > +
> > +    sioc->zerocopy_queued++;
>
>  if (ret > 0)
>     sio->zerocopy_queued++
>

Nice catch!

> since the kernel doesn't increase the counter if the data sent
> was zero length. A caller shouldn't be passing us a zero length
> iov data element, but it is wise to be cautious

Seems ok to me.

>
> > +    return ret;
> > +}
> > +
> > +static int qio_channel_socket_flush_zerocopy(QIOChannel *ioc,
> > +                                             Error **errp)
> > +{
> > +    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
> > +    struct msghdr msg = {};
> > +    struct sock_extended_err *serr;
> > +    struct cmsghdr *cm;
> > +    char control[CMSG_SPACE(sizeof(*serr))];
> > +    int ret;
>
> Add
>
>    int rv = 0;
>
> > +
> > +    msg.msg_control = control;
> > +    msg.msg_controllen = sizeof(control);
> > +    memset(control, 0, sizeof(control));
> > +
> > +    while (sioc->zerocopy_sent < sioc->zerocopy_queued) {
> > +        ret = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
> > +        if (ret < 0) {
> > +            switch (errno) {
> > +            case EAGAIN:
> > +                /* Nothing on errqueue, wait until something is available 
> > */
> > +                ret = qio_channel_socket_poll(sioc, true, errp);
> > +                if (ret < 0) {
> > +                    return -1;
> > +                }
> > +                continue;
> > +            case EINTR:
> > +                continue;
> > +            default:
> > +                error_setg_errno(errp, errno,
> > +                                 "Unable to read errqueue");
> > +                return -1;
> > +            }
> > +        }
> > +
> > +        cm = CMSG_FIRSTHDR(&msg);
> > +        if (cm->cmsg_level != SOL_IP &&
> > +            cm->cmsg_type != IP_RECVERR) {
> > +            error_setg_errno(errp, EPROTOTYPE,
> > +                             "Wrong cmsg in errqueue");
> > +            return -1;
> > +        }
> > +
> > +        serr = (void *) CMSG_DATA(cm);
> > +        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
> > +            error_setg_errno(errp, serr->ee_errno,
> > +                             "Error on socket");
> > +            return -1;
> > +        }
> > +        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
> > +            error_setg_errno(errp, serr->ee_origin,
> > +                             "Error not from zerocopy");
> > +            return -1;
> > +        }
> > +
> > +        /* No errors, count successfully finished sendmsg()*/
> > +        sioc->zerocopy_sent += serr->ee_data - serr->ee_info + 1;
>
> Here add
>
>
>      if (ee_code ==  SO_EE_CODE_ZEROCOPY_COPIED)
>         rv = 1;
>
> > +    }
> > +    return 0;
>
> return rv;
>
> > +}
>
>

I understand the idea of testing SO_EE_CODE_ZEROCOPY_COPIED to be able
to tell whenever zerocopy fell back to copying for some reason, but I
don't see how this can be helpful here.

Other than that I would do rv++ instead of rv=1 here, if I want to
keep track of how many buffers were sent with zerocopy and how many
ended up being copied.

>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Thanks for this feedback Daniel,

Best regards,
Leo




reply via email to

[Prev in Thread] Current Thread [Next in Thread]