
Re: [Qemu-devel] [PATCH v5] net: L2TPv3 transport


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v5] net: L2TPv3 transport
Date: Wed, 26 Mar 2014 09:24:53 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Mar 25, 2014 at 10:35:28AM +0000, Anton Ivanov wrote:
> On 25/03/14 10:17, Stefan Hajnoczi wrote:
> > On Mon, Mar 24, 2014 at 11:56:16AM +0000, address@hidden wrote:
> >> 1. Correct buffering and correct poll FSM
> >>
> >> Current qemu queue logic assumes single packet inputs, not multiple packet
> >> RX. If the receiver cannot take more packets it will enqueue the new
> >> arrivals and incur a malloc, memcpy and free to do that.
> >>
> >> This is unnecessary if you use recvmmsg and treat the receive message
> >> array as a ring buffer. If the peer returns 0 you leave the message(s)
> >> in the receive array and use the remainder of the message array to
> >> continue reading. It is a natural queue, no need to double-buffer and
> >> memcpy on top of that.
> >>
> >> The poll logic and the send logic have been updated to account for that.
> >>
> >> End result: 10%+ performance gain compared to using qemu_send_packet_async
> > Does this mean you regularly see qemu_send_packet_async() return 0
> > because l2tp is reading more packets using recvmmsg() than the NIC can
> > fit into its rx ring?  I'd like to understand this case better.
> 
> qemu_send_packet_async() will return non-zero even when its underlying queue
> enqueues the packet, which incurs a malloc, memcpy and double-buffering. It
> will return zero only if l2tpv3 fills it so fast that the queue in queue.c
> overflows.

Not sure if we're talking about the same thing here but I think I
disagree.  qemu_net_queue_send() does the following:
if (queue->delivering || !qemu_can_send_packet(sender)) {
    qemu_net_queue_append(queue, sender, flags, data, size, sent_cb);
    return 0;
}

If receive is disabled on the peer, then qemu_can_send_packet(sender)
returns false.  The packet is enqueued in the peer's incoming queue and
qemu_send_packet_async() returns 0.

How does receive get disabled on the peer?  If the peer's ->receive()
function returned 0 then receive gets disabled until
qemu_flush_queued_packets() is called to signal that it's okay to resume
transmission.

> So the fact that qemu_send_packet_async() has returned a non-zero does not 
> mean that we have not paid the price for it :) 

A non-zero return is simply an error code from the ->receive() function.
In this case the packet is dropped but queuing is unaffected.

> The relevant code is in qemu_net_queue_append, which is invoked by
> qemu_net_queue_send, which is what qemu_send_packet_async calls.

Are we interpreting the code differently?

> >
> >> +static void l2tpv3_writable(void *opaque)
> >> +{
> >> +    NetL2TPV3State *s = opaque;
> >> +
> >> +    l2tpv3_write_poll(s, false);
> >> +
> >> +    net_l2tpv3_send(s);
> > This is the wrong function because net-l2tpv3_send() *reads* new packets
> > from the socket.  We must tell the peer to resume transmitting packets
> > so we can *write* them to the socket.
> >
> > This line should be:
> > qemu_flush_queued_packets(&s->nc);
> 
> No.
> 
> That function doubles as a local buffer using the recvmmsg vector; there
> are never any packets in the queue from queue.c to flush. The exact idea
> here is to eliminate that queue as it is surplus to requirements if you
> are doing multirx. If you are doing multirx (or packet mmap) your
> packets are in a multi-packet buffer already. It is a perfect queue,
> there is no need for another :)
> 
> The send function is written so it picks up where it has left off last
> time so any packets left over in the recvmmsg vector are flushed first
> before any new reads.

You're thinking about the wrong data transfer direction.  There are two
queues:
1. recvmmsg() -> NIC incoming queue - you are trying to bypass this.
2. sendmsg() -> l2tpv3's incoming queue - if sendmsg() fails with EAGAIN.

l2tpv3_writable() is invoked when the socket becomes writable (case #2)
and we can resume sendmsg().  This has nothing to do with recvmmsg()
(case #1).

In other words, the host kernel stopped us from transmitting packets
because it ran out of buffer space.  So the NIC placed packets into
l2tpv3's incoming queue and now we need to flush them, as well as
re-enabling transmission.

> >> +    }
> >> +    if (s->queue_depth > 0 && (!s->delivering)) {
> >> +        do {
> >> +            msgvec = s->msgvec + s->queue_tail;
> >> +            if (msgvec->msg_len > 0) {
> >> +                data_size = msgvec->msg_len - s->header_size;
> >> +                vec = msgvec->msg_hdr.msg_iov;
> >> +                if ((data_size > 0) &&
> >> +                    (l2tpv3_verify_header(s, vec->iov_base) == 0)) {
> >> +                    vec++;
> >> +                    /* we do not use the "normal" send and send async
> >> +                     * functions here because we have our own buffer -
> >> +                     * the message vector. If we use the "usual" ones
> >> +                     * we will end up double-buffering.
> >> +                     */
> >> +                    s->delivering = true;
> >> +                    /* deliver directly to peer bypassing queueing and
> >> +                     * buffering
> >> +                     */
> >> +                    size = qemu_deliver_packet(
> >> +                            &s->nc,
> >> +                            QEMU_NET_PACKET_FLAG_NONE,
> >> +                            vec->iov_base,
> >> +                            data_size,
> >> +                            s->nc.peer
> >> +                        );
> >> +                    s->delivering = false;
> >> +                    bad_read = false;
> >> +                } else {
> >> +                    bad_read = true;
> >> +                    if (!s->header_mismatch) {
> >> +                        /* report error only once */
> >> +                        error_report("l2tpv3 header verification failed");
> >> +                        s->header_mismatch = true;
> >> +                    }
> >> +                }
> >> +            } else {
> >> +                bad_read = true;
> >> +            }
> >> +            if ((bad_read) || (size > 0)) {
> >> +                s->queue_tail = (s->queue_tail + 1) % MAX_L2TPV3_MSGCNT;
> >> +                s->queue_depth--;
> >> +            }
> >> +        } while (
> >> +                (s->queue_depth > 0) &&
> >> +                qemu_can_send_packet(&s->nc) &&
> >> +                ((size > 0) || bad_read)
> >> +            );
> >> +    }
> >> +    if (s->queue_depth >= (MAX_L2TPV3_MSGCNT - 1)) {
> >> +        /* We change the read poll flag only if our internal queue
> >> +         * (the message vector) is full. We do not change it on
> >> +         * size == 0 as other transports because we do our own buffering
> >> +         */
> >> +        l2tpv3_read_poll(s, false);
> > I don't see where l2tpv3_read_poll() is set to true again.  How do we
> > resume reading the socket?
> 
> in l2tpv3_poll which is the .poll in the info.

Who calls that?  It's only used for vhost-net as a way to toggle QEMU file
descriptor monitoring when we pass the tap file descriptor to the
vhost_net driver (host kernel).

Stefan


