Re: [Qemu-devel] [PATCH v6] net: L2TPv3 transport
From: Anton Ivanov
Subject: Re: [Qemu-devel] [PATCH v6] net: L2TPv3 transport
Date: Thu, 27 Mar 2014 19:45:33 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130116 Icedove/10.0.12
On 27/03/14 18:25, Stefan Hajnoczi wrote:
> On Thu, Mar 27, 2014 at 4:53 PM, Anton Ivanov (antivano)
> <address@hidden> wrote:
>> On 27/03/14 12:30, Stefan Hajnoczi wrote:
>>> On Wed, Mar 26, 2014 at 11:08:41AM +0000, address@hidden wrote:
>>>> +static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc,
>>>> + const struct iovec *iov,
>>>> + int iovcnt)
>>>> +{
>>>> + NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
>>>> +
>>>> + struct msghdr message;
>>>> + int ret;
>>>> +
>>>> + if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
>>>> + error_report(
>>>> + "iovec too long %d > %d, change l2tpv3.h",
>>>> + iovcnt, MAX_L2TPV3_IOVCNT
>>>> + );
>>>> + return -1;
>>>> + }
>>> The alternative is to linearize the buffer using iov_to_buf() and call
>>> net_l2tpv3_receive_dgram(). Something like:
>>>
>>> if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
>>> size_t size = iov_size(iov, iovcnt);
>>> uint8_t *buf;
>>>
>>> buf = g_malloc(size);
>>> iov_to_buf(iov, iovcnt, 0, buf, size);
>>> ret = net_l2tpv3_receive_dgram(nc, buf, size);
>>> g_free(buf);
>>> return ret;
>>> }
>> iov_to_buf() is a memcpy of the data plus a malloc down the call
>> chain. That is a significant cost compared to just shifting the iov a
>> bit to add an element for the header. I tried that early on and it
>> introduced a noticeable penalty.
>>
>> If I am to merge them I would actually do it the other way around -
>> express both net_l2tpv3_receive_dgram and net_l2tpv3_receive_dgram_iov
>> via an iov-based backend. That will allow adding sendmmsg one day, as
>> the data structures will be the same in all cases.
> I agree with you that the iov code path should be used. The context
> for my comment was "if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {". I'm just
> suggesting that we fall back to linearizing the iovec if it has too
> many elements, instead of erroring out.
Understood. Will do.
>
>>>> +static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf)
>>>> +{
>>>> +
>>>> + uint32_t *session;
>>>> + uint64_t cookie;
>>>> +
>>>> + if ((!s->udp) && (!s->ipv6)) {
>>>> + buf += sizeof(struct iphdr) /* fix for ipv4 raw */;
>>>> + }
>>>> +
>>>> + /* we do not do a strict check for "data" packets as per
>>>> + * the RFC spec because the pure IP spec does not have
>>>> + * that anyway.
>>>> + */
>>>> +
>>>> + if (s->cookie) {
>>>> + if (s->cookie_is_64) {
>>>> + cookie = ldq_be_p(buf + s->cookie_offset);
>>>> + } else {
>>>> + cookie = ldl_be_p(buf + s->cookie_offset);
>>>> + }
>>>> + if (cookie != s->rx_cookie) {
>>>> + if (!s->header_mismatch) {
>>>> + error_report("unknown cookie id\n");
>>> error_report() messages should not have a newline ('\n') at the end.
>>>
>>>> + }
>>>> + return -1;
>>>> + }
>>>> + }
>>>> + session = (uint32_t *) (buf + s->session_offset);
>>>> + if (ldl_be_p(session) != s->rx_session) {
>>>> + if (!s->header_mismatch) {
>>>> + error_report("session mismatch");
>>>> + }
>>>> + return -1;
>>>> + }
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static void net_l2tpv3_process_queue(NetL2TPV3State *s)
>>>> +{
>>>> + int size = 0;
>>>> + struct iovec *vec;
>>>> + bool bad_read;
>>>> + int data_size;
>>>> + struct mmsghdr *msgvec;
>>>> +
>>>> + /* go into ring mode only if there is a "pending" tail */
>>>> + if (s->queue_depth > 0 && (!s->delivering)) {
>>>> + do {
>>>> + msgvec = s->msgvec + s->queue_tail;
>>>> + if (msgvec->msg_len > 0) {
>>>> + data_size = msgvec->msg_len - s->header_size;
>>> header_size is never assigned to.
>> It is - in init. It is used to divorce the "read header size" from the
>> "l2tpv3 header size", as needed by the ipv4 socket API. Using it allows
>> removing the check for whether we are reading from ipv4 raw sockets in
>> most (but not all) places.
> I'm not seeing that in this version of the patch, please check again
> and let me know which line number it is on.
You are correct.
I failed to cherry-pick that change correctly from my development branch
when creating the patchset for submission.
Mea culpa.
>
>>>> + vec = msgvec->msg_hdr.msg_iov;
>>>> + if ((data_size > 0) &&
>>>> + (l2tpv3_verify_header(s, vec->iov_base) == 0)) {
>>>> + vec++;
>>>> + /* we do not use the "normal" send and send async
>>>> + * functions here because we have our own buffer -
>>>> + * the message vector. If we use the "usual" ones
>>>> + * we will end up double-buffering.
>>>> + */
>>>> + s->delivering = true;
>>> What is s->delivering guarding against? This function is only called
>>> from the s->fd read handler function, so I don't see a reentrant code
>>> path.
>> Cut-n-paste from the queue functions, without doing a full analysis of
>> what it is guarding against there.
> Okay, the problem in the generic net code is that some net clients
> (like slirp) will send packets back to their peer from their receive()
> function. For example, imagine a DHCP request packet from the guest
> being sent to the slirp net client. It will send the DHCP reply right
> away while we're still in the send and receive functions. This
> creates reentrancy and not all code handles it so we just queue the
> packets instead.
>
> But in this case there is no reentrancy since only the fd handler
> function can call net_l2tpv3_process_queue().
OK.
>
>>>> + /* deliver directly to peer bypassing queueing and
>>>> + * buffering
>>>> + */
>>>> + size = qemu_deliver_packet(
>>>> + &s->nc,
>>>> + QEMU_NET_PACKET_FLAG_NONE,
>>>> + vec->iov_base,
>>>> + data_size,
>>>> + s->nc.peer
>>>> + );
>>> There is no callback when the peer re-enables receive. In other words,
>>> packets will be stuck in s->msgvec until a new packet arrives and
>>> net_l2tpv3_send() is invoked.
>>>
>>> I think you need to modify qemu_flush_queued_packets() to invoke a new
>>> callback. You could convert the existing code:
>> I had the same thoughts, just did not want to touch it just yet.
>>
>> Something along those lines would have been necessary for sendmmsg
>> too, as there you would want to work off a pre-configured queue in the
>> first place.
>>
>>> if (nc->peer && nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_HUBPORT) {
>>> if (net_hub_flush(nc->peer)) {
>>> qemu_notify_event();
>>> }
>>> }
>>>
>>> to:
>>>
>>> if (nc->peer && nc->peer->info->peer_flushed) {
>>> if (nc->peer->info->peer_flushed(nc->peer)) {
>>> qemu_notify_event();
>>> }
>>> }
>>>
>>> and move the hub flushing code into net/hub.c.
>>>
>>> Then you could also implement ->peer_flushed() in l2tpv3.c so that it
>>> calls net_l2tpv3_process_queue() and returns true if packets were
>>> delivered.
>>>
>>> Please introduce ->peer_flushed() in a separate patch.
>> Understood.
>>
>>> Or you could keep it simple and go back to using qemu_send_packet()
>>> without trying to avoid duplicate buffering. (You can always send
>>> patches later that eliminate the buffering.)
>> I would probably go back to that for submission so we can do it in stages.
>>
>> I will branch out the zero copy work and relevant additions to queue.c
>> for submission at a later date along with zero copy versions of gre,
>> raw, as well as a revised version of l2tpv3.c
> Excellent, I was hoping for this, thanks!
OK. Will do it this way.
ETA - Monday.
A.