qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing


From: Jason Wang
Subject: Re: [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic
Date: Fri, 8 Apr 2016 16:31:05 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0


On 04/08/2016 03:47 PM, Wei Xu wrote:
>
>
> On 2016年04月05日 10:47, Jason Wang wrote:
>>
>> On 04/04/2016 03:25 AM, address@hidden wrote:
>>> From: Wei Xu <address@hidden>
>>>
>>> All the data packets in a tcp connection will be cached to a big buffer
>>> in every receive interval, and will be sent out via a timer, the
>>> 'virtio_net_rsc_timeout' controls the interval, the value will
>>> influent the
>>> performance and response of tcp connection extremely, 50000(50us) is a
>>> experience value to gain a performance improvement, since the whql test
>>> sends packets every 100us, so '300000(300us)' can pass the test case,
>>> this is also the default value, it's gonna to be tunable.
>>>
>>> The timer will only be triggered if the packets pool is not empty,
>>> and it'll drain off all the cached packets
>>>
>>> 'NetRscChain' is used to save the segments of different protocols in a
>>> VirtIONet device.
>>>
>>> The main handler of TCP includes TCP window update, duplicated ACK
>>> check
>>> and the real data coalescing if the new segment passed sanity check
>>> and is identified as an 'wanted' one.
>>>
>>> An 'wanted' segment means:
>>> 1. Segment is within current window and the sequence is the expected
>>> one.
>>> 2. ACK of the segment is in the valid window.
>>> 3. If the ACK in the segment is a duplicated one, then it must less
>>> than 2,
>>>     this is to notify upper layer TCP starting retransmission due to
>>> the spec.
>>>
>>> Sanity check includes:
>>> 1. Incorrect version in IP header
>>> 2. IP options & IP fragment
>>> 3. Not a TCP packets
>>> 4. Sanity size check to prevent buffer overflow attack.
>>>
>>> There maybe more cases should be considered such as ip
>>> identification other
>>> flags, while it broke the test because windows set it to the same
>>> even it's
>>> not a fragment.
>>>
>>> Normally it includes 2 typical ways to handle a TCP control flag,
>>> 'bypass'
>>> and 'finalize', 'bypass' means should be sent out directly, and
>>> 'finalize'
>>> means the packets should also be bypassed, and this should be done
>>> after searching for the same connection packets in the pool and sending
>>> all of them out, this is to avoid out of data.
>>>
>>> All the 'SYN' packets will be bypassed since this always begin a new'
>>> connection, other flags such 'FIN/RST' will trigger a finalization,
>>> because
>>> this normally happens upon a connection is going to be closed, an
>>> 'URG' packet
>>> also finalize current coalescing unit.
>>>
>>> Statistics can be used to monitor the basic coalescing status, the
>>> 'out of order'
>>> and 'out of window' means how many retransmitting packets, thus
>>> describe the
>>> performance intuitively.
>>>
>>> Signed-off-by: Wei Xu <address@hidden>
>>> ---
>>>   hw/net/virtio-net.c            | 480
>>> ++++++++++++++++++++++++++++++++++++++++-
>>>   include/hw/virtio/virtio-net.h |   1 +
>>>   include/hw/virtio/virtio.h     |  72 +++++++
>>>   3 files changed, 552 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index bd91a4b..81e8e71 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -15,10 +15,12 @@
>>>   #include "qemu/iov.h"
>>>   #include "hw/virtio/virtio.h"
>>>   #include "net/net.h"
>>> +#include "net/eth.h"
>>>   #include "net/checksum.h"
>>>   #include "net/tap.h"
>>>   #include "qemu/error-report.h"
>>>   #include "qemu/timer.h"
>>> +#include "qemu/sockets.h"
>>>   #include "hw/virtio/virtio-net.h"
>>>   #include "net/vhost_net.h"
>>>   #include "hw/virtio/virtio-bus.h"
>>> @@ -38,6 +40,24 @@
>>>   #define endof(container, field) \
>>>       (offsetof(container, field) + sizeof(((container *)0)->field))
>>>   +#define VIRTIO_NET_IP4_ADDR_SIZE   8        /* ipv4 saddr + daddr */
>>> +#define VIRTIO_NET_TCP_PORT_SIZE   4        /* sport + dport */
>>> +
>>> +/* IPv4 max payload, 16 bits in the header */
>>> +#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
>>> +#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535
>>> +
>>> +/* header lenght value in ip header without option */
>> typo here.
> Thanks.
>>
>>> +#define VIRTIO_NET_IP4_HEADER_LENGTH 5
>>> +
>>> +/* Purge coalesced packets timer interval */
>>> +#define VIRTIO_NET_RSC_INTERVAL  300000
>>> +
>>> +/* This value affects the performance a lot, and should be tuned
>>> carefully,
>>> +   '300000'(300us) is the recommended value to pass the WHQL test,
>>> '50000' can
>>> +   gain 2x netperf throughput with tso/gso/gro 'off'. */
>>> +static uint32_t virtio_net_rsc_timeout = VIRTIO_NET_RSC_INTERVAL;
>> Like we've discussed in previous versions, need we add another property
>> for this?
> Do you know how to make this a tunable parameter to guest? can this
> parameter be set via control queue?

It's possible I think.


[...]

>>> +
>>> +static void virtio_net_rsc_purge(void *opq)
>>> +{
>>> +    NetRscChain *chain = (NetRscChain *)opq;
>>> +    NetRscSeg *seg, *rn;
>>> +
>>> +    QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) {
>>> +        if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
>>> +            chain->stat.purge_failed++;
>>> +            continue;
>> Is it better to break here, consider we fail to do the receive?
> Actually this fails only when receive fails according to the test, but
> should drain all the segments here,
> if break here, then will have to free all the buffers, is it possible
> a successful receiving followed by a failed one?

I'd say it's possible, e.g guest just finish rx buffer refills.

> if that does happens, i preferred give it a shot other than free it
> directly.

Ok.

[...]

>>> +
>>> +static void virtio_net_rsc_cache_buf(NetRscChain *chain,
>>> NetClientState *nc,
>>> +                                     const uint8_t *buf, size_t size)
>>> +{
>>> +    uint16_t hdr_len;
>>> +    NetRscSeg *seg;
>>> +
>>> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
>>> +    seg = g_malloc(sizeof(NetRscSeg));
>>> +    seg->buf = g_malloc(hdr_len + sizeof(struct eth_header)\
>>> +                   + sizeof(struct ip6_header) +
>>> VIRTIO_NET_MAX_TCP_PAYLOAD);
>> Why ip6_header here?
> ipv6 packets can carry 65535(2^16) bytes pure data payload, that means
> the header is not included.
> but ipv4 is included, i choose the maximum here.

Let's add a comment here, since the patch's title is to implement ipv4
coalescing.

[...]

>>> +
>>> +static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain,
>>> NetRscSeg *seg,
>>> +                        const uint8_t *buf, size_t size, NetRscUnit
>>> *unit)
>> indentation is wrong here.
> Should it be aligned from parameter begin?

Looks like we'd better do this.

> i was going to save an extra line parameter.

[...]



reply via email to

[Prev in Thread] Current Thread [Next in Thread]