[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing
From: |
Jason Wang |
Subject: |
Re: [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic |
Date: |
Fri, 8 Apr 2016 16:31:05 +0800 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 |
On 04/08/2016 03:47 PM, Wei Xu wrote:
>
>
> On 2016年04月05日 10:47, Jason Wang wrote:
>>
>> On 04/04/2016 03:25 AM, address@hidden wrote:
>>> From: Wei Xu <address@hidden>
>>>
>>> All the data packets in a tcp connection will be cached to a big buffer
>>> in every receive interval, and will be sent out via a timer, the
>>> 'virtio_net_rsc_timeout' controls the interval, the value will
>>> influent the
>>> performance and response of tcp connection extremely, 50000(50us) is a
>>> experience value to gain a performance improvement, since the whql test
>>> sends packets every 100us, so '300000(300us)' can pass the test case,
>>> this is also the default value, it's gonna to be tunable.
>>>
>>> The timer will only be triggered if the packets pool is not empty,
>>> and it'll drain off all the cached packets
>>>
>>> 'NetRscChain' is used to save the segments of different protocols in a
>>> VirtIONet device.
>>>
>>> The main handler of TCP includes TCP window update, duplicated ACK
>>> check
>>> and the real data coalescing if the new segment passed sanity check
>>> and is identified as an 'wanted' one.
>>>
>>> An 'wanted' segment means:
>>> 1. Segment is within current window and the sequence is the expected
>>> one.
>>> 2. ACK of the segment is in the valid window.
>>> 3. If the ACK in the segment is a duplicated one, then it must less
>>> than 2,
>>> this is to notify upper layer TCP starting retransmission due to
>>> the spec.
>>>
>>> Sanity check includes:
>>> 1. Incorrect version in IP header
>>> 2. IP options & IP fragment
>>> 3. Not a TCP packets
>>> 4. Sanity size check to prevent buffer overflow attack.
>>>
>>> There maybe more cases should be considered such as ip
>>> identification other
>>> flags, while it broke the test because windows set it to the same
>>> even it's
>>> not a fragment.
>>>
>>> Normally it includes 2 typical ways to handle a TCP control flag,
>>> 'bypass'
>>> and 'finalize', 'bypass' means should be sent out directly, and
>>> 'finalize'
>>> means the packets should also be bypassed, and this should be done
>>> after searching for the same connection packets in the pool and sending
>>> all of them out, this is to avoid out of data.
>>>
>>> All the 'SYN' packets will be bypassed since this always begin a new'
>>> connection, other flags such 'FIN/RST' will trigger a finalization,
>>> because
>>> this normally happens upon a connection is going to be closed, an
>>> 'URG' packet
>>> also finalize current coalescing unit.
>>>
>>> Statistics can be used to monitor the basic coalescing status, the
>>> 'out of order'
>>> and 'out of window' means how many retransmitting packets, thus
>>> describe the
>>> performance intuitively.
>>>
>>> Signed-off-by: Wei Xu <address@hidden>
>>> ---
>>> hw/net/virtio-net.c | 480
>>> ++++++++++++++++++++++++++++++++++++++++-
>>> include/hw/virtio/virtio-net.h | 1 +
>>> include/hw/virtio/virtio.h | 72 +++++++
>>> 3 files changed, 552 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index bd91a4b..81e8e71 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -15,10 +15,12 @@
>>> #include "qemu/iov.h"
>>> #include "hw/virtio/virtio.h"
>>> #include "net/net.h"
>>> +#include "net/eth.h"
>>> #include "net/checksum.h"
>>> #include "net/tap.h"
>>> #include "qemu/error-report.h"
>>> #include "qemu/timer.h"
>>> +#include "qemu/sockets.h"
>>> #include "hw/virtio/virtio-net.h"
>>> #include "net/vhost_net.h"
>>> #include "hw/virtio/virtio-bus.h"
>>> @@ -38,6 +40,24 @@
>>> #define endof(container, field) \
>>> (offsetof(container, field) + sizeof(((container *)0)->field))
>>> +#define VIRTIO_NET_IP4_ADDR_SIZE 8 /* ipv4 saddr + daddr */
>>> +#define VIRTIO_NET_TCP_PORT_SIZE 4 /* sport + dport */
>>> +
>>> +/* IPv4 max payload, 16 bits in the header */
>>> +#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
>>> +#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535
>>> +
>>> +/* header lenght value in ip header without option */
>> typo here.
> Thanks.
>>
>>> +#define VIRTIO_NET_IP4_HEADER_LENGTH 5
>>> +
>>> +/* Purge coalesced packets timer interval */
>>> +#define VIRTIO_NET_RSC_INTERVAL 300000
>>> +
>>> +/* This value affects the performance a lot, and should be tuned
>>> carefully,
>>> + '300000'(300us) is the recommended value to pass the WHQL test,
>>> '50000' can
>>> + gain 2x netperf throughput with tso/gso/gro 'off'. */
>>> +static uint32_t virtio_net_rsc_timeout = VIRTIO_NET_RSC_INTERVAL;
>> Like we've discussed in previous versions, need we add another property
>> for this?
> Do you know how to make this a tunable parameter to guest? can this
> parameter be set via control queue?
It's possible I think.
[...]
>>> +
>>> +static void virtio_net_rsc_purge(void *opq)
>>> +{
>>> + NetRscChain *chain = (NetRscChain *)opq;
>>> + NetRscSeg *seg, *rn;
>>> +
>>> + QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) {
>>> + if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
>>> + chain->stat.purge_failed++;
>>> + continue;
>> Is it better to break here, consider we fail to do the receive?
> Actually this fails only when receive fails according to the test, but
> should drain all the segments here,
> if break here, then will have to free all the buffers, is it possible
> a successful receiving followed by a failed one?
I'd say it's possible, e.g guest just finish rx buffer refills.
> if that does happens, i preferred give it a shot other than free it
> directly.
Ok.
[...]
>>> +
>>> +static void virtio_net_rsc_cache_buf(NetRscChain *chain,
>>> NetClientState *nc,
>>> + const uint8_t *buf, size_t size)
>>> +{
>>> + uint16_t hdr_len;
>>> + NetRscSeg *seg;
>>> +
>>> + hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
>>> + seg = g_malloc(sizeof(NetRscSeg));
>>> + seg->buf = g_malloc(hdr_len + sizeof(struct eth_header)\
>>> + + sizeof(struct ip6_header) +
>>> VIRTIO_NET_MAX_TCP_PAYLOAD);
>> Why ip6_header here?
> ipv6 packets can carry 65535(2^16) bytes pure data payload, that means
> the header is not included.
> but ipv4 is included, i choose the maximum here.
Let's add a comment here, since the patch's title is to implement ipv4
coalescing.
[...]
>>> +
>>> +static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain,
>>> NetRscSeg *seg,
>>> + const uint8_t *buf, size_t size, NetRscUnit
>>> *unit)
>> indentation is wrong here.
> Should it be aligned from parameter begin?
Looks like we'd better do this.
> i was going to save an extra line parameter.
[...]
Re: [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic, Michael S. Tsirkin, 2016/04/12
[Qemu-devel] [ RFC Patch v4 3/3] virtio-net rsc: support coalescing ipv6 tcp traffic, wexu, 2016/04/03
Re: [Qemu-devel] [ RFC Patch v4 0/3] Support Receive-Segment-Offload(RSC) for WHQL, Jason Wang, 2016/04/04