[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
From: |
Toke Høiland-Jørgensen |
Subject: |
Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net |
Date: |
Fri, 04 Dec 2020 14:57:22 +0100 |
Yuri Benditovich <yuri.benditovich@daynix.com> writes:
> On Fri, Dec 4, 2020 at 12:09 PM Toke Høiland-Jørgensen <toke@redhat.com>
> wrote:
>
>> Yuri Benditovich <yuri.benditovich@daynix.com> writes:
>>
>> > On Wed, Dec 2, 2020 at 4:18 PM Toke Høiland-Jørgensen <toke@redhat.com>
>> > wrote:
>> >
>> >> Jason Wang <jasowang@redhat.com> writes:
>> >>
>> >> > On 2020/11/19 下午7:13, Andrew Melnychenko wrote:
>> >> >> This set of patches introduces the usage of eBPF for packet steering
>> >> >> and RSS hash calculation:
>> >> >> * RSS(Receive Side Scaling) is used to distribute network packets to
>> >> >> guest virtqueues by calculating packet hash
>> >> >> * Additionally adding support for the usage of RSS with vhost
>> >> >>
>> >> >> The eBPF works on kernels 5.8+
>> >> >> On earlier kerneld it fails to load and the RSS feature is reported
>> >> >> only without vhost and implemented in 'in-qemu' software.
>> >> >>
>> >> >> Implementation notes:
>> >> >> Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
>> >> >> Added libbpf dependency and eBPF support.
>> >> >> The eBPF program is part of the qemu and presented as an array
>> >> >> of BPF ELF file data.
>> >> >> The compilation of eBPF is not part of QEMU build and can be done
>> >> >> using provided Makefile.ebpf(need to adjust 'linuxhdrs').
>> >> >> Added changes to virtio-net and vhost, primary eBPF RSS is used.
>> >> >> 'in-qemu' RSS used in the case of hash population and as a fallback
>> >> option.
>> >> >> For vhost, the hash population feature is not reported to the guest.
>> >> >>
>> >> >> Please also see the documentation in PATCH 5/5.
>> >> >>
>> >> >> I am sending those patches as RFC to initiate the discussions and get
>> >> >> feedback on the following points:
>> >> >> * Fallback when eBPF is not supported by the kernel
>> >> >> * Live migration to the kernel that doesn't have eBPF support
>> >> >> * Integration with current QEMU build
>> >> >> * Additional usage for eBPF for packet filtering
>> >> >>
>> >> >> Known issues:
>> >> >> * hash population not supported by eBPF RSS: 'in-qemu' RSS used
>> >> >> as a fallback, also, hash population feature is not reported to
>> guests
>> >> >> with vhost.
>> >> >> * big-endian BPF support: for now, eBPF isn't supported on
>> >> >> big-endian systems. Can be added in future if required.
>> >> >> * huge .h file with eBPF binary. The size of .h file containing
>> >> >> eBPF binary is currently ~5K lines, because the binary is built with
>> >> debug information.
>> >> >> The binary without debug/BTF info can't be loaded by libbpf.
>> >> >> We're looking for possibilities to reduce the size of the .h files.
>> >> >
>> >> >
>> >> > Adding Toke for sharing more idea from eBPF side.
>> >> >
>> >> > We had some discussion on the eBPF issues:
>> >> >
>> >> > 1) Whether or not to use libbpf. Toke strongly suggest to use libbpf
>> >> > 2) Whether or not to use BTF. Toke confirmed that if we don't access
>> any
>> >> > skb metadata, BTF is not strictly required for CO-RE. But it might
>> still
>> >> > useful for e.g debugging.
>> >> > 3) About the huge (5K lines, see patch #2 Toke). Toke confirmed that
>> we
>> >> > can strip debug symbols, but Yuri found some sections can't be
>> stripped,
>> >> > we can keep discussing here.
>> >>
>> >> I just tried simply running 'strip' on a sample trivial XDP program,
>> >> which brought its size down from ~5k to ~1k and preserved the BTF
>> >> information without me having to do anything.
>> >>
>> >
>> > With our eBPF code the numbers are slightly different:
>> > The code size without BTF: 7.5K (built without '-g')
>> > Built with '-g': 45K
>> > Stripped: 19K
>> > The difference between 7.5 and 19K still seems significant, especially
>> when
>> > we do not use any kernel structures and do not need these BTF sections
>>
>> That does seem like a lot of BTF information. Did you confirm (with
>> objdump) that it's the .BTF* sections that take up these extra 12k? Do
>> you have some really complicated data structures in the file or
>> something? Got a link to the source somewhere that isn't a web mailing
>> list archive? :)
>>
>>
> Looks like the extra size is related to BTF: there are 4 BTF sections that
> take 12.5K
> [ 7] .BTF PROGBITS 0000000000000000 00144c 00175d 00
> 0 0 1
> [ 8] .rel.BTF REL 0000000000000000 002bb0 000040 10
> 14 7 8
> [ 9] .BTF.ext PROGBITS 0000000000000000 002bf0 000cd0 00
> 0 0 1
> [10] .rel.BTF.ext REL 0000000000000000 0038c0 000ca0 10
> 14 9 8
Right, okay, that does not look completely outrageous with the amount of
code and type information you have in that file.
> All the sources are at:
> The branch without libbpf
> https://github.com/daynix/qemu/tree/eBPF_RFC
> The branch with libbpf
> https://github.com/daynix/qemu/tree/eBPF_RFCv2
>
> all the eBPF-related code is under qemu/ebpf directory.
Ah, cool, thanks!
>> In any case, while I do think it smells a little of premature
>> optimisation, you can of course strip the BTF information until you need
>> it. Having it around makes debugging easier (bpftool will expand your
>> map structures for you when dumping maps, and that sort of thing), but
>> it's not really essential if you don't need CO-RE.
>>
>> > This is only reason to prefer non-libbpf option for this specific eBPF
>>
>> You can still use libbpf without BTF. It's using BTF without libbpf that
>> tends to not work so well...
>>
>>
> If we build the eBPF without '-g' or strip the BTF information out of the
> object file the libbpf crashes right after issuing printout "libbpf: BTF is
> required, but is missing or corrupted".
> We did not investigate this too deeply but on the first glance it looks
> like the presence of maps automatically makes the libbpf to require BTF.
Ah, right. Well, you're using the BTF-based map definition syntax. So
yeah, that does require BTF: The __uint() and __type() macros really
expand to type definitions that are specifically crafted to be embedded
as BTF in the file.
You could use the old-style map definitions that don't use BTF[0], but
BTF is really where things are going in BPF-land so I think longer term
you'll probably end up needing it anyway. So going to this much trouble
just to save 10k on binary size seems to me like it's a decision you'll
end up regretting :)
[0]
https://github.com/xdp-project/xdp-tutorial/blob/master/basic03-map-counter/xdp_prog_kern.c#L11
-Toke