qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/5] qmp: Added the helper stamp check.


From: Daniel P . Berrangé
Subject: Re: [PATCH 3/5] qmp: Added the helper stamp check.
Date: Tue, 28 Feb 2023 19:01:52 +0000
User-agent: Mutt/2.2.9 (2022-11-12)

On Tue, Feb 28, 2023 at 06:04:51PM +0000, Daniel P. Berrangé wrote:
> On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote:
> > On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé <berrange@redhat.com>
> > wrote:
> > 
> > > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> > > > Added a function to check the stamp in the helper.
> > > > eBPF helper should have a special symbol that generates during the 
> > > > build.
> > > > QEMU checks the helper and determines that it fits, so the helper
> > > > will produce proper output.
> > >
> > > I think this is quite limiting for in place upgrades.
> > >
> > > Consider this scenario
> > >
> > >  * Host has QEMU 8.1.0 installed
> > >  * VM is running QEMU 8.1.0
> > >  * QEMU 8.1.1 is released with a bug fix in the EBF program
> > >  * Host is upgraded to QEMU 8.1.1
> > >  * User attempts to hotplug a NIC to the running VM
> > >
> > > IIUC this last step is going to fail because we'll be loading
> > > the EBF program from 8.1.1 and so its hash is different from
> > > that expected by the QEMU 8.1.0 that the pre-existing VM is
> > > running.
> > >
> > >   Indeed we did not take in account the in-place upgrade.
> > 
> > 
> > 
> > > If some changes to the EBF program are not going to be back
> > > compatible from the POV of the QEMU process, should we instead
> > > be versioning the EBF program. eg so new QEMU will ship both
> > > the old and new versions of the EBF program.
> > 
> > This does not seem to be an elegant option: QEMU theoretically can include
> > different eBPF programs but it hardly can interface with each one of them.
> > The code of QEMU (access to eBPF maps etc) includes header files which eBPF
> > of the day is being built with them.
> > 
> > I see 2 options to address this issue (of course there are more)
> > 1. Build and install qemu-rss-helper-<hash> executable. Libvirt will always
> > have a correct name, so for the running instance it will use
> > qemu-rss-helper-<old-hash>, for the new instance it will use
> > qemu-rss-helper-<new-hash>
> 
> We'll get an ever growing number of program variants we need to
> build & distribute with each new QEMU release.
> 
> > 2. Build the helper executable and link it inside qemu as a blob. Libvirt
> > will always retrieve the executable to the temporary file name and use it.
> > So the retrieved helper will always be compatible with QEMU. I'm not sure
> > what is the most portable way to do that.
> 
> QEMU is considered an untrusted process, so there's no way we're going
> to ask it to give us an ELF binary and then execute that in privileged
> context.
> 
> > Does one of these seem suitable?
> 
> Neither feels very appealing to me.
> 
> I've been trying to understand the eBPF code we're dealing with in a
> little more detail.
> 
> IIUC, QEMU, or rather the virtio-net  driver needs to receive one FD
> for the BPF program, and one or more FDs for the BPF maps that the
> program uses. Currently it uses 3 maps, so needs 3 map FDs on top of
> the program FD.
> 
> The helper program that is proposed here calls ebpf_rss_load() to
> load the program and get back a struct which gives access to the
> 4 FDs, which are then sent to the mgmt app, which forwards them
> onto QEMU.
> 
> The ebpf_rss_load() method is making use of various structs that
> are specific to the RSS program implementation, but does not seems
> to do anything especially interesting.  It calls into rss_bpf__open()
> which eventually gets around to calling rss_bpf__create_skeleton
> which is where the interesting stuff happens.
> 
> This rss_bpf__create_skeleton() method is implemented in terms of
> totally generic libbpf APIs, and has the actual blob that is the
> BPF program.
> 
> Looking at what this does, I feel it should be trivial for a mgmt
> app to implement equivalent logic to rss_bpf__create_skeleton in a
> generic manner, if we could just expose the program blob and the
> map names to the mgmt app. eg a simple json file
> 
>   {
>      "maps": [
>         "tap_rss_map_configurations",
>       "tap_rss_map_indirection_table",
>       "tap_rss_map_toeplitz_key",
>      ],
>      "program": "....the big blob encoded in base64..."
>   }
> 
> if we installed that file are /usr/share/qemu/bpf/net-rss.json
> then when a QEMU process is started, the mgmt app capture the
> data in this JSON file. It now has enough info to create the
> EBPF programs needed and pass the FDs over to QEMU. This would
> be robust against QEMU software upgrades, and not tied to the
> specific EBPF program imlp. We can add or remove maps / change
> their names etc any time, as the details in the JSON descriptor
> can be updated.  This avoids need for any special helper program
> to be provided by QEMU with the problems that is throwing up
> for us.

It occurrs to me that exposing the BPF program as data rather than
via binary will make more practical to integrate this into KubeVirt's
architecture. In their deployment setup both QEMU and libvirt are
running unprivileged inside a container. For any advanced nmetworking
a completely separate component creates the TAP device and passes it
into the container running QEMU. I don't think that the separate
precisely matched helper binary would be something they can use, but
it might be possible to expose a data file providing the BPF program
blob and describing its maps.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




reply via email to

[Prev in Thread] Current Thread [Next in Thread]