Re: [PATCH/RFC 0/1] Vhost User Cross Cable: Intro
From: Marc-André Lureau
Subject: Re: [PATCH/RFC 0/1] Vhost User Cross Cable: Intro
Date: Fri, 10 Jan 2020 14:27:29 +0400
Hi
On Wed, Jan 8, 2020 at 5:57 AM V. <address@hidden> wrote:
>
> Hi List,
>
> For my VM setup I tend to use a lot of VM-to-VM single network links to do
> routing, switching and bridging in VMs instead of on the host.
> Also stemming from a silly fetish to sometimes use some OpenBSD VMs as
> firewalls, but that is beside the point here.
> I am using the standard, tried and true method of using a whole bunch of
> bridges, having 2 vhost taps each.
> This works and it's fast, but it is a nightmare to manage with all the
> interfaces on the host.
>
> So, I looked a bit into how I could improve this, which basically comes down
> to "How to connect 2 VMs together in a really fast and easy way".
> This, however, is not as straightforward as I thought without going the whole
> route of OVS/Snabb/any other big, feature-bloated
> software switch.
> Cause really, all I want is to connect 2 VMs in a fast and easy way.
> Shouldn't be that hard, right?
>
> Anyway, I ended up finding tests/vhost-user-bridge.c, which very nicely does
> half of what I wanted.
> After some doubling of the vhosts and eliminating UDP, I came up with a Vhost
> User Cross Cable (patch in next post).
> It just opens 2 vhost sockets instead of 1 and does the forwarding between
> them.
> A terrible hack and slash of vhost-user-bridge.c, probably now with bugs
> causing the death of many puppies and the end of humanity,
> but it works!
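>
> In other words, the topology is simply:
>
>   VM1 <-(vhost-user)-> /tmp/left.sock <--[cc forwards]--> /tmp/right.sock <-(vhost-user)-> VM2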
>
> However... I now am left with some questions, which I hope some of you can
> answer.
>
> 1.
> I looked, googled, read and tried things, but it is likely that I am a
> complete and utter moron and my google-fu has just been awful...
> Very likely... But is there really no other way than the one I have found to
> just link up 2 QEMUs in a fast non-bridge way? (No, not sockets.)
> Not that OVS and the like are not fine software, but do we really need the
> whole DPDK to do this?
By "not sockets", you mean the data path should use shared memory?
Then, I don't think there is another way.
>
> 2.
> In the unlikely case that I'm not an idiot, I guess we now have a nice
> simple cross cable.
> However, I am still a complete vhost/virtio idiot who has no clue how it
> works and just randomly brute-forced code into submission.
> Maybe not entirely true, but I would still appreciate it very much if someone
> with more knowledge of vhost could have a quick look at
> how things are done in cc.
>
> Specifically this monstrosity in TX (speed_killer is a 1MB buffer and kills
> any speed):
>     ret = iov_from_buf(sg, num, 0, speed_killer,
>                        iov_to_buf(out_sg, out_num, 0, speed_killer,
>                                   MIN(iov_size(out_sg, out_num),
>                                       sizeof speed_killer)));
>
> vs. the commented:
>     //ret = iov_copy(sg, num, out_sg, out_num, 0,
>     //               MIN(iov_size(sg, num), iov_size(out_sg, out_num)));
>
> The first is obviously a quick fix to get things working; however, in my
> meager understanding, should the second one not work?
> Maybe I'm messing up my vectors here, or I am messing up my understanding of
> iov_copy, but shouldn't the second form be the way to do zero
> copy?
As you noted, the data must be copied from source to dest memory.
iov_copy() doesn't actually do that; I don't think we have an iov
function for that.
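
(iov_copy() only builds a new iovec array pointing at the same memory.) If you
wanted to avoid the bounce buffer, something would have to walk both
scatter-gather lists in lock-step and memcpy() chunk by chunk. A rough sketch
of what such a helper could look like (this is not an existing QEMU or
libvhost-user function, just an illustration):

    /* Illustrative only: copy data directly between two scatter-gather
     * lists, no bounce buffer. Overlapping buffers are not handled. */
    #include <string.h>
    #include <sys/uio.h>

    static size_t iov_copy_data(const struct iovec *dst, unsigned dst_cnt,
                                const struct iovec *src, unsigned src_cnt)
    {
        unsigned di = 0, si = 0;
        size_t doff = 0, soff = 0, total = 0;

        while (di < dst_cnt && si < src_cnt) {
            /* copy as much as fits in the current dst and src elements */
            size_t len = dst[di].iov_len - doff;
            size_t avail = src[si].iov_len - soff;

            if (avail < len) {
                len = avail;
            }
            memcpy((char *)dst[di].iov_base + doff,
                   (const char *)src[si].iov_base + soff, len);
            total += len;
            doff += len;
            soff += len;
            /* advance to the next element once one side is exhausted */
            if (doff == dst[di].iov_len) {
                di++;
                doff = 0;
            }
            if (soff == src[si].iov_len) {
                si++;
                soff = 0;
            }
        }
        return total;
    }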
>
> 3.
> Now if Cross Cable is actually a new and (after a code-rewrite of 10) viable
> way to connect 2 QEMUs together, could I actually
> suggest a better way?
> I am thinking of a '-netdev vhost-user-slave' option to connect (as client or
> server) to a master QEMU running '-netdev vhost-user'.
> This way there is no need for any external program at all, the master can
> have its memory unshared and everything would just work
> and be fast.
> Also the whole thing can fall back to normal virtio if memory is not shared
> and it would even work in pure usermode without any
> context switch.
>
> Building a patch for this idea is something I could maybe get around to; I
> don't clearly have an idea how much work it would be, but I've done
> crazier things.
> But is this something that someone might be able to whip up in an hour or
> two? Someone who actually does have a clue about vhost
> and virtio maybe? ;-)
I believe https://wiki.qemu.org/Features/VirtioVhostUser is what you
are after. It's still being discussed and is non-trivial, but it hasn't been
very active lately afaik.
>
> 4.
> Hacking together cc from bridge I noticed the use of container_of() to get
> from vudev to state in the vu callbacks.
> Would it be an idea to add a context pointer to the callbacks (possibly
> gotten from VuDevIface)?
> And I know. First post and I have the forwardness to even suggest an API
> change! I know!
> But it makes things a bit simpler to avoid globals and it makes sense to have
> some context in a callback to know what's going on,
> right? ;-)
Well, the callbacks are called with the VuDev, so container_of() is
quite fine since you can embed the device in your own structure. I
don't see a compelling reason to change that.
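
For example, something along these lines (the struct and callback names are
made up, but it is the same pattern vhost-user-bridge.c already uses):

    #include "qemu/osdep.h"        /* container_of() */
    #include "libvhost-user.h"     /* VuDev and the VuDevIface callbacks */

    typedef struct CrossCableDev {
        VuDev vudev;               /* embedded device handed to libvhost-user */
        int peer_fd;               /* whatever per-device context you need */
    } CrossCableDev;

    static void cc_queue_set_started(VuDev *dev, int qidx, bool started)
    {
        /* recover the enclosing state from the embedded VuDev */
        CrossCableDev *cc = container_of(dev, CrossCableDev, vudev);
        /* cc->peer_fd etc. are reachable here without any globals */
    }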
> 5.
> Last one, promise.
> I'm very much in the church of "less software == less bugs == less security
> problems".
> Running cc or a vhost-user-slave means QEMU has fast networking in usermode
> without the need for anything other than AF_UNIX + shared
> mem.
> So might it be possible to weed out any modern fancy stuff like the Internet
> Protocol, TCP, taps, bridges, ethernet and tokenring
> from a kernel and run QEMU on that?
> The idea here is a kernel with storage, a serial console, AF_UNIX and
> vfio-pci, only running QEMU.
> Would this be feasible? Or does QEMU need a kernel which at least has some
> grasp of what AF_INET and ethernet are?
> (Does a modern kernel even still support tokenring? No idea, probably does.)
Sounds like it is possible.
> Finally, an example and some numbers.
>
> Compiling and starting the cross cable:
> ./configure
> make tests/vhost-user-cc
> tests/vhost-user-cc -l /tmp/left.sock -r /tmp/right.sock
>
> (Note, the cross cable will quit if one of the VMs quits, but the VMs will
> reconnect when cc starts again.)
>
> 2 VMs, host1 and host2, Linux guests, run like this:
>
> host1:
> /qemu/bin/qemu-system-x86_64 \
>   -accel kvm -nodefaults -k en-us -vnc none -machine q35 -cpu host \
>   -smp 8,cores=8 -m 2G -vga std \
>   -object memory-backend-file,id=memory,mem-path=/hugetlbfs,share=on,size=2G \
>   -numa node,memdev=memory \
>   -drive if=none,cache=none,format=raw,aio=native,file=/dev/lvm/host1,id=sda \
>   -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=sda,bus=scsi0.0 \
>   -nic tap,vhost=on,helper=/usr/libexec/qemu-bridge-helper,id=eth0,model=virtio-net-pci,mac=52:54:00:aa:aa:aa,br=br0 \
>   -chardev socket,id=left,path=/tmp/left.sock,reconnect=1 \
>   -nic vhost-user,chardev=left,id=eth1,model=virtio-net-pci,mac=52:54:00:bb:bb:bb
>
> host2:
> /qemu/bin/qemu-system-x86_64 \
>   -accel kvm -nodefaults -k en-us -vnc none -machine q35 -cpu host \
>   -smp 8,cores=8 -m 2G -vga std \
>   -object memory-backend-file,id=memory,mem-path=/hugetlbfs,share=on,size=2G \
>   -numa node,memdev=memory \
>   -drive if=none,cache=none,format=raw,aio=native,file=/dev/lvm/host2,id=sda \
>   -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=sda,bus=scsi0.0 \
>   -nic tap,vhost=on,helper=/usr/libexec/qemu-bridge-helper,id=eth0,model=virtio-net-pci,mac=52:54:00:cc:cc:cc,br=br0 \
>   -chardev socket,id=right,path=/tmp/right.sock,reconnect=1 \
>   -nic vhost-user,chardev=right,id=eth1,model=virtio-net-pci,mac=52:54:00:dd:dd:dd
>
>
> First, speed via eth0 (bridged tap with vhost, host2 runs './iperf3 -s'):
> root@host1:~/iperf-3.1.3/src# ./iperf3 -c 192.168.0.2 -i 1 -t 10
> ...
> [ 4]   0.00-10.00 sec  10.7 GBytes  9.22 Gbits/sec                  receiver
>
> Second, speed via eth1 (Vhost Cross Cable):
> root@host1:~/iperf-3.1.3/src# ./iperf3 -c 192.168.1.2 -i 1 -t 10
> ...
> [ 4]   0.00-10.00 sec  2.05 GBytes  1.76 Gbits/sec                  receiver
>
> So, roughly a factor of 5 slowdown against the bridge. Not too bad,
> considering the bad iovec mem-copying I do.
> Lots of room for improvement though, but at least for me it's also 5 times
> faster than sockets.
>
And what performance do you get with -netdev socket?
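(i.e. the two guests wired back to back with something like the following;
the address and port here are arbitrary:)

  host1: ... -netdev socket,id=eth1,listen=127.0.0.1:1234 \
             -device virtio-net-pci,netdev=eth1,mac=52:54:00:bb:bb:bb
  host2: ... -netdev socket,id=eth1,connect=127.0.0.1:1234 \
             -device virtio-net-pci,netdev=eth1,mac=52:54:00:dd:dd:dd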
--
Marc-André Lureau