Re: [PATCH/RFC 0/1] Vhost User Cross Cable: Intro

From: Marc-André Lureau
Subject: Re: [PATCH/RFC 0/1] Vhost User Cross Cable: Intro
Date: Fri, 10 Jan 2020 14:27:29 +0400


On Wed, Jan 8, 2020 at 5:57 AM V. <address@hidden> wrote:
> Hi List,
> For my VM setup I tend to use a lot of VM to VM single network links to do 
> routing, switching and bridging in VM's instead of the host.
> Also stemming from a silly fetish to sometimes use some OpenBSD VMs as
> firewall, but that is beside the point here.
> I am using the standard, tried and true method of using a whole bunch of
> bridges, having 2 vhost taps each.
> This works and it's fast, but it is a nightmare to manage with all the 
> interfaces on the host.
> So, I looked a bit into how I can improve this, basically coming down to "How 
> to connect 2 VM's together in a really fast and easy way".
> This however, is not as straightforward as I thought, without going the whole 
> route of OVS/Snabb/any other big feature bloated
> software switch.
> Cause really, all I want is to connect 2 VM's in a fast and easy way. 
> Shouldn't be that hard right?
> Anyways, I end up finding tests/vhost-user-bridge.c, which is very nicely 
> doing half of what I wanted.
> After some doubling of the vhosts and eliminating udp, I came up with a Vhost 
> User Cross Cable. (patch in next post).
> It just opens 2 vhost sockets instead of 1 and does the forwarding between 
> them.
> A terrible hack and slash of vhost-user-bridge.c, probably now with bugs
> causing the death of many puppies and the end of humanity,
> but it works!
> However... I now am left with some questions, which I hope some of you can 
> answer.
> 1.
> I looked, googled, read and tried things, but it is likely that I am a
> complete and utter moron and my google-fu has just been awful...
> Very likely... But is there really no other way than the one I have found to
> just link up 2 QEMU's in a fast non-bridge way? (No, not sockets.)
> Not that OVS and the likes are not fine software, but do we really need the 
> whole DPDK to do this?

By "not sockets", you mean the data path should use shared memory?
In that case, I don't think there is another way.

> 2.
> In the unlikely chance that I'm not an idiot, then I guess now we have a nice 
> simple cross cable.
> However, I am still a complete vhost/virtio idiot who has no clue how it 
> works and just randomly brute-forced code into submission.
> Maybe not entirely true, but I would still appreciate it very much if someone
> with more knowledge of vhost could have a quick look at
> how things are done in cc.
> Specifically this monstrosity in TX (speed_killer is a 1MB buffer and kills 
> any speed):
>   ret = iov_from_buf(sg, num, 0, speed_killer,
>                      iov_to_buf(out_sg, out_num, 0, speed_killer,
>                                 MIN(iov_size(out_sg, out_num),
>                                     sizeof speed_killer)));
>   vs. the commented:
>   //ret = iov_copy(sg, num, out_sg, out_num, 0,
>   //               MIN(iov_size(sg, num), iov_size(out_sg, out_num)));
> The first is obviously a quick fix to get things working, however, in my 
> meager understanding, should the 2nd one not work?
> Maybe I'm messing up my vectors here, or I am messing up my understanding of 
> iov_copy, but shouldn't the 2nd form be the way to zero
> copy?

As you noted, the data must be copied from source to dest memory.
iov_copy() doesn't actually do that: it only copies the iovec entries
(base pointers and lengths), not the bytes they point to. I don't think
we have an iov function for that.
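
To illustrate the kind of copy that is needed, here is a minimal sketch. It assumes plain POSIX struct iovec, and iov_to_iov_copy is a hypothetical helper name, not an existing QEMU function. It walks both vectors in step and memcpy()s the overlapping part of each pair of elements, so no intermediate bounce buffer is needed. It is still a copy, not zero copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/*
 * Hypothetical helper (not part of QEMU's iov API): copy up to `bytes`
 * bytes of payload from the scatter-gather list `src` into the
 * scatter-gather list `dst`. Returns the number of bytes copied, which
 * is less than `bytes` if either list runs out first.
 */
static size_t iov_to_iov_copy(const struct iovec *dst, unsigned int dst_cnt,
                              const struct iovec *src, unsigned int src_cnt,
                              size_t bytes)
{
    unsigned int di = 0, si = 0;   /* current element in each list */
    size_t doff = 0, soff = 0;     /* offset inside the current element */
    size_t done = 0;

    assert(dst != NULL && src != NULL);

    while (done < bytes && di < dst_cnt && si < src_cnt) {
        size_t dleft = dst[di].iov_len - doff;
        size_t sleft = src[si].iov_len - soff;
        size_t n = dleft < sleft ? dleft : sleft;

        if (n > bytes - done) {
            n = bytes - done;
        }
        if (n > 0) {
            memcpy((char *)dst[di].iov_base + doff,
                   (const char *)src[si].iov_base + soff, n);
        }
        doff += n;
        soff += n;
        done += n;

        /* Move on from any element that is now fully consumed. */
        if (doff == dst[di].iov_len) {
            di++;
            doff = 0;
        }
        if (si < src_cnt && soff == src[si].iov_len) {
            si++;
            soff = 0;
        }
    }
    return done;
}
```

In the TX path quoted above, such a helper would presumably take sg/num as the destination and out_sg/out_num as the source, with the MIN() of the two iov_size() values as the byte count.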

> 3.
> Now if Cross Cable is actually a new and (after a code-rewrite of 10) a 
> viable way to connect 2 QEMU's together, could I actually
> suggest a better way?
> I am thinking of a '-netdev vhost-user-slave' option to connect (as client or 
> server) to a master QEMU running '-netdev vhost-user'.
> This way there is no need for any external program at all, the master can
> have its memory unshared and everything would just work
> and be fast.
> Also the whole thing can fall back to normal virtio if memory is not shared 
> and it would even work in pure usermode without any
> context switch.
> I could maybe get around to building a patch for this idea; I don't have a
> clear idea of how much work it would be, but I've done
> crazier things.
> But is this something that someone might be able to whip up in an hour or
> two? Someone who actually does have a clue about vhost
> and virtio maybe? ;-)

I believe https://wiki.qemu.org/Features/VirtioVhostUser is what you
are after. It's still being discussed and non-trivial, but not very
active lately afaik.

> 4.
> Hacking together cc from bridge I noticed the use of container_of() to get 
> from vudev to state in the vu callbacks.
> Would it be an idea to add a context pointer to the callbacks (possibly 
> gotten from VuDevIface)?
> And I know. First post and I have the forwardness to even suggest an API 
> change! I know!
> But it makes things a bit simpler to avoid globals and it makes sense to have 
> some context in a callback to know what's going on,
> right? ;-)

Well, the callbacks are called with the VuDev, so container_of() is
quite fine since you can embed the device in your own structure. I
don't see a compelling reason to change that.
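
For illustration, a minimal sketch of that embedding pattern. FakeDev and CableState are hypothetical stand-ins (the real device type would be VuDev from libvhost-user), and the macro is a plain offsetof-based container_of, not necessarily identical to the one in the QEMU tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's device type (not the real VuDev). */
typedef struct FakeDev {
    int sock;
} FakeDev;

/* Application state that embeds the device by value, not by pointer. */
typedef struct CableState {
    const char *name;
    FakeDev dev;
    unsigned long packets;
} CableState;

/* Step back from a pointer to a member to its enclosing structure. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* A callback that only receives the device pointer, like the vu callbacks. */
static void on_packet(FakeDev *dev)
{
    CableState *st = container_of(dev, CableState, dev);

    assert(&st->dev == dev);   /* sanity check: we recovered the right owner */
    st->packets++;
}
```

Because each CableState carries its own device, two instances (left and right) can share the same callback without any globals.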

> 5.
> Last one, promise.
> I'm very much in the church of "less software == less bugs == less security 
> problems".
> Running cc or a vhost-user-slave means QEMU has fast networking in usermode
> without the need for anything else than AF_UNIX + shared
> mem.
> So might it be possible to weed out any modern fancy stuff like the Internet 
> Protocol, TCP, taps, bridges, ethernet and tokenring
> from a kernel and run QEMU on that?
> The idea here is a kernel with storage, a serial console, AF_UNIX and 
> vfio-pci, only running QEMU.
> Would this be feasible? Or does QEMU need a kernel which at least has some
> grasp of what AF_INET and ethernet are?
> (Does a modern kernel even still support tokenring? No idea, probably does.)

Sounds like it is possible.

> Finally, an example and some numbers.
> Compiling and starting the cross cable:
> ./configure
> make tests/vhost-user-cc
> tests/vhost-user-cc -l /tmp/left.sock -r /tmp/right.sock
> (Note, the cross cable will quit if one of the vm's quits, but the VM's will 
> reconnect when cc starts again.)
> 2 VM's, host1 and host2, Linux guests, run like this:
> host1:
> /qemu/bin/qemu-system-x86_64 \
>   -accel kvm -nodefaults -k en-us -vnc none -machine q35 -cpu host \
>   -smp 8,cores=8 -m 2G -vga std \
>   -object memory-backend-file,id=memory,mem-path=/hugetlbfs,share=on,size=2G \
>   -numa node,memdev=memory \
>   -drive if=none,cache=none,format=raw,aio=native,file=/dev/lvm/host1,id=sda \
>   -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=sda,bus=scsi0.0 \
>   -nic tap,vhost=on,helper=/usr/libexec/qemu-bridge-helper,id=eth0,model=virtio-net-pci,mac=52:54:00:aa:aa:aa,br=br0 \
>   -chardev socket,id=left,path=/tmp/left.sock,reconnect=1 \
>   -nic vhost-user,chardev=left,id=eth1,model=virtio-net-pci,mac=52:54:00:bb:bb:bb
> host2:
> /qemu/bin/qemu-system-x86_64 \
>   -accel kvm -nodefaults -k en-us -vnc none -machine q35 -cpu host \
>   -smp 8,cores=8 -m 2G -vga std \
>   -object memory-backend-file,id=memory,mem-path=/hugetlbfs,share=on,size=2G \
>   -numa node,memdev=memory \
>   -drive if=none,cache=none,format=raw,aio=native,file=/dev/lvm/host2,id=sda \
>   -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=sda,bus=scsi0.0 \
>   -nic tap,vhost=on,helper=/usr/libexec/qemu-bridge-helper,id=eth0,model=virtio-net-pci,mac=52:54:00:cc:cc:cc,br=br0 \
>   -chardev socket,id=right,path=/tmp/right.sock,reconnect=1 \
>   -nic vhost-user,chardev=right,id=eth1,model=virtio-net-pci,mac=52:54:00:dd:dd:dd
> First, speed via eth0 (bridged tap with vhost, host2 runs './iperf3 -s'):
>   root@host1:~/iperf-3.1.3/src# ./iperf3 -c -i 1 -t 10
>   ...
>   [  4]   0.00-10.00  sec  10.7 GBytes  9.22 Gbits/sec                  receiver
> Second, speed via eth1 (Vhost Cross Cable):
>   root@host1:~/iperf-3.1.3/src# ./iperf3 -c -i 1 -t 10
>   ...
>   [  4]   0.00-10.00  sec  2.05 GBytes  1.76 Gbits/sec                  receiver
> So, a slowdown of roughly a factor of 5 (9.22 vs. 1.76 Gbits/sec) against
> bridge. Not too bad, considering the bad iovec mem-copying I do.
> Lots of room for improvement though, but at least for me it's also 5 times
> faster than socket.

And what performance do you get with -netdev socket?

Marc-André Lureau
