[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v7] net: L2TPv3 transport
From: |
Benoît Canet |
Subject: |
Re: [Qemu-devel] [PATCH v7] net: L2TPv3 transport |
Date: |
Thu, 3 Apr 2014 14:27:06 +0200 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
The Monday 31 Mar 2014 à 15:39:19 (+0100), address@hidden wrote :
> From: Anton Ivanov <address@hidden>
>
> This transport allows to connect a QEMU nic to a static Ethernet
> over L2TPv3 tunnel. The transport supports all options present
> in the Linux kernel implementation. It allows QEMU to connect
> to any Linux host running kernel 3.3+, most routers and network
> devices as well as other QEMU instances.
>
> Signed-off-by: Anton Ivanov <address@hidden>
> ---
>
> Addressed in this release:
>
> 1. Back to qemu send_packet instead of direct delivery from the
> recvmmsg ring. The zero copy off driver rx ring will be reintroduced
> in a later patch
>
> 2. Fixed mismerge of header size handling from our tree
>
> 3. Fixed formatting
Sorry it does apply on master
I was misleaded by an error message of git apply:
Best regards
Benoît
>
>
> net/Makefile.objs | 1 +
> net/clients.h | 2 +
> net/l2tpv3.c | 745
> +++++++++++++++++++++++++++++++++++++++++++++++++++++
> net/net.c | 3 +
> qapi-schema.json | 60 +++++
> qemu-options.hx | 82 ++++++
> 6 files changed, 893 insertions(+)
> create mode 100644 net/l2tpv3.c
>
> diff --git a/net/Makefile.objs b/net/Makefile.objs
> index 4854a14..160214e 100644
> --- a/net/Makefile.objs
> +++ b/net/Makefile.objs
> @@ -2,6 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
> common-obj-y += socket.o
> common-obj-y += dump.o
> common-obj-y += eth.o
> +common-obj-$(CONFIG_LINUX) += l2tpv3.o
> common-obj-$(CONFIG_POSIX) += tap.o
> common-obj-$(CONFIG_LINUX) += tap-linux.o
> common-obj-$(CONFIG_WIN32) += tap-win32.o
> diff --git a/net/clients.h b/net/clients.h
> index 7793294..bbf177c 100644
> --- a/net/clients.h
> +++ b/net/clients.h
> @@ -47,6 +47,8 @@ int net_init_tap(const NetClientOptions *opts, const char
> *name,
> int net_init_bridge(const NetClientOptions *opts, const char *name,
> NetClientState *peer);
>
> +int net_init_l2tpv3(const NetClientOptions *opts, const char *name,
> + NetClientState *peer);
> #ifdef CONFIG_VDE
> int net_init_vde(const NetClientOptions *opts, const char *name,
> NetClientState *peer);
> diff --git a/net/l2tpv3.c b/net/l2tpv3.c
> new file mode 100644
> index 0000000..4439ab7
> --- /dev/null
> +++ b/net/l2tpv3.c
> @@ -0,0 +1,745 @@
> +/*
> + * QEMU System Emulator
> + *
> + * Copyright (c) 2003-2008 Fabrice Bellard
> + * Copyright (c) 2012-2014 Cisco Systems
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> copy
> + * of this software and associated documentation files (the "Software"), to
> deal
> + * in the Software without restriction, including without limitation the
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include <linux/ip.h>
> +#include <netdb.h>
> +#include "config-host.h"
> +#include "net/net.h"
> +#include "clients.h"
> +#include "monitor/monitor.h"
> +#include "qemu-common.h"
> +#include "qemu/error-report.h"
> +#include "qemu/option.h"
> +#include "qemu/sockets.h"
> +#include "qemu/iov.h"
> +#include "qemu/main-loop.h"
> +
> +
> +/* The buffer size needs to be investigated for optimum numbers and
> + * optimum means of paging in on different systems. This size is
> + * chosen to be sufficient to accommodate one packet with some headers
> + */
> +
> +#define BUFFER_ALIGN sysconf(_SC_PAGESIZE)
> +#define BUFFER_SIZE 2048
> +#define IOVSIZE 2
> +#define MAX_L2TPV3_MSGCNT 64
> +#define MAX_L2TPV3_IOVCNT (MAX_L2TPV3_MSGCNT * IOVSIZE)
> +
> +/* Header set to 0x30000 signifies a data packet */
> +
> +#define L2TPV3_DATA_PACKET 0x30000
> +
> +/* IANA-assigned IP protocol ID for L2TPv3 */
> +
> +#ifndef IPPROTO_L2TP
> +#define IPPROTO_L2TP 0x73
> +#endif
> +
> +typedef struct NetL2TPV3State {
> + NetClientState nc;
> + int fd;
> +
> + /*
> + * these are used for xmit - that happens packet a time
> + * and for first sign of life packet (easier to parse that once)
> + */
> +
> + uint8_t *header_buf;
> + struct iovec *vec;
> +
> + /*
> + * these are used for receive - try to "eat" up to 32 packets at a time
> + */
> +
> + struct mmsghdr *msgvec;
> +
> + /*
> + * peer address
> + */
> +
> + struct sockaddr_storage *dgram_dst;
> + uint32_t dst_size;
> +
> + /*
> + * L2TPv3 parameters
> + */
> +
> + uint64_t rx_cookie;
> + uint64_t tx_cookie;
> + uint32_t rx_session;
> + uint32_t tx_session;
> + uint32_t header_size;
> + uint32_t counter;
> +
> + /*
> + * DOS avoidance in error handling
> + */
> +
> + bool header_mismatch;
> +
> + /*
> + * Ring buffer handling
> + */
> +
> + int queue_head;
> + int queue_tail;
> + int queue_depth;
> +
> + /*
> + * Precomputed offsets
> + */
> +
> + uint32_t offset;
> + uint32_t cookie_offset;
> + uint32_t counter_offset;
> + uint32_t session_offset;
> +
> + /* Poll Control */
> +
> + bool read_poll;
> + bool write_poll;
> +
> + /* Flags */
> +
> + bool ipv6;
> + bool udp;
> + bool has_counter;
> + bool pin_counter;
> + bool cookie;
> + bool cookie_is_64;
> +
> +} NetL2TPV3State;
> +
> +static int l2tpv3_can_send(void *opaque);
> +static void net_l2tpv3_send(void *opaque);
> +static void l2tpv3_writable(void *opaque);
> +
> +static void l2tpv3_update_fd_handler(NetL2TPV3State *s)
> +{
> + qemu_set_fd_handler2(s->fd,
> + s->read_poll ? l2tpv3_can_send : NULL,
> + s->read_poll ? net_l2tpv3_send : NULL,
> + s->write_poll ? l2tpv3_writable : NULL,
> + s);
> +}
> +
> +static void l2tpv3_read_poll(NetL2TPV3State *s, bool enable)
> +{
> + if (s->read_poll != enable) {
> + s->read_poll = enable;
> + l2tpv3_update_fd_handler(s);
> + }
> +}
> +
> +static void l2tpv3_write_poll(NetL2TPV3State *s, bool enable)
> +{
> + if (s->write_poll != enable) {
> + s->write_poll = enable;
> + l2tpv3_update_fd_handler(s);
> + }
> +}
> +
> +static void l2tpv3_writable(void *opaque)
> +{
> + NetL2TPV3State *s = opaque;
> + l2tpv3_write_poll(s, false);
> + qemu_flush_queued_packets(&s->nc);
> +}
> +
> +static int l2tpv3_can_send(void *opaque)
> +{
> + NetL2TPV3State *s = opaque;
> +
> + return qemu_can_send_packet(&s->nc);
> +}
> +
> +static void l2tpv3_send_completed(NetClientState *nc, ssize_t len)
> +{
> + NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> + l2tpv3_read_poll(s, true);
> +}
> +
> +static void l2tpv3_poll(NetClientState *nc, bool enable)
> +{
> + NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> + l2tpv3_write_poll(s, enable);
> + l2tpv3_read_poll(s, enable);
> +}
> +
> +static void l2tpv3_form_header(NetL2TPV3State *s)
> +{
> + uint32_t *counter;
> +
> + if (s->udp) {
> + stl_be_p((uint32_t *) s->header_buf, L2TPV3_DATA_PACKET);
> + }
> + stl_be_p(
> + (uint32_t *) (s->header_buf + s->session_offset),
> + s->tx_session
> + );
> + if (s->cookie) {
> + if (s->cookie_is_64) {
> + stq_be_p(
> + (uint64_t *)(s->header_buf + s->cookie_offset),
> + s->tx_cookie
> + );
> + } else {
> + stl_be_p(
> + (uint32_t *) (s->header_buf + s->cookie_offset),
> + s->tx_cookie
> + );
> + }
> + }
> + if (s->has_counter) {
> + counter = (uint32_t *)(s->header_buf + s->counter_offset);
> + if (s->pin_counter) {
> + *counter = 0;
> + } else {
> + stl_be_p(counter, ++s->counter);
> + }
> + }
> +}
> +
> +static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc,
> + const struct iovec *iov,
> + int iovcnt)
> +{
> + NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +
> + struct msghdr message;
> + int ret;
> +
> + if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
> + error_report(
> + "iovec too long %d > %d, change l2tpv3.h",
> + iovcnt, MAX_L2TPV3_IOVCNT
> + );
> + return -1;
> + }
> + l2tpv3_form_header(s);
> + memcpy(s->vec + 1, iov, iovcnt * sizeof(struct iovec));
> + s->vec->iov_base = s->header_buf;
> + s->vec->iov_len = s->offset;
> + message.msg_name = s->dgram_dst;
> + message.msg_namelen = s->dst_size;
> + message.msg_iov = s->vec;
> + message.msg_iovlen = iovcnt + 1;
> + message.msg_control = NULL;
> + message.msg_controllen = 0;
> + message.msg_flags = 0;
> + do {
> + ret = sendmsg(s->fd, &message, 0);
> + } while ((ret == -1) && (errno == EINTR));
> + if (ret > 0) {
> + ret -= s->offset;
> + } else if (ret == 0) {
> + /* belt and braces - should not occur on DGRAM
> + * we should get an error and never a 0 send
> + */
> + ret = iov_size(iov, iovcnt);
> + } else {
> + /* signal upper layer that socket buffer is full */
> + ret = -errno;
> + if (ret == -EAGAIN || ret == -ENOBUFS) {
> + l2tpv3_write_poll(s, true);
> + ret = 0;
> + }
> + }
> + return ret;
> +}
> +
> +static ssize_t net_l2tpv3_receive_dgram(NetClientState *nc,
> + const uint8_t *buf,
> + size_t size)
> +{
> + NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +
> + struct iovec *vec;
> + struct msghdr message;
> + ssize_t ret = 0;
> +
> + l2tpv3_form_header(s);
> + vec = s->vec;
> + vec->iov_base = s->header_buf;
> + vec->iov_len = s->offset;
> + vec++;
> + vec->iov_base = (void *) buf;
> + vec->iov_len = size;
> + message.msg_name = s->dgram_dst;
> + message.msg_namelen = s->dst_size;
> + message.msg_iov = s->vec;
> + message.msg_iovlen = 2;
> + message.msg_control = NULL;
> + message.msg_controllen = 0;
> + message.msg_flags = 0;
> + do {
> + ret = sendmsg(s->fd, &message, 0);
> + } while ((ret == -1) && (errno == EINTR));
> + if (ret > 0) {
> + ret -= s->offset;
> + } else if (ret == 0) {
> + /* belt and braces - should not occur on DGRAM
> + * we should get an error and never a 0 send
> + */
> + ret = size;
> + } else {
> + ret = -errno;
> + if (ret == -EAGAIN || ret == -ENOBUFS) {
> + /* signal upper layer that socket buffer is full */
> + l2tpv3_write_poll(s, true);
> + ret = 0;
> + }
> + }
> + return ret;
> +}
> +
> +static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf)
> +{
> +
> + uint32_t *session;
> + uint64_t cookie;
> +
> + if ((!s->udp) && (!s->ipv6)) {
> + buf += sizeof(struct iphdr) /* fix for ipv4 raw */;
> + }
> +
> + /* we do not do a strict check for "data" packets as per
> + * the RFC spec because the pure IP spec does not have
> + * that anyway.
> + */
> +
> + if (s->cookie) {
> + if (s->cookie_is_64) {
> + cookie = ldq_be_p(buf + s->cookie_offset);
> + } else {
> + cookie = ldl_be_p(buf + s->cookie_offset);
> + }
> + if (cookie != s->rx_cookie) {
> + if (!s->header_mismatch) {
> + error_report("unknown cookie id");
> + }
> + return -1;
> + }
> + }
> + session = (uint32_t *) (buf + s->session_offset);
> + if (ldl_be_p(session) != s->rx_session) {
> + if (!s->header_mismatch) {
> + error_report("session mismatch");
> + }
> + return -1;
> + }
> + return 0;
> +}
> +
> +static void net_l2tpv3_process_queue(NetL2TPV3State *s)
> +{
> + int size = 0;
> + struct iovec *vec;
> + bool bad_read;
> + int data_size;
> + struct mmsghdr *msgvec;
> +
> + /* go into ring mode only if there is a "pending" tail */
> + if (s->queue_depth > 0) {
> + do {
> + msgvec = s->msgvec + s->queue_tail;
> + if (msgvec->msg_len > 0) {
> + data_size = msgvec->msg_len - s->header_size;
> + vec = msgvec->msg_hdr.msg_iov;
> + if ((data_size > 0) &&
> + (l2tpv3_verify_header(s, vec->iov_base) == 0)) {
> + vec++;
> + /* Use the legacy delivery for now, we will
> + * switch to using our own ring as a queueing mechanism
> + * at a later date
> + */
> + size = qemu_send_packet_async(
> + &s->nc,
> + vec->iov_base,
> + data_size,
> + l2tpv3_send_completed
> + );
> + bad_read = false;
> + } else {
> + bad_read = true;
> + if (!s->header_mismatch) {
> + /* report error only once */
> + error_report("l2tpv3 header verification failed");
> + s->header_mismatch = true;
> + }
> + }
> + } else {
> + bad_read = true;
> + }
> + if ((bad_read) || (size > 0)) {
> + s->queue_tail = (s->queue_tail + 1) % MAX_L2TPV3_MSGCNT;
> + s->queue_depth--;
> + }
> + } while (
> + (s->queue_depth > 0) &&
> + qemu_can_send_packet(&s->nc) &&
> + ((size > 0) || bad_read)
> + );
> + }
> +}
> +
> +static void net_l2tpv3_send(void *opaque)
> +{
> + NetL2TPV3State *s = opaque;
> + int target_count, count;
> + struct mmsghdr *msgvec;
> +
> + /* go into ring mode only if there is a "pending" tail */
> +
> + if (s->queue_depth) {
> +
> + /* The ring buffer we use has variable intake
> + * count of how much we can read varies - adjust accordingly
> + */
> +
> + target_count = MAX_L2TPV3_MSGCNT - s->queue_depth;
> +
> + /* Ensure we do not overrun the ring when we have
> + * a lot of enqueued packets
> + */
> +
> + if (s->queue_head + target_count > MAX_L2TPV3_MSGCNT) {
> + target_count = MAX_L2TPV3_MSGCNT - s->queue_head;
> + }
> + } else {
> +
> + /* we do not have any pending packets - we can use
> + * the whole message vector linearly instead of using
> + * it as a ring
> + */
> +
> + s->queue_head = 0;
> + s->queue_tail = 0;
> + target_count = MAX_L2TPV3_MSGCNT;
> + }
> +
> + msgvec = s->msgvec + s->queue_head;
> + if (target_count > 0) {
> + do {
> + count = recvmmsg(
> + s->fd,
> + msgvec,
> + target_count, MSG_DONTWAIT, NULL);
> + } while ((count == -1) && (errno == EINTR));
> + if (count < 0) {
> + /* Recv error - we still need to flush packets here,
> + * (re)set queue head to current position
> + */
> + count = 0;
> + }
> + s->queue_head = (s->queue_head + count) % MAX_L2TPV3_MSGCNT;
> + s->queue_depth += count;
> + }
> + net_l2tpv3_process_queue(s);
> +}
> +
> +static void destroy_vector(struct mmsghdr *msgvec, int count, int iovcount)
> +{
> + int i, j;
> + struct iovec *iov;
> + struct mmsghdr *cleanup = msgvec;
> + if (cleanup) {
> + for (i = 0; i < count; i++) {
> + if (cleanup->msg_hdr.msg_iov) {
> + iov = cleanup->msg_hdr.msg_iov;
> + for (j = 0; j < iovcount; j++) {
> + g_free(iov->iov_base);
> + iov++;
> + }
> + g_free(cleanup->msg_hdr.msg_iov);
> + }
> + cleanup++;
> + }
> + g_free(msgvec);
> + }
> +}
> +
> +static struct mmsghdr *build_l2tpv3_vector(NetL2TPV3State *s, int count)
> +{
> + int i;
> + struct iovec *iov;
> + struct mmsghdr *msgvec, *result;
> +
> + msgvec = g_malloc(sizeof(struct mmsghdr) * count);
> + result = msgvec;
> + for (i = 0; i < count ; i++) {
> + msgvec->msg_hdr.msg_name = NULL;
> + msgvec->msg_hdr.msg_namelen = 0;
> + iov = g_malloc(sizeof(struct iovec) * IOVSIZE);
> + msgvec->msg_hdr.msg_iov = iov;
> + iov->iov_base = g_malloc(s->header_size);
> + iov->iov_len = s->header_size;
> + iov++ ;
> + iov->iov_base = qemu_memalign(BUFFER_ALIGN, BUFFER_SIZE);
> + iov->iov_len = BUFFER_SIZE;
> + msgvec->msg_hdr.msg_iovlen = 2;
> + msgvec->msg_hdr.msg_control = NULL;
> + msgvec->msg_hdr.msg_controllen = 0;
> + msgvec->msg_hdr.msg_flags = 0;
> + msgvec++;
> + }
> + return result;
> +}
> +
> +static void net_l2tpv3_cleanup(NetClientState *nc)
> +{
> + NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> + qemu_purge_queued_packets(nc);
> + l2tpv3_read_poll(s, false);
> + l2tpv3_write_poll(s, false);
> + close(s->fd);
> + destroy_vector(s->msgvec, MAX_L2TPV3_MSGCNT, IOVSIZE);
> + g_free(s->header_buf);
> + g_free(s->dgram_dst);
> +}
> +
> +static NetClientInfo net_l2tpv3_info = {
> + .type = NET_CLIENT_OPTIONS_KIND_L2TPV3,
> + .size = sizeof(NetL2TPV3State),
> + .receive = net_l2tpv3_receive_dgram,
> + .receive_iov = net_l2tpv3_receive_dgram_iov,
> + .poll = l2tpv3_poll,
> + .cleanup = net_l2tpv3_cleanup,
> +};
> +
> +int net_init_l2tpv3(const NetClientOptions *opts,
> + const char *name,
> + NetClientState *peer)
> +{
> +
> +
> + const NetdevL2TPv3Options *l2tpv3;
> + NetL2TPV3State *s;
> + NetClientState *nc;
> + int fd = -1, gairet;
> + struct addrinfo hints;
> + struct addrinfo *result = NULL;
> + char *srcport, *dstport;
> +
> + nc = qemu_new_net_client(&net_l2tpv3_info, peer, "l2tpv3", name);
> +
> + s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +
> + s->queue_head = 0;
> + s->queue_tail = 0;
> + s->header_mismatch = false;
> +
> + assert(opts->kind == NET_CLIENT_OPTIONS_KIND_L2TPV3);
> + l2tpv3 = opts->l2tpv3;
> +
> + if (l2tpv3->has_ipv6 && l2tpv3->ipv6) {
> + s->ipv6 = l2tpv3->ipv6;
> + } else {
> + s->ipv6 = false;
> + }
> +
> + if (l2tpv3->has_rxcookie || l2tpv3->has_txcookie) {
> + if (l2tpv3->has_rxcookie && l2tpv3->has_txcookie) {
> + s->cookie = true;
> + } else {
> + goto outerr;
> + }
> + } else {
> + s->cookie = false;
> + }
> +
> + if (l2tpv3->has_cookie64 || l2tpv3->cookie64) {
> + s->cookie_is_64 = true;
> + } else {
> + s->cookie_is_64 = false;
> + }
> +
> + if (l2tpv3->has_udp && l2tpv3->udp) {
> + s->udp = true;
> + if (!(l2tpv3->has_srcport && l2tpv3->has_dstport)) {
> + error_report("l2tpv3_open : need both src and dst port for udp");
> + goto outerr;
> + } else {
> + srcport = l2tpv3->srcport;
> + dstport = l2tpv3->dstport;
> + }
> + } else {
> + s->udp = false;
> + srcport = NULL;
> + dstport = NULL;
> + }
> +
> +
> + s->offset = 4;
> + s->session_offset = 0;
> + s->cookie_offset = 4;
> + s->counter_offset = 4;
> +
> + s->tx_session = l2tpv3->txsession;
> + if (l2tpv3->has_rxsession) {
> + s->rx_session = l2tpv3->rxsession;
> + } else {
> + s->rx_session = s->tx_session;
> + }
> +
> + if (s->cookie) {
> + s->rx_cookie = l2tpv3->rxcookie;
> + s->tx_cookie = l2tpv3->txcookie;
> + if (s->cookie_is_64 == true) {
> + /* 64 bit cookie */
> + s->offset += 8;
> + s->counter_offset += 8;
> + } else {
> + /* 32 bit cookie */
> + s->offset += 4;
> + s->counter_offset += 4;
> + }
> + }
> +
> + memset(&hints, 0, sizeof(hints));
> +
> + if (s->ipv6) {
> + hints.ai_family = AF_INET6;
> + } else {
> + hints.ai_family = AF_INET;
> + }
> + if (s->udp) {
> + hints.ai_socktype = SOCK_DGRAM;
> + hints.ai_protocol = 0;
> + s->offset += 4;
> + s->counter_offset += 4;
> + s->session_offset += 4;
> + s->cookie_offset += 4;
> + } else {
> + hints.ai_socktype = SOCK_RAW;
> + hints.ai_protocol = IPPROTO_L2TP;
> + }
> +
> + gairet = getaddrinfo(l2tpv3->src, srcport, &hints, &result);
> +
> + if ((gairet != 0) || (result == NULL)) {
> + error_report(
> + "l2tpv3_open : could not resolve src, errno = %s",
> + gai_strerror(gairet)
> + );
> + goto outerr;
> + }
> + fd = socket(result->ai_family, result->ai_socktype, result->ai_protocol);
> + if (fd == -1) {
> + fd = -errno;
> + error_report("l2tpv3_open : socket creation failed, errno = %d",
> -fd);
> + freeaddrinfo(result);
> + goto outerr;
> + }
> + if (bind(fd, (struct sockaddr *) result->ai_addr, result->ai_addrlen)) {
> + error_report("l2tpv3_open : could not bind socket err=%i", errno);
> + goto outerr;
> + }
> +
> + freeaddrinfo(result);
> +
> + memset(&hints, 0, sizeof(hints));
> +
> + if (s->ipv6) {
> + hints.ai_family = AF_INET6;
> + } else {
> + hints.ai_family = AF_INET;
> + }
> + if (s->udp) {
> + hints.ai_socktype = SOCK_DGRAM;
> + hints.ai_protocol = 0;
> + } else {
> + hints.ai_socktype = SOCK_RAW;
> + hints.ai_protocol = IPPROTO_L2TP;
> + }
> +
> + gairet = getaddrinfo(l2tpv3->dst, dstport, &hints, &result);
> + if ((gairet != 0) || (result == NULL)) {
> + error_report(
> + "l2tpv3_open : could not resolve dst, error = %s",
> + gai_strerror(gairet)
> + );
> + goto outerr;
> + }
> +
> + s->dgram_dst = g_malloc(sizeof(struct sockaddr_storage));
> + memset(s->dgram_dst, '\0' , sizeof(struct sockaddr_storage));
> + memcpy(s->dgram_dst, result->ai_addr, result->ai_addrlen);
> + s->dst_size = result->ai_addrlen;
> +
> + freeaddrinfo(result);
> +
> + if (l2tpv3->has_counter && l2tpv3->counter) {
> + s->has_counter = true;
> + s->offset += 4;
> + } else {
> + s->has_counter = false;
> + }
> +
> + if (l2tpv3->has_pincounter && l2tpv3->pincounter) {
> + s->has_counter = true; /* pin counter implies that there is counter
> */
> + s->pin_counter = true;
> + } else {
> + s->pin_counter = false;
> + }
> +
> + if (l2tpv3->has_offset) {
> + /* extra offset */
> + s->offset += l2tpv3->offset;
> + }
> +
> + if ((s->ipv6) || (s->udp)) {
> + s->header_size = s->offset;
> + } else {
> + s->header_size = s->offset + sizeof(struct iphdr);
> + }
> +
> + s->msgvec = build_l2tpv3_vector(s, MAX_L2TPV3_MSGCNT);
> + s->vec = g_malloc(sizeof(struct iovec) * MAX_L2TPV3_IOVCNT);
> + s->header_buf = g_malloc(s->header_size);
> +
> + qemu_set_nonblock(fd);
> +
> + s->fd = fd;
> + s->counter = 0;
> +
> + l2tpv3_read_poll(s, true);
> +
> + if (!s) {
> + error_report("l2tpv3_open : failed to set fd handler");
> + goto outerr;
> + }
> + snprintf(s->nc.info_str, sizeof(s->nc.info_str),
> + "l2tpv3: connected");
> + return 0;
> +outerr:
> + qemu_del_net_client(nc);
> + if (fd > 0) {
> + close(fd);
> + }
> + return -1;
> +}
> +
> diff --git a/net/net.c b/net/net.c
> index 0a88e68..749d34c 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -731,6 +731,9 @@ static int (* const
> net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
> [NET_CLIENT_OPTIONS_KIND_BRIDGE] = net_init_bridge,
> #endif
> [NET_CLIENT_OPTIONS_KIND_HUBPORT] = net_init_hubport,
> +#ifdef CONFIG_LINUX
> + [NET_CLIENT_OPTIONS_KIND_L2TPV3] = net_init_l2tpv3,
> +#endif
> };
>
>
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 83fa485..aefc478 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -2941,6 +2941,62 @@
> '*udp': 'str' } }
>
> ##
> +# @NetdevL2TPv3Options
> +#
> +# Connect the VLAN to Ethernet over L2TPv3 Static tunnel
> +#
> +# @src: source address
> +#
> +# @dst: destination address
> +#
> +# @srcport: #optional source port - mandatory for udp, optional for ip
> +#
> +# @dstport: #optional destination port - mandatory for udp, optional for ip
> +#
> +# @ipv6: #optional - force the use of ipv6
> +#
> +# @udp: #optional - use the udp version of l2tpv3 encapsulation
> +#
> +# @cookie64: #optional - use 64 bit coookies
> +#
> +# @counter: #optional have sequence counter
> +#
> +# @pincounter: #optional pin sequence counter to zero -
> +# workaround for buggy implementations or
> +# networks with packet reorder
> +#
> +# @txcookie: #optional 32 or 64 bit transmit cookie
> +#
> +# @rxcookie: #optional 32 or 64 bit receive cookie
> +#
> +# @txsession: 32 bit transmit session
> +#
> +# @rxsession: #optional 32 bit receive session - if not specified
> +# set to the same value as transmit
> +#
> +# @offset: #optional additional offset - allows the insertion of
> +# additional application-specific data before the packet payload
> +#
> +# Since 2.1
> +##
> +{ 'type': 'NetdevL2TPv3Options',
> + 'data': {
> + 'src': 'str',
> + 'dst': 'str',
> + '*srcport': 'str',
> + '*dstport': 'str',
> + '*ipv6': 'bool',
> + '*udp': 'bool',
> + '*cookie64': 'bool',
> + '*counter': 'bool',
> + '*pincounter': 'bool',
> + '*txcookie': 'uint64',
> + '*rxcookie': 'uint64',
> + 'txsession': 'uint32',
> + '*rxsession': 'uint32',
> + '*offset': 'uint32' } }
> +
> +##
> # @NetdevVdeOptions
> #
> # Connect the VLAN to a vde switch running on the host.
> @@ -3014,6 +3070,9 @@
> # A discriminated record of network device traits.
> #
> # Since 1.2
> +#
> +# 'l2tpv3' - since 2.1
> +#
> ##
> { 'union': 'NetClientOptions',
> 'data': {
> @@ -3021,6 +3080,7 @@
> 'nic': 'NetLegacyNicOptions',
> 'user': 'NetdevUserOptions',
> 'tap': 'NetdevTapOptions',
> + 'l2tpv3': 'NetdevL2TPv3Options',
> 'socket': 'NetdevSocketOptions',
> 'vde': 'NetdevVdeOptions',
> 'dump': 'NetdevDumpOptions',
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 8b94264..e1caf6f 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1395,6 +1395,29 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
> " (default=" DEFAULT_BRIDGE_INTERFACE ") using the
> program 'helper'\n"
> " (default=" DEFAULT_BRIDGE_HELPER ")\n"
> #endif
> +#ifdef __linux__
> + "-net
> l2tpv3[,vlan=n][,name=str],src=srcaddr,dst=dstaddr[,srcport=srcport][,dstport=dstport],txsession=txsession[,rxsession=rxsession][,ipv6=on/off][,udp=on/off][,cookie64=on/off][,counter][,pincounter][,txcookie=txcookie][,rxcookie=rxcookie][,offset=offset]\n"
> + " connect the VLAN to an Ethernet over L2TPv3
> pseudowire\n"
> + " Linux kernel 3.3+ as well as most routers can talk\n"
> + " L2TPv3. This transport allows to connect a VM to a
> VM,\n"
> + " VM to a router and even VM to Host. It is a
> nearly-universal\n"
> + " standard (RFC3391). Note - this implementation uses
> static\n"
> + " pre-configured tunnels (same as the Linux kernel).\n"
> + " use 'src=' to specify source address\n"
> + " use 'dst=' to specify destination address\n"
> + " use 'udp=on' to specify udp encapsulation\n"
> + " use 'dstport=' to specify destination udp port\n"
> + " use 'dstport=' to specify destination udp port\n"
> + " use 'ipv6=on' to force v6\n"
> + " L2TPv3 uses cookies to prevent misconfiguration as\n"
> + " well as a weak security measure\n"
> + " use 'rxcookie=0x012345678' to specify a rxcookie\n"
> + " use 'txcookie=0x012345678' to specify a txcookie\n"
> + " use 'cookie64=on' to set cookie size to 64 bit,
> otherwise 32\n"
> + " use 'counter=off' to force a 'cut-down' L2TPv3 with no
> counter\n"
> + " use 'pincounter=on' to work around broken counter
> handling in peer\n"
> + " use 'offset=X' to add an extra offset between header
> and data\n"
> +#endif
> "-net
> socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n"
> " connect the vlan 'n' to another VLAN using a socket
> connection\n"
> "-net
> socket[,vlan=n][,name=str][,fd=h][,mcast=maddr:port[,localaddr=addr]]\n"
> @@ -1730,6 +1753,65 @@ qemu-system-i386 linux.img \
> -net socket,mcast=239.192.168.1:1102,localaddr=1.2.3.4
> @end example
>
> address@hidden -netdev
> l2tpv3,address@hidden,address@hidden,address@hidden,address@hidden,address@hidden,address@hidden,address@hidden,ipv6][,udp][,cookie64][,counter][,pincounter][,address@hidden,address@hidden,address@hidden
> address@hidden -net
> l2tpv3[,address@hidden,address@hidden,address@hidden,address@hidden,address@hidden,address@hidden,address@hidden,address@hidden,ipv6][,udp][,cookie64][,counter][,pincounter][,address@hidden,address@hidden,address@hidden
> +Connect VLAN @var{n} to L2TPv3 pseudowire. L2TPv3 (RFC3391) is a popular
> +protocol to transport Ethernet (and other Layer 2) data frames between
> +two systems. It is present in routers, firewalls and the Linux kernel
> +(from version 3.3 onwards).
> +
> +This transport allows a VM to communicate to another VM, router or firewall
> directly.
> +
> address@hidden address@hidden
> + source address (mandatory)
> address@hidden address@hidden
> + destination address (mandatory)
> address@hidden udp
> + select udp encapsulation (default is ip).
> address@hidden address@hidden
> + source udp port.
> address@hidden address@hidden
> + destination udp port.
> address@hidden ipv6
> + force v6, otherwise defaults to v4.
> address@hidden address@hidden
> address@hidden address@hidden
> + Cookies are a weak form of security in the l2tpv3 specification.
> +Their function is mostly to prevent misconfiguration. By default they are 32
> +bit.
> address@hidden cookie64
> + Set cookie size to 64 bit instead of the default 32
> address@hidden counter=off
> + Force a 'cut-down' L2TPv3 with no counter as in
> +draft-mkonstan-l2tpext-keyed-ipv6-tunnel-00
> address@hidden pincounter=on
> + Work around broken counter handling in peer. This may also help on
> +networks which have packet reorder.
> address@hidden address@hidden
> + Add an extra offset between header and data
> +
> +For example, to attach a VM running on host 4.3.2.1 via L2TPv3 to the bridge
> br-lan
> +on the remote Linux host 1.2.3.4:
> address@hidden
> +# Setup tunnel on linux host using raw ip as encapsulation
> +# on 1.2.3.4
> +ip l2tp add tunnel remote 4.3.2.1 local 1.2.3.4 tunnel_id 1 peer_tunnel_id 1
> \
> + encap udp udp_sport 16384 udp_dport 16384
> +ip l2tp add session tunnel_id 1 name vmtunnel0 session_id \
> + 0xFFFFFFFF peer_session_id 0xFFFFFFFF
> +ifconfig vmtunnel0 mtu 1500
> +ifconfig vmtunnel0 up
> +brctl addif br-lan vmtunnel0
> +
> +
> +# on 4.3.2.1
> +# launch QEMU instance - if your network has reorder or is very lossy add
> ,pincounter
> +
> +qemu-system-i386 linux.img -net nic -net
> l2tpv3,src=4.2.3.1,dst=1.2.3.4,udp,srcport=16384,dstport=16384,rxsession=0xffffffff,txsession=0xffffffff,counter
> +
> +
> address@hidden example
> +
> @item -netdev
> vde,address@hidden,address@hidden,address@hidden,address@hidden,address@hidden
> @item -net vde[,address@hidden,address@hidden,address@hidden
> [,address@hidden,address@hidden,address@hidden
> Connect VLAN @var{n} to PORT @var{n} of a vde switch running on host and
> --
> 1.7.10.4
>
>