qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] hw/rdma: Add support for GID state changes for


From: Yuval Shaia
Subject: Re: [Qemu-devel] [PATCH] hw/rdma: Add support for GID state changes for non-qmp frameworks
Date: Sun, 26 May 2019 09:41:45 +0300
User-agent: Mutt/1.10.1 (2018-07-13)

On Fri, May 24, 2019 at 08:24:30AM +0300, Marcel Apfelbaum wrote:
> 
> Hi Yuval,
> 
> On 5/5/19 1:55 PM, Yuval Shaia wrote:
> > Any GID change in guest must be propogate to host. This is already done
> > by firing QMP event to managment system such as libvirt which in turn
> > will update the host with the relevant change.
> 
> Agreed, *any* management software can do that.
> 
> > 
> > When qemu is executed on non-qmp framework (ex from command-line) we
> > need to update the host instead.
> > Fix it by adding support to update the RoCE device's Ethernet function
> > IP list from qemu via netlink.
> 
> I am not sure this is the right approach. I don't think QEMU should actively
> change
> the host network configuration.
> As you pointed out yourself, the management software should make such
> changes.

I know about few deployments that are not using any management software,
they fires their VMs right from command-line.

Currently those deployments cannot use pvrdma.

> 
> I agree you cannot always assume the QEMU instance is managed by libvirt,
> what about adding this functionality to rdma-multiplexer? The multiplexer
> may
> register to the same QMP event.

Two reasons prevent us from doing this:
- rdmacm-mux is a specific MAD multiplexer for CM packets, lets do not add
  management function to it.
- rdmacm-mux might be redundant when MAD multiplexer will be implemented in
  kernel. So what then?

> 
> Even if you think the multiplexer is not the right way to do it, even a
> simple bash script
> disguised  as a systemd service can subscribe to the QMP event and make the
> change on the host.
> 
> What do you think?

Another contrib app? A lightweight management software??


See, i do not have an argument if qemu policy is not allowing qemu process
to do external configuration (ex network). I'm just looking from a narrow
perspective of easy deployment - people sometimes runs qemu without libvirt
(or any other management software for that matter), if they want to use
pvrdma they are forced to install libvirt just for that.

> 
> Thanks,
> Marcel
> 
> > 
> > Signed-off-by: Yuval Shaia <address@hidden>
> > ---
> >   configure              |  6 ++++
> >   hw/rdma/rdma_backend.c | 74 +++++++++++++++++++++++++++++++++++++++++-
> >   2 files changed, 79 insertions(+), 1 deletion(-)
> > 
> > diff --git a/configure b/configure
> > index 5b183c2e39..1f707b1a62 100755
> > --- a/configure
> > +++ b/configure
> > @@ -3132,6 +3132,8 @@ fi
> >   cat > $TMPC <<EOF &&
> >   #include <sys/mman.h>
> > +#include <libmnl/libmnl.h>
> > +#include <linux/rtnetlink.h>
> >   int
> >   main(void)
> > @@ -3144,10 +3146,13 @@ main(void)
> >   }
> >   EOF
> > +pvrdma_libs="-lmnl"
> > +
> >   if test "$rdma" = "yes" ; then
> >       case "$pvrdma" in
> >       "")
> >           if compile_prog "" ""; then
> > +            libs_softmmu="$libs_softmmu $pvrdma_libs"
> >               pvrdma="yes"
> >           else
> >               pvrdma="no"
> > @@ -3156,6 +3161,7 @@ if test "$rdma" = "yes" ; then
> >       "yes")
> >           if ! compile_prog "" ""; then
> >               error_exit "PVRDMA is not supported since mremap is not 
> > implemented"
> > +                        " or libmnl-devel is not installed"
> >           fi
> >           pvrdma="yes"
> >           ;;
> > diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
> > index 05f6b03221..bc57b1a624 100644
> > --- a/hw/rdma/rdma_backend.c
> > +++ b/hw/rdma/rdma_backend.c
> > @@ -16,6 +16,11 @@
> >   #include "qemu/osdep.h"
> >   #include "qapi/qapi-events-rdma.h"
> > +#include "linux/if_addr.h"
> > +#include "libmnl/libmnl.h"
> > +#include "linux/rtnetlink.h"
> > +#include "net/if.h"
> > +
> >   #include <infiniband/verbs.h>
> >   #include "contrib/rdmacm-mux/rdmacm-mux.h"
> > @@ -47,6 +52,61 @@ static void dummy_comp_handler(void *ctx, struct ibv_wc 
> > *wc)
> >       rdma_error_report("No completion handler is registered");
> >   }
> > +static int netlink_route_update(const char *ifname, union ibv_gid *gid,
> > +                                __u16 type)
> > +{
> > +    char buf[MNL_SOCKET_BUFFER_SIZE];
> > +    struct nlmsghdr *nlh;
> > +    struct ifaddrmsg *ifm;
> > +    struct mnl_socket *nl;
> > +    int ret;
> > +    uint32_t ipv4;
> > +
> > +    nl = mnl_socket_open(NETLINK_ROUTE);
> > +    if (!nl) {
> > +        rdma_error_report("Fail to connect to netlink\n");
> > +        return -EIO;
> > +    }
> > +
> > +    ret = mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID);
> > +    if (ret < 0) {
> > +        rdma_error_report("Fail to bind to netlink\n");
> > +        goto out;
> > +    }
> > +
> > +    nlh = mnl_nlmsg_put_header(buf);
> > +    nlh->nlmsg_type = type;
> > +    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
> > +    nlh->nlmsg_seq = 1;
> > +
> > +    ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
> > +    ifm->ifa_index = if_nametoindex(ifname);
> > +    if (gid->global.subnet_prefix) {
> > +        ifm->ifa_family = AF_INET6;
> > +        ifm->ifa_prefixlen = 64;
> > +        ifm->ifa_flags = IFA_F_PERMANENT;
> > +        ifm->ifa_scope = RT_SCOPE_UNIVERSE;
> > +        mnl_attr_put(nlh, IFA_ADDRESS, sizeof(*gid), gid);
> > +    } else {
> > +        ifm->ifa_family = AF_INET;
> > +        ifm->ifa_prefixlen = 24;
> > +        memcpy(&ipv4, (char *)&gid->global.interface_id + 4, sizeof(ipv4));
> > +        mnl_attr_put(nlh, IFA_LOCAL, 4, &ipv4);
> > +    }
> > +
> > +    ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
> > +    if (ret < 0) {
> > +        rdma_error_report("Fail to send msg to to netlink\n");
> > +        goto out;
> > +    }
> > +
> > +    ret = 0;
> > +
> > +out:
> > +    mnl_socket_close(nl);
> > +    return ret;
> > +}
> > +
> >   static inline void complete_work(enum ibv_wc_status status, uint32_t 
> > vendor_err,
> >                                    void *ctx)
> >   {
> > @@ -1123,7 +1183,13 @@ int rdma_backend_add_gid(RdmaBackendDev 
> > *backend_dev, const char *ifname,
> >                                               gid->global.subnet_prefix,
> >                                               gid->global.interface_id);
> > -    return ret;
> > +    /*
> > +     * We ignore return value since operation might completed sucessfully
> > +     * by the QMP consumer
> > +     */
> > +    netlink_route_update(ifname, gid, RTM_NEWADDR);
> > +
> > +    return 0;
> >   }
> >   int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
> > @@ -1149,6 +1215,12 @@ int rdma_backend_del_gid(RdmaBackendDev 
> > *backend_dev, const char *ifname,
> >                                               gid->global.subnet_prefix,
> >                                               gid->global.interface_id);
> > +    /*
> > +     * We ignore return value since operation might completed sucessfully
> > +     * by the QMP consumer
> > +     */
> > +    netlink_route_update(ifname, gid, RTM_DELADDR);
> > +
> >       return 0;
> >   }
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]