qemu-devel

From: Jason Wang
Subject: Re: [Qemu-devel] [PATCH v3 2/2] vhost user: Add RARP injection for legacy guest
Date: Fri, 26 Jun 2015 12:06:35 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0


On 06/25/2015 10:22 PM, Thibaut Collet wrote:
> On Thu, Jun 25, 2015 at 2:53 PM, Michael S. Tsirkin <address@hidden> wrote:
>> On Thu, Jun 25, 2015 at 01:01:29PM +0200, Thibaut Collet wrote:
>>> On Thu, Jun 25, 2015 at 11:59 AM, Jason Wang <address@hidden> wrote:
>>>>
>>>>
>>>> On 06/24/2015 07:05 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Jun 24, 2015 at 04:31:15PM +0800, Jason Wang wrote:
>>>>>> On 06/23/2015 01:49 PM, Michael S. Tsirkin wrote:
>>>>>>> On Tue, Jun 23, 2015 at 10:12:17AM +0800, Jason Wang wrote:
>>>>>>>>> On 06/18/2015 11:16 PM, Thibaut Collet wrote:
>>>>>>>>>>> On Tue, Jun 16, 2015 at 10:05 AM, Jason Wang <address@hidden> wrote:
>>>>>>>>>>>>> On 06/16/2015 03:24 PM, Thibaut Collet wrote:
>>>>>>>>>>>>>>> If my understanding is correct, on a resume operation we have the
>>>>>>>>>>>>>>> following callback trace:
>>>>>>>>>>>>>>> 1. the virtio_pci_restore function calls all restore callbacks of
>>>>>>>>>>>>>>> the virtio devices
>>>>>>>>>>>>>>> 2. virtnet_restore calls the try_fill_recv function for each
>>>>>>>>>>>>>>> virtual queue
>>>>>>>>>>>>>>> 3. try_fill_recv kicks the virtual queue (through the
>>>>>>>>>>>>>>> virtqueue_kick function)
>>>>>>>>>>>>> Yes, but this happens only after pm resume, not migration. Migration
>>>>>>>>>>>>> is totally transparent to the guest.
>>>>>>>>>>>>>
>>>>>>>>>>> Hi Jason,
>>>>>>>>>>>
>>>>>>>>>>> After a deeper look into the migration code of QEMU, a resume event is
>>>>>>>>>>> always sent when the live migration is finished.
>>>>>>>>>>> On a live migration we have the following callback trace:
>>>>>>>>>>> 1. The VM on the new host is set to the RUN_STATE_INMIGRATE state and
>>>>>>>>>>> the autostart boolean to 1, and the qemu_start_incoming_migration
>>>>>>>>>>> function is called (see the main function of vl.c)
>>>>>>>>>>> .....
>>>>>>>>>>> 2. call of the process_incoming_migration function in the
>>>>>>>>>>> migration/migration.c file, whatever the transport used for the live
>>>>>>>>>>> migration (tcp:, fd:, unix:, exec: ...)
>>>>>>>>>>> 3. call of the process_incoming_migration_co function in
>>>>>>>>>>> migration/migration.c
>>>>>>>>>>> 4. call of the vm_start function in vl.c (otherwise the migrated VM
>>>>>>>>>>> stays in the paused state; the autostart boolean was set to 1 by the
>>>>>>>>>>> main function in vl.c)
>>>>>>>>>>> 5. vm_start sets the VM to the RUN_STATE_RUNNING state
>>>>>>>>>>> 6. call of the qapi_event_send_resume function that sends a resume
>>>>>>>>>>> event to the VM
>>>>>>>>> AFAIK, this function sends the resume event to the QEMU monitor, not to the VM.
>>>>>>>>>
>>>>>>>>>>> So when a live migration has ended:
>>>>>>>>>>> 1. a resume event is sent to the guest
>>>>>>>>>>> 2. on reception of this resume event, the virtual queues are kicked
>>>>>>>>>>> by the guest
>>>>>>>>>>> 3. the vhost-user backend catches this kick and can emit a RARP to a
>>>>>>>>>>> guest that does not support GUEST_ANNOUNCE
>>>>>>>>>>>
>>>>>>>>>>> This solution, like the solution based on detection of the DRIVER_OK
>>>>>>>>>>> status suggested by Michael, allows the backend to send the RARP to a
>>>>>>>>>>> legacy guest without involving QEMU or adding an ioctl to vhost-user.
>>>>>>>>> A question here: does the vhost-user code pass the status to the backend?
>>>>>>>>> If not, how can a userspace backend detect DRIVER_OK?
>>>>>>> Sorry, I must have been unclear.
>>>>>>> vhost core calls VHOST_NET_SET_BACKEND on DRIVER_OK.
>>>>>>> Unfortunately vhost user currently translates it to VHOST_USER_NONE.
>>>>>> Looks like VHOST_NET_SET_BACKEND was only used for the tap backend.
>>>>>>
>>>>>>> As a workaround, I think kicking the ioeventfds once you get
>>>>>>> VHOST_NET_SET_BACKEND will work.
>>>>>> Maybe just an eventfd_set() in vhost_net_start(). But is this
>>>>>> "workaround" elegant enough to be documented? Is it better to do this
>>>>>> explicitly with a new feature?
>>>>> If you are going to do this anyway, there are a couple of other changes
>>>>> we should do, in particular, decide what we want to do with control vq.
>>>>>
>>>> If I understand correctly, you mean VIRTIO_NET_CTRL_MQ and
>>>> VIRTIO_NET_CTRL_GUEST_OFFLOADS? Looks like both of these were broken.
>>>> Need more thought, maybe new kinds of requests.
>>>>
>>>>
>>> Are there any objections to adding VHOST_NET_SET_BACKEND support to
>>> vhost-user with a patch like this:
>>>
>>>
>>>  hw/net/vhost_net.c     |    8 ++++++++
>>>  hw/virtio/vhost-user.c |   10 +++++++++-
>>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>> index 907e002..7a008c0 100644
>>> --- a/hw/net/vhost_net.c
>>> +++ b/hw/net/vhost_net.c
>>> @@ -234,6 +234,14 @@ static int vhost_net_start_one(struct vhost_net *net,
>>>                  goto fail;
>>>              }
>>>          }
>>> +    } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
>>> +        file.fd = 0;
>>> +        for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
>>> +            const VhostOps *vhost_ops = net->dev.vhost_ops;
>>> +            int r = vhost_ops->vhost_call(&net->dev, VHOST_NET_SET_BACKEND,
>>> +                                          &file);
>>> +            assert(r >= 0);
>>> +        }
>>>      }
>>>      return 0;
>>>  fail:
>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>>> index d6f2163..32c6bd9 100644
>>> --- a/hw/virtio/vhost-user.c
>>> +++ b/hw/virtio/vhost-user.c
>>> @@ -41,6 +41,7 @@ typedef enum VhostUserRequest {
>>>      VHOST_USER_SET_VRING_KICK = 12,
>>>      VHOST_USER_SET_VRING_CALL = 13,
>>>      VHOST_USER_SET_VRING_ERR = 14,
>>> +    VHOST_USER_NET_SET_BACKEND = 15,
>>>      VHOST_USER_MAX
>>>  } VhostUserRequest;
>>>
>>> @@ -104,7 +105,8 @@ static unsigned long int ioctl_to_vhost_user_request[VHOST_USER_MAX] = {
>>>      VHOST_GET_VRING_BASE,   /* VHOST_USER_GET_VRING_BASE */
>>>      VHOST_SET_VRING_KICK,   /* VHOST_USER_SET_VRING_KICK */
>>>      VHOST_SET_VRING_CALL,   /* VHOST_USER_SET_VRING_CALL */
>>> -    VHOST_SET_VRING_ERR     /* VHOST_USER_SET_VRING_ERR */
>>> +    VHOST_SET_VRING_ERR,    /* VHOST_USER_SET_VRING_ERR */
>>> +    VHOST_NET_SET_BACKEND   /* VHOST_USER_NET_SET_BACKEND */
>>>  };
>>>
>>>  static VhostUserRequest vhost_user_request_translate(unsigned long int request)
>>> @@ -287,6 +289,12 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>>>              msg.u64 |= VHOST_USER_VRING_NOFD_MASK;
>>>          }
>>>          break;
>>> +
>>> +    case VHOST_NET_SET_BACKEND:
>>> +        memcpy(&msg.file, arg, sizeof(struct vhost_vring_file));
>>> +        msg.size = sizeof(m.file);
>>> +        break;
>>> +
>>>      default:
>>>          error_report("vhost-user trying to send unhandled ioctl");
>>>          return -1;
>>>
>>>
>>> This message will be sent when the guest is ready and can be used by the
>>> vhost-user backend to send a RARP to a legacy guest.
>>>
>>> This solution avoids adding a new kind of request and has no impact on the
>>> control vq.
>>
>> I think that you can't add messages to the protocol unconditionally.
>> For example, snabbswitch simply crashes if it gets an unknown
>> message.
>>
>> Either this needs a new feature bit, or an implementation of
>> [PATCH RFC] vhost-user: protocol extensions
>> making it safe to add new messages.
>>
>> --
>> MST
> I understand.
> One last idea before doing an RFC:
> Normally the guest notifies vhost of new buffers on a virtqueue by
> kicking the eventfd. This eventfd has been provided to vhost by QEMU.
> So when DRIVER_OK is received by QEMU, QEMU can kick the eventfd itself.
>
> A possible patch to do that is:
>  hw/net/vhost_net.c |    7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 907e002..fbc55e0 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -234,6 +234,13 @@ static int vhost_net_start_one(struct vhost_net *net,
>                  goto fail;
>              }
>          }
> +    } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
> +        int idx;
> +
> +        for (idx = 0; idx < net->dev.nvqs; ++idx) {
> +            struct VirtQueue *vq = virtio_get_queue(dev, idx);
> +            event_notifier_set(virtio_queue_get_host_notifier(vq));
> +        }
>      }
>      return 0;
>  fail:
>
> Kicking this eventfd has no impact on QEMU or the guest (they do not
> poll it); it simply wakes up vhost and allows it to send the RARP for a
> legacy guest.
>
> Regards.
>
> Thibaut.

This may work, but there are issues:

- I believe we should document this in the spec. But it looks more like a
workaround and uses an implicit method to notify a control message, which
often causes ugly code on both sides. This is not elegant enough to be
documented in the spec.
- Consider that you may have 100 queues; then the kick will happen 100
times, and the backend needs to handle such a case.


