From: Jike Song
Subject: Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
Date: Wed, 19 Oct 2016 10:32:13 +0800
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8

On 10/18/2016 10:59 PM, Alex Williamson wrote:
> On Tue, 18 Oct 2016 20:38:21 +0800
> Jike Song <address@hidden> wrote:
>> On 10/18/2016 12:02 AM, Alex Williamson wrote:
>>> On Fri, 14 Oct 2016 15:19:01 -0700
>>> Neo Jia <address@hidden> wrote:
>>>
>>>> On Fri, Oct 14, 2016 at 10:51:24AM -0600, Alex Williamson wrote:
>>>>> On Fri, 14 Oct 2016 09:35:45 -0700
>>>>> Neo Jia <address@hidden> wrote:
>>>>>
>>>>>> On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:
>>>>>>> On Fri, 14 Oct 2016 08:41:58 -0600
>>>>>>> Alex Williamson <address@hidden> wrote:
>>>>>>>
>>>>>>>> On Fri, 14 Oct 2016 18:37:45 +0800
>>>>>>>> Jike Song <address@hidden> wrote:
>>>>>>>>
>>>>>>>>> On 10/11/2016 05:47 PM, Paolo Bonzini wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/10/2016 11:21, Xiao Guangrong wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/10/2016 04:39, Xiao Guangrong wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/10/2016 20:01, Neo Jia wrote:
>>>>>>>>>>>>>>>> Hi Neo,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the
>>>>>>>>>>>>>>>> PPGTT, while nVidia does.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Paolo and Xiaoguang,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am just wondering how a device driver can register a notifier so
>>>>>>>>>>>>>>> that it can be notified when writes happen to write-protected pages.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It can't yet, but the API is ready for that. kvm_vfio_set_group is
>>>>>>>>>>>>>> currently where a struct kvm_device* and struct vfio_group* touch.
>>>>>>>>>>>>>> Given a struct kvm_device*, dev->kvm provides the struct kvm to be
>>>>>>>>>>>>>> passed to kvm_page_track_register_notifier. So I guess you could add
>>>>>>>>>>>>>> a callback that passes the struct kvm_device* to the mdev device.
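A userspace model of the flow Paolo describes may help: the kvm-vfio device hands dev->kvm to the mdev vendor driver through a callback, and the vendor driver registers a write notifier on that struct kvm. Everything below is a hypothetical stand-in (model_kvm, track_notifier, vendor_attach_kvm); the real kvm_page_track_register_notifier() and its notifier node live in KVM's x86 page-track code and have richer signatures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a page-track notifier node. */
struct track_notifier {
    void (*track_write)(struct track_notifier *n, uint64_t gpa, int bytes);
    struct track_notifier *next;
    int writes_seen;                    /* demo-only bookkeeping */
};

/* Hypothetical stand-ins for struct kvm and struct kvm_device. */
struct model_kvm {
    struct track_notifier *notifiers;   /* singly linked list head */
};

struct model_kvm_device {
    struct model_kvm *kvm;              /* mirrors dev->kvm */
};

/* Loosely modeled on kvm_page_track_register_notifier(): push onto the list. */
static void model_register_notifier(struct model_kvm *kvm, struct track_notifier *n)
{
    n->next = kvm->notifiers;
    kvm->notifiers = n;
}

/* What KVM would do when the guest writes a write-protected page. */
static void model_emulate_tracked_write(struct model_kvm *kvm, uint64_t gpa, int bytes)
{
    for (struct track_notifier *n = kvm->notifiers; n; n = n->next)
        n->track_write(n, gpa, bytes);
}

/* Vendor driver side: just count the writes it is told about. */
static void vendor_track_write(struct track_notifier *n, uint64_t gpa, int bytes)
{
    (void)gpa;
    (void)bytes;
    n->writes_seen++;
}

/* Hypothetical callback that kvm_vfio_set_group() could invoke for an mdev. */
static void vendor_attach_kvm(struct model_kvm_device *dev, struct track_notifier *n)
{
    n->track_write = vendor_track_write;
    n->writes_seen = 0;
    model_register_notifier(dev->kvm, n);
}
```

The key point the model captures is that the vendor driver never searches for a VM: it is handed the struct kvm that the kvm-vfio device already owns.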
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Xiaoguang and Guangrong, what were your plans? We discussed it
>>>>>>>>>>>>>> briefly at KVM Forum but I don't remember the details.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Your suggestion was to pass the kvm fd to KVMGT via VFIO, so that we
>>>>>>>>>>>>> can figure out the kvm instance based on the fd.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We have a new idea: how about looking up the kvm instance by
>>>>>>>>>>>>> mm_struct? It can work because KVMGT runs in the vcpu context, and it
>>>>>>>>>>>>> is much more straightforward.
>>>>>>>>>>>>
>>>>>>>>>>>> Perhaps I didn't understand your suggestion, but the same mm_struct can
>>>>>>>>>>>> have more than 1 struct kvm so I'm not sure that it can work.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> vcpu->pid is valid while the vcpu is running, so it can be used to
>>>>>>>>>>> figure out which kvm instance owns the vcpu whose pid matches the
>>>>>>>>>>> current thread. I think it can work. :)
>>>>>>>>>>
>>>>>>>>>> No, don't do that. There's no reason for a thread to run only a single
>>>>>>>>>> VCPU, and if you can have multiple VCPUs you can also have multiple
>>>>>>>>>> VCPUs from multiple VMs.
>>>>>>>>>>
>>>>>>>>>> Passing file descriptors around is the right way to connect
>>>>>>>>>> subsystems.
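Paolo's objection can be made concrete with a small userspace model (the struct and helper are hypothetical, not KVM code): if one host thread has run VCPUs of two VMs, a lookup keyed by that thread's pid matches both, so it cannot identify "the" VM, whereas a VM file descriptor names exactly one.

```c
#include <stddef.h>

/* Hypothetical record: which host thread (pid) last ran which VCPU of which VM. */
struct vcpu_record {
    int vm_id;
    int vcpu_id;
    int pid;        /* plays the role of vcpu->pid */
};

/*
 * Count the distinct VMs that have a VCPU last run by @pid. A pid-based
 * lookup is only unambiguous when this returns exactly 1.
 */
static size_t count_vms_for_pid(const struct vcpu_record *tab, size_t n, int pid)
{
    size_t vms = 0;

    for (size_t i = 0; i < n; i++) {
        int seen = 0;

        if (tab[i].pid != pid)
            continue;
        /* Skip VMs already counted via an earlier record for the same pid. */
        for (size_t j = 0; j < i; j++)
            if (tab[j].pid == pid && tab[j].vm_id == tab[i].vm_id)
                seen = 1;
        if (!seen)
            vms++;
    }
    return vms;
}
```

For example, a table where pid 100 has run VCPUs 0 and 1 of VM 1 and VCPU 0 of VM 2 yields 2 for pid 100, which is exactly the ambiguity Paolo points out.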
>>>>>>>>>
>>>>>>>>> [CC Alex, Kevin and Qemu-devel]
>>>>>>>>>
>>>>>>>>> Hi Paolo & Alex,
>>>>>>>>>
>>>>>>>>> IIUC, passing file descriptors means touching QEMU and the UAPI between
>>>>>>>>> QEMU and VFIO. Would you guys have a look at the draft patch below? If
>>>>>>>>> it's in the right direction, I'll send it as split patches. Thanks!
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks,
>>>>>>>>> Jike
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>>>>>>>>> index bec694c..f715d37 100644
>>>>>>>>> --- a/hw/vfio/pci-quirks.c
>>>>>>>>> +++ b/hw/vfio/pci-quirks.c
>>>>>>>>> @@ -10,12 +10,14 @@
>>>>>>>>>   * the COPYING file in the top-level directory.
>>>>>>>>>   */
>>>>>>>>>
>>>>>>>>> +#include <sys/ioctl.h>
>>>>>>>>>  #include "qemu/osdep.h"
>>>>>>>>>  #include "qemu/error-report.h"
>>>>>>>>>  #include "qemu/range.h"
>>>>>>>>>  #include "qapi/error.h"
>>>>>>>>>  #include "hw/nvram/fw_cfg.h"
>>>>>>>>>  #include "pci.h"
>>>>>>>>> +#include "sysemu/kvm.h"
>>>>>>>>>  #include "trace.h"
>>>>>>>>>
>>>>>>>>>  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
>>>>>>>>> @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
>>>>>>>>>          break;
>>>>>>>>>      }
>>>>>>>>>  }
>>>>>>>>> +
>>>>>>>>> +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
>>>>>>>>> +{
>>>>>>>>> +    int vmfd;
>>>>>>>>> +
>>>>>>>>> +    if (!kvm_enabled() || !vdev->kvmgt)
>>>>>>>>> +        return;
>>>>>>>>> +
>>>>>>>>> +    /* Tell the device what KVM it attached */
>>>>>>>>> +    vmfd = kvm_get_vmfd(kvm_state);
>>>>>>>>> +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
>>>>>>>>> +}
>>>>>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>>>>>>> index a5a620a..8732552 100644
>>>>>>>>> --- a/hw/vfio/pci.c
>>>>>>>>> +++ b/hw/vfio/pci.c
>>>>>>>>> @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
>>>>>>>>>          return ret;
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> +    vfio_quirk_kvmgt(vdev);
>>>>>>>>> +
>>>>>>>>>      /* Get a copy of config space */
>>>>>>>>>      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
>>>>>>>>>                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
>>>>>>>>> @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
>>>>>>>>>      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
>>>>>>>>>                         sub_device_id, PCI_ANY_ID),
>>>>>>>>>      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
>>>>>>>>> +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),
>>>>>>>>
>>>>>>>> Just a side note, device options are a headache, users are prone to get
>>>>>>>> them wrong and minimally it requires an entire round to get libvirt
>>>>>>>> support. We should be able to detect from the device or vfio API
>>>>>>>> whether such a call is required. Obviously if we can use the existing
>>>>>>>> kvm-vfio device, that's the better option anyway. Thanks,
>>>>>>>
>>>>>>> Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
>>>>>>> does, it needs to produce a device failure when unavailable. Thanks,
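Along the lines of Alex's second point, here is a sketch of the quirk reporting failure instead of ignoring the ioctl() result, so that vfio_initfn() could fail the device when KVM is required but absent, the same way it already propagates other errors with `return ret`. VFIO_SET_KVMFD exists only in the draft patch above, so the request number below is a placeholder, and vfio_attach_kvm() is a hypothetical helper; only ioctl() and errno are real.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/ioctl.h>

/* Placeholder only: VFIO_SET_KVMFD is proposed in this thread and has no real number. */
#define HYPOTHETICAL_VFIO_SET_KVMFD 0x3b80

/*
 * Hypothetical helper: returns 0 on success or a negative errno, so the
 * caller can fail device realization instead of silently continuing
 * without the KVM linkage.
 */
static int vfio_attach_kvm(int device_fd, bool kvm_available, int vmfd)
{
    if (!kvm_available)
        return -ENODEV;     /* the device needs KVM, but KVM is not in use */
    if (ioctl(device_fd, HYPOTHETICAL_VFIO_SET_KVMFD, vmfd) < 0)
        return -errno;      /* propagate the kernel's reason for refusing */
    return 0;
}
```

Making the dependency a returned error, rather than a user-supplied device property, also fits Alex's earlier preference for detecting the requirement programmatically.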
>>>>>>>
>>>>>>
>>>>>> Also, I would like to see this as a generic feature instead of a
>>>>>> kvmgt-specific interface, so we don't have to add new options to QEMU
>>>>>> and it is up to the vendor driver to proceed with or without it.
>>>>>
>>>>> In general this should be decided by lack of some required feature
>>>>> exclusively provided by KVM. I would not want to add a generic opt-out
>>>>> for mdev vendor drivers to decide that they arbitrarily want to disable
>>>>> that path. Thanks,
>>>>
>>>> IIUC, you are suggesting that this path should be controlled by a KVM
>>>> feature cap, and that it will be accessible to VFIO users only when that
>>>> check is satisfied.
>>>
>>> Maybe we're getting too loose with our pronouns here, I'm starting to
>>> lose track of what "this" is referring to. I agree that there's no
>>> reason for the ioctl, as proposed to be kvmgt specific. I would hope
>>> that going through the kvm-vfio device to create that linkage would
>>> eliminate that, but we'll need to see what Jike can come up with to
>>> plumb between KVM and vfio. Vendor drivers can implement their own
>>> ioctls, now that we pass them through the mdev layer, but someone needs
>>> to call those ioctls. Ideally we want something programmatic to
>>> trigger that, without requiring a user to pass an extra device
>>> parameter. Additionally, if there is any hope of making use of the
>>> device with userspace drivers other than QEMU, hard dependencies on KVM
>>> should be avoided. Thanks,
>>>
>>> Alex
>>>
>>
>> Thanks for the advice, so I've cooked up another patch based on your
>> comments. Basically a 'void *usrdata' is added to vfio_group; external
>> users can set it (kvm) or get it (kvm or other users like kvmgt).
>>
>> BTW, in the device model, the open method will return a failure to
>> vfio-mdev if such kvm information is not available.
>>
>> --
>> Thanks,
>> Jike
>>
>>
>>
>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>> index d1d70e0..6b8d1d2 100644
>> --- a/drivers/vfio/vfio.c
>> +++ b/drivers/vfio/vfio.c
>> @@ -86,6 +86,7 @@ struct vfio_group {
>>  	struct mutex unbound_lock;
>>  	atomic_t opened;
>>  	bool noiommu;
>> +	void *usrdata;
>>  };
>>
>>  struct vfio_device {
>> @@ -447,14 +448,13 @@ static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
>>  }
>>
>>  static
>> -struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>> +struct vfio_group *__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>>  {
>>  	struct vfio_group *group;
>>
>>  	mutex_lock(&vfio.group_lock);
>>  	list_for_each_entry(group, &vfio.group_list, vfio_next) {
>>  		if (group->iommu_group == iommu_group) {
>> -			vfio_group_get(group);
>
> This is wrong, we can't add our reference after we release the lock.
>
Thanks for pointing it out :)
>>  			mutex_unlock(&vfio.group_lock);
>>  			return group;
>>  		}
>> @@ -464,6 +464,17 @@ struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>>  	return NULL;
>>  }
>>
>> +static
>> +struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>> +{
>> +	struct vfio_group *group = __vfio_group_get_from_iommu(iommu_group);
>> +	if (!group)
>> +		return NULL;
>> +
>> +	vfio_group_get(group);
>
> We have no basis to get a reference here. This function cannot exist
> separate from the existing function above.
>
>> +	return group;
>> +}
>> +
>>  static struct vfio_group *vfio_group_get_from_minor(int minor)
>>  {
>>  	struct vfio_group *group;
>> @@ -1728,6 +1739,31 @@ long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
>>  }
>>  EXPORT_SYMBOL_GPL(vfio_external_check_extension);
>>
>> +void vfio_group_set_usrdata(struct vfio_group *group, void *data)
>> +{
>> +	group->usrdata = data;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>> +
>> +void *vfio_group_get_usrdata(struct vfio_group *group)
>> +{
>> +	return group->usrdata;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>> +
>> +void *vfio_group_get_usrdata_by_device(struct device *dev)
>> +{
>> +	struct vfio_group *vfio_group;
>> +
>> +	vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
>
> We actually need to use iommu_group_get() here. Kirti adds a
> vfio_group_get_from_dev() in v9 03/12 that does this properly.
>
>> +	if (!vfio_group)
>> +		return NULL;
>> +
>> +	return vfio_group_get_usrdata(vfio_group);
>
> This operates on a group for which we have no reference.
Good to know about Kirti's work! BTW, this means the user needs to
call vfio_group_put_external_user afterwards, right?
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata_by_device);
>> +
>> +
>>  /**
>>   * Sub-module support
>>   */
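Alex's comments on this file all come down to one rule: a lookup helper should return the group with a reference already taken, taken while the lock that keeps the group on the list is still held, and the caller should drop that reference with the matching put once it is done with usrdata. The userspace model below illustrates that ordering. Every name is hypothetical (it is neither vfio.c nor Kirti's vfio_group_get_from_dev()), and the pthread mutex stands in for vfio.group_lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's vfio/iommu types. */
struct model_iommu_group {
    int refcount;                       /* plain counter; demo only */
};

struct model_vfio_group {
    int refcount;                       /* protected by group_lock here */
    void *usrdata;
    struct model_iommu_group *iommu_group;
    struct model_vfio_group *next;
};

struct model_device {
    struct model_iommu_group *iommu_group;
};

static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_vfio_group *group_list;

/* Pin the device's iommu group, in the spirit of iommu_group_get(). */
static struct model_iommu_group *model_iommu_group_get(struct model_device *dev)
{
    if (dev->iommu_group)
        dev->iommu_group->refcount++;
    return dev->iommu_group;
}

static void model_iommu_group_put(struct model_iommu_group *ig)
{
    ig->refcount--;
}

/*
 * Returns the vfio group with a reference the caller owns, or NULL.
 * The reference is taken before group_lock is dropped, so the group
 * cannot be released between the list search and the get.
 */
static struct model_vfio_group *model_vfio_group_get_from_dev(struct model_device *dev)
{
    struct model_iommu_group *ig = model_iommu_group_get(dev);
    struct model_vfio_group *g;

    if (!ig)
        return NULL;

    pthread_mutex_lock(&group_lock);
    for (g = group_list; g; g = g->next) {
        if (g->iommu_group == ig) {
            g->refcount++;              /* still under the lock */
            break;
        }
    }
    pthread_mutex_unlock(&group_lock);

    model_iommu_group_put(ig);          /* search done; drop the temporary pin */
    return g;                           /* NULL if no group matched */
}

/* The caller's matching put; this model only decrements the count. */
static void model_vfio_group_put(struct model_vfio_group *g)
{
    pthread_mutex_lock(&group_lock);
    g->refcount--;
    pthread_mutex_unlock(&group_lock);
}
```

A caller would read usrdata only between model_vfio_group_get_from_dev() and model_vfio_group_put(), which is the window in which the group is guaranteed to stay alive.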
>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>> index 0ecae0b..712588f 100644
>> --- a/include/linux/vfio.h
>> +++ b/include/linux/vfio.h
>> @@ -91,6 +91,10 @@ extern void vfio_unregister_iommu_driver(
>>  extern int vfio_external_user_iommu_id(struct vfio_group *group);
>>  extern long vfio_external_check_extension(struct vfio_group *group,
>>  					  unsigned long arg);
>> +extern void vfio_group_set_usrdata(struct vfio_group *group, void *data);
>> +extern void *vfio_group_get_usrdata(struct vfio_group *group);
>> +extern void *vfio_group_get_usrdata_by_device(struct device *dev);
>> +
>>
>>  /*
>>   * Sub-module helpers
>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>> index 1dd087d..e00d401 100644
>> --- a/virt/kvm/vfio.c
>> +++ b/virt/kvm/vfio.c
>> @@ -60,6 +60,20 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
>>  	symbol_put(vfio_group_put_external_user);
>>  }
>>
>> +static void kvm_vfio_group_set_kvm(struct vfio_group *group, void *kvm)
>> +{
>> +	void (*fn)(struct vfio_group *, void *);
>> +
>> +	fn = symbol_get(vfio_group_set_usrdata);
>> +	if (!fn)
>> +		return;
>> +
>> +	fn(group, kvm);
>> +	kvm_get_kvm(kvm);
>> +
>> +	symbol_put(vfio_group_set_usrdata);
>> +}
>> +
>>  static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
>>  {
>>  	long (*fn)(struct vfio_group *, unsigned long);
>> @@ -161,6 +175,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>>
>>  		kvm_vfio_update_coherency(dev);
>>
>> +		kvm_vfio_group_set_kvm(vfio_group, dev->kvm);
>> +
>>  		return 0;
>>
>>  	case KVM_DEV_VFIO_GROUP_DEL:
>> @@ -200,6 +216,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>>
>>  		kvm_vfio_update_coherency(dev);
>>
>> +		kvm_put_kvm(dev->kvm);
>> +
>>  		return ret;
>>  	}
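The ADD and DEL hunks above pair kvm_get_kvm() with kvm_put_kvm(). As posted, kvm_vfio_group_set_kvm() returns early without taking a reference when symbol_get() fails, while the put in the DEL hunk has no matching check, and usrdata is never cleared. A sketch of one way to keep the pairing balanced, with hypothetical names and a plain counter standing in for the real KVM refcount:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct kvm and a vfio group's usrdata slot. */
struct model_kvm {
    int users;
};

struct model_group {
    void *usrdata;
    bool holds_kvm_ref;             /* did attach take a reference? */
};

static void model_get_kvm(struct model_kvm *kvm) { kvm->users++; }
static void model_put_kvm(struct model_kvm *kvm) { kvm->users--; }

/* ADD: store the pointer and take the reference together, or not at all. */
static void model_group_attach_kvm(struct model_group *g, struct model_kvm *kvm,
                                   bool setter_available)
{
    if (!setter_available)          /* e.g. symbol_get() returned NULL */
        return;
    model_get_kvm(kvm);
    g->usrdata = kvm;
    g->holds_kvm_ref = true;
}

/* DEL: clear the pointer and drop the reference only if one was taken. */
static void model_group_detach_kvm(struct model_group *g)
{
    if (!g->holds_kvm_ref)
        return;
    model_put_kvm(g->usrdata);
    g->usrdata = NULL;
    g->holds_kvm_ref = false;
}
```

Clearing usrdata in the same step as the put also means no consumer can find a pointer to a VM whose reference has already been dropped.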
>
> How does anyone get'ing the usrdata know what it contains?
Currently only the KVM instance. Maybe we can add other data along with
flags in the future?
> Does the
> vendor driver compare it to a pointer it found elsewhere? How does the
> vendor driver generate an error back to the user if this linkage is
> necessary but unavailable?
For the data == kvm scenario, yes. I think it's only valid to use it
inside the kvm thread context; IIUC, comparing kvm->mm with current->mm
does the trick. If they are not equal, in our case parent_ops->open()
will return -ESRCH to indicate that this mdev must be used along with KVM.
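Putting Alex's three questions and Jike's answers together: a type tag beside the pointer (the "flags" Jike mentions) tells a consumer what usrdata holds, and the mm comparison lets open() reject a KVM that belongs to another process with -ESRCH. A minimal userspace sketch; the tag, the structs and vendor_open_check() are all hypothetical and not part of the posted patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical tag so a consumer can tell what the opaque pointer holds. */
enum usrdata_type {
    USRDATA_NONE = 0,
    USRDATA_KVM,
};

struct tagged_usrdata {
    enum usrdata_type type;
    void *data;
};

/* Minimal stand-ins: only the field the check needs. */
struct model_mm  { int unused; };
struct model_kvm { struct model_mm *mm; };

/*
 * Sketch of the vendor driver's open() check: require that the group
 * carries a KVM pointer and that this KVM belongs to the calling process
 * (kvm->mm == current->mm). Otherwise fail with -ESRCH.
 */
static int vendor_open_check(const struct tagged_usrdata *u,
                             const struct model_mm *current_mm)
{
    const struct model_kvm *kvm;

    if (u->type != USRDATA_KVM || !u->data)
        return -ESRCH;              /* no KVM linked to this group */

    kvm = u->data;
    if (kvm->mm != current_mm)
        return -ESRCH;              /* the KVM belongs to another process */

    return 0;
}
```

Per Paolo's earlier remark that one mm_struct can host more than one struct kvm, this check only rejects a KVM from a different process; it cannot tell apart two VMs created by the same process.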
--
Thanks,
Jike