qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 2/3] VFIO driver for mediated PCI device


From: Kirti Wankhede
Subject: Re: [Qemu-devel] [PATCH 2/3] VFIO driver for mediated PCI device
Date: Sat, 25 Jun 2016 00:04:27 +0530

Thanks Alex.


On 6/22/2016 4:18 AM, Alex Williamson wrote:
> On Mon, 20 Jun 2016 22:01:47 +0530
> Kirti Wankhede <address@hidden> wrote:
> 
>> +
>> +static int get_mdev_region_info(struct mdev_device *mdev,
>> +                            struct pci_region_info *vfio_region_info,
>> +                            int index)
>> +{
>> +    int ret = -EINVAL;
>> +    struct parent_device *parent = mdev->parent;
>> +
>> +    if (parent && dev_is_pci(parent->dev) && parent->ops->get_region_info) {
>> +            mutex_lock(&mdev->ops_lock);
>> +            ret = parent->ops->get_region_info(mdev, index,
>> +                                                vfio_region_info);
>> +            mutex_unlock(&mdev->ops_lock);
> 
> Why do we have two ops_lock, one on the parent_device and one on the
> mdev_device?!  Is this one actually locking anything or also just
> providing serialization?  Why do some things get serialized at the
> parent level and some things at the device level?  Very confused by
> ops_lock.
>

There are two sets of callback:
* parent device callbacks: supported_config, create, destroy, start, stop
* mdev device callbacks: read, write, set_irqs, get_region_info,
validate_map_request

parent->ops_lock is to serialize per parent device callbacks.
mdev->ops_lock is to serialize per mdev device callbacks.

I'll add above comment in mdev.h.


>> +
>> +static int mdev_get_irq_count(struct vfio_mdev *vmdev, int irq_type)
>> +{
>> +    /* Don't support MSIX for now */
>> +    if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX)
>> +            return -1;
>> +
>> +    return 1;
> 
> Too much hard coding here, the mediated driver should define this.
> 

I'm testing INTX and MSI, I don't have a way to test MSIX for now. So we
thought we can add supported for MSIX later. Till then hard code it to 1.

>> +
>> +            if (parent && parent->ops->set_irqs) {
>> +                    mutex_lock(&mdev->ops_lock);
>> +                    ret = parent->ops->set_irqs(mdev, hdr.flags, hdr.index,
>> +                                                hdr.start, hdr.count, data);
>> +                    mutex_unlock(&mdev->ops_lock);
> 
> Device level serialization on set_irqs... interesting.
> 

Hope answer above helps to clarify this.


>> +            }
>> +
>> +            kfree(ptr);
>> +            return ret;
>> +    }
>> +    }
>> +    return -ENOTTY;
>> +}
>> +
>> +ssize_t mdev_dev_config_rw(struct vfio_mdev *vmdev, char __user *buf,
>> +                       size_t count, loff_t *ppos, bool iswrite)
>> +{
>> +    struct mdev_device *mdev = vmdev->mdev;
>> +    struct parent_device *parent = mdev->parent;
>> +    int size = vmdev->vfio_region_info[VFIO_PCI_CONFIG_REGION_INDEX].size;
>> +    int ret = 0;
>> +    uint64_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +
>> +    if (pos < 0 || pos >= size ||
>> +        pos + count > size) {
>> +            pr_err("%s pos 0x%llx out of range\n", __func__, pos);
>> +            ret = -EFAULT;
>> +            goto config_rw_exit;
>> +    }
>> +
>> +    if (iswrite) {
>> +            char *usr_data, *ptr;
>> +
>> +            ptr = usr_data = memdup_user(buf, count);
>> +            if (IS_ERR(usr_data)) {
>> +                    ret = PTR_ERR(usr_data);
>> +                    goto config_rw_exit;
>> +            }
>> +
>> +            ret = parent->ops->write(mdev, usr_data, count,
>> +                                      EMUL_CONFIG_SPACE, pos);
> 
> No serialization on this ops, thank goodness, but why?
>

Its there at caller of mdev_dev_rw().


> This read/write interface still seems strange to me...
> 

Replied on this in 1st Patch.

>> +
>> +            memcpy((void *)(vmdev->vconfig + pos), (void *)usr_data, count);
>> +            kfree(ptr);
>> +    } else {
>> +            char *ret_data, *ptr;
>> +
>> +            ptr = ret_data = kzalloc(count, GFP_KERNEL);
>> +
>> +            if (IS_ERR(ret_data)) {
>> +                    ret = PTR_ERR(ret_data);
>> +                    goto config_rw_exit;
>> +            }
>> +
>> +            ret = parent->ops->read(mdev, ret_data, count,
>> +                                    EMUL_CONFIG_SPACE, pos);
>> +
>> +            if (ret > 0) {
>> +                    if (copy_to_user(buf, ret_data, ret))
>> +                            ret = -EFAULT;
>> +                    else
>> +                            memcpy((void *)(vmdev->vconfig + pos),
>> +                                    (void *)ret_data, count);
>> +            }
>> +            kfree(ptr);
> 
> So vconfig caches all of config space for the mdev, but we only ever
> use it to read the BAR address via mdev_read_base()... why?  I hope the
> mdev driver doesn't freak out if the user reads the mmio region before
> writing a base address (remember the vfio API aspect of the interface
> doesn't necessarily follow the VM PCI programming API)
> 

How could user read mmio region from guest before writing base address?
Isn't that would be a bug?
>From our driver if pos is not within the base address range, then we
return error for read/write.


>> +    }
>> +config_rw_exit:
>> +
>> +    if (ret > 0)
>> +            *ppos += ret;
>> +
>> +    return ret;
>> +}
>> +
>> +ssize_t mdev_dev_bar_rw(struct vfio_mdev *vmdev, char __user *buf,
>> +                    size_t count, loff_t *ppos, bool iswrite)
>> +{
>> +    struct mdev_device *mdev = vmdev->mdev;
>> +    struct parent_device *parent = mdev->parent;
>> +    loff_t offset = *ppos & VFIO_PCI_OFFSET_MASK;
>> +    loff_t pos;
>> +    int bar_index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>> +    int ret = 0;
>> +
>> +    if (!vmdev->vfio_region_info[bar_index].start)
>> +            mdev_read_base(vmdev);
>> +
>> +    if (offset >= vmdev->vfio_region_info[bar_index].size) {
>> +            ret = -EINVAL;
>> +            goto bar_rw_exit;
>> +    }
>> +
>> +    count = min(count,
>> +                (size_t)(vmdev->vfio_region_info[bar_index].size - offset));
>> +
>> +    pos = vmdev->vfio_region_info[bar_index].start + offset;
> 
> In the case of a mpci dev, @start is the vconfig BAR value, so it's
> user (guest) writable, and the mediated driver is supposed to
> understand that?  I suppose is saw the config write too, if there was
> one, but the mediated driver gives us region info based on region index.
> We have the region index here.  Why wouldn't we do reads and writes
> based on region index and offset and eliminate vconfig?  Seems like
> that would consolidate a lot of this, we don't care what we're reading
> and writing, just pass it through.  Mediated pci drivers would simply
> need to match indexes to those already defined for vfio-pci.
> 

Ok, looking at it. so this will remove vconfig completely.

Thanks,
Kirti.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]