qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC v3 04/19] Add device IO ops vector


From: John Johnson
Subject: Re: [RFC v3 04/19] Add device IO ops vector
Date: Tue, 7 Dec 2021 07:48:33 +0000


> On Nov 19, 2021, at 2:42 PM, Alex Williamson <alex.williamson@redhat.com> 
> wrote:
> 
> On Mon,  8 Nov 2021 16:46:32 -0800
> John Johnson <john.g.johnson@oracle.com> wrote:
> 
>> 
>> +    int (*region_write)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t 
>> size,
>> +                        void *data, bool post);
> 
> The @post arg is not really developed in this patch, it would make
> review easier to add the arg when it becomes implemented and used.
> 

        ok


>> 
>> @@ -2382,6 +2398,21 @@ int vfio_get_region_info(VFIODevice *vbasedev, int 
>> index,
>>                          struct vfio_region_info **info)
>> {
>>     size_t argsz = sizeof(struct vfio_region_info);
>> +    int fd = -1;
>> +    int ret;
>> +
>> +    /* create region cache */
>> +    if (vbasedev->regions == NULL) {
>> +        vbasedev->regions = g_new0(struct vfio_region_info *,
>> +                                   vbasedev->num_regions);
>> +    }
> 
> Seems like this should have been part of vfio_get_device() since that's
> where we set num_regions.
> 
>> +    /* check cache */
>> +    if (vbasedev->regions[index] != NULL) {
>> +        *info = g_malloc0(vbasedev->regions[index]->argsz);
>> +        memcpy(*info, vbasedev->regions[index],
>> +               vbasedev->regions[index]->argsz);
>> +        return 0;
>> +    }
> 
> Why are we calling vfio_get_region_info() enough that we need a cache?
> This looks like an optimization for something that doesn't exist yet.
> And if we have a cache, why aren't callers using it directly rather
> than creating a copy for each caller?
> 

        The are a couple types of vfio_get_region_info() callers.
One type wants a specific region, which it then uses to fill in
it’s own data structures (e.g., vfio_region_setup()) and the other
type uses vfio_get_dev_region_info() to loop through all regions
looking for a specific one (e.g., for the migration region or for
a specifc device to setup its quirks)  On top of that, each
vfio_get_region_info() will need to call the server twice for regions
with capabilities: once to get the proper ‘argsz’ and once to actually
get the info.

        I thought just having a cache that gets filled once, then
satisfies all later callers was a cleaner idea.  It also solved the
issue I had that the FD returned isn't used until the container is
setup, which would’ve needed another vfio_get_region_info() call.

        I can certainly move the cache fill to vfio_get_device()
and remove the copies if you like that idea better.



>> 
>>     *info = g_malloc0(argsz);
>> 
>> @@ -2389,19 +2420,28 @@ int vfio_get_region_info(VFIODevice *vbasedev, int 
>> index,
>> retry:
>>     (*info)->argsz = argsz;
>> 
>> -    if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
>> +    ret = VDEV_GET_REGION_INFO(vbasedev, *info, &fd);
>> +    if (ret != 0) {
>>         g_free(*info);
>>         *info = NULL;
>> -        return -errno;
>> +        return ret;
>>     }
>> 
>>     if ((*info)->argsz > argsz) {
>>         argsz = (*info)->argsz;
>>         *info = g_realloc(*info, argsz);
>> +        if (fd != -1) {
>> +            close(fd);
>> +            fd = -1;
>> +        }
> 
> Use of fd here is hard to review, it's clearly for some future use case.
> 

        The FD code can be moved to a patch where it’s used.


>> 
>> +
>> +static int vfio_io_region_write(VFIODevice *vbasedev, uint8_t index, off_t 
>> off,
>> +                                uint32_t size, void *data, bool post)
>> +{
>> +    struct vfio_region_info *info = vbasedev->regions[index];
>> +    int ret;
>> +
>> +    ret = pwrite(vbasedev->fd, data, size, info->offset + off);
>> +    if (ret < 0) {
>> +        ret = -errno;
>> +    }
>> +
>> +    return ret;
>> +}
> 
> Simplify all these with ternaries?
> 
>    return ret < 0 ? -errno : ret;
> 

        ok



>> 
>> static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>> {
>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>>     uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
>> -    off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
>>     DeviceState *dev = DEVICE(vdev);
>>     char *name;
>> -    int fd = vdev->vbasedev.fd;
>> 
>>     if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
>>         /* Since pci handles romfile, just print a message and return */
>> @@ -926,13 +928,23 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>>      * Use the same size ROM BAR as the physical device.  The contents
>>      * will get filled in later when the guest tries to read it.
>>      */
>> -    if (pread(fd, &orig, 4, offset) != 4 ||
>> -        pwrite(fd, &size, 4, offset) != 4 ||
>> -        pread(fd, &size, 4, offset) != 4 ||
>> -        pwrite(fd, &orig, 4, offset) != 4) {
>> -        error_report("%s(%s) failed: %m", __func__, vdev->vbasedev.name);
>> +#define rom_read(var) VDEV_REGION_READ(vbasedev,  \
>> +                                       VFIO_PCI_CONFIG_REGION_INDEX,  \
>> +                                       PCI_ROM_ADDRESS, 4, (var))
>> +#define rom_write(var) VDEV_REGION_WRITE(vbasedev,  \
>> +                                         VFIO_PCI_CONFIG_REGION_INDEX,  \
>> +                                         PCI_ROM_ADDRESS, 4, (var), false)
>> +
>> +    if (rom_read(&orig) != 4 ||
>> +        rom_write(&size) != 4 ||
>> +        rom_read(&size) != 4 ||
>> +        rom_write(&orig) != 4) {
>> +
>> +        error_report("%s(%s) ROM access failed", __func__, vbasedev->name);
>>         return;
>>     }
>> +#undef rom_read
>> +#undef rom_write
> 
> Hmm, I think I'd rather see a generic function broken out for this,
> maybe:
> 
> int vfio_pci_size_bar32(VFIODevice *vbasedev, int offset,
>                        uint32_t mask, uint32_t *size)
> {
>    uint32_t orig, val;
>    int index = VFIO_PCI_CONFIG_REGION_INDEX;
> 
>    if (VDEV_REGION_READ(vbasedev, index, offset, 4, &orig) != 4 ||
>        VDEV_REGION_WRITE(vbasedev, index, offset, 4, &mask, false) != 4 ||
>        VDEV_REGION_READ(vbasedev, index, offset, 4, &val) != 4 ||
>        VDEV_REGION_WRITE(vbasedev, index, offset, 4, &orig, false) != 4) {
> 
>        error_report(...
> 
>        return -1;
>    }
> 
>    *size = ~(le32_to_cpu(val) & PCI_ROM_ADDRESS_MASK) + 1;
> 
>    return 0;
> }
> 

        ok


> Really I think it's just trying to hard code
> VFIO_PCI_CONFIG_REGION_INDEX and PCI_ROM_ADDRESS into the macro calls
> that demand some simplification, if we use variables it looks fine
> inline.
> 
> We could also re-wrap VDEV_REGION_READ/WRITE in vfio/pci.h to simplify
> config space access, presumably we'd never have posted writes to config
> space.  Thanks,
> 


        There are multiple places where config space is read or written
in pci.c, so adding an inline or #define for it is a good idea.

                                                                JJ




reply via email to

[Prev in Thread] Current Thread [Next in Thread]