qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to


From: Alexey Kardashevskiy
Subject: Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
Date: Tue, 22 May 2012 17:01:15 +1000
User-agent: Mozilla/5.0 (X11; Linux i686; rv:11.0) Gecko/20120327 Thunderbird/11.0.1

On 22/05/12 16:31, Alexander Graf wrote:
> 
> 
> On 22.05.2012, at 08:11, Alexey Kardashevskiy <address@hidden> wrote:
> 
>> On 22/05/12 15:52, Alexander Graf wrote:
>>>
>>>
>>> On 22.05.2012, at 05:44, Alexey Kardashevskiy <address@hidden> wrote:
>>>
>>>> On 22/05/12 13:21, Alexander Graf wrote:
>>>>>
>>>>>
>>>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <address@hidden> wrote:
>>>>>
>>>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>>>>>>> Alexander,
>>>>>>>
>>>>>>> Is that any better? :)
>>>>>>
>>>>>> Alex (Graf that is), ping ?
>>>>>>
>>>>>> The original patch from Alexey was fine btw.
>>>>>>
>>>>>> VFIO will always call things with the existing capability offset so
>>>>>> there's no real risk of doing the wrong thing or break the list or
>>>>>> anything.
>>>>>>
>>>>>> IE. A small simple patch that addresses the problem :-)
>>>>>>
>>>>>> The new patch is a bit more "robust" I believe, I don't think we need to
>>>>>> go too far to fix a problem we don't have. But we need a fix for the
>>>>>> real issue and the simple patch does it neatly from what I can
>>>>>> understand.
>>>>>>
>>>>>> Cheers,
>>>>>> Ben.
>>>>>>
>>>>>>>
>>>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>>>>> * in pci config space */
>>>>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>>                      uint8_t offset, uint8_t size)
>>>>>>> {
>>>>>>> -    uint8_t *config;
>>>>>>> +    uint8_t *config, existing;
>>>>>
>>>>> Existing is a pointer to the target dev's config space, right?
>>>>
>>>> Yes.
>>>>
>>>>>>>   int i, overlapping_cap;
>>>>>>>
>>>>>>> +    existing = pci_find_capability(pdev, cap_id);
>>>>>>> +    if (existing) {
>>>>>>> +        if (offset && (existing != offset)) {
>>>>>>> +            return -EEXIST;
>>>>>>> +        }
>>>>>>> +        for (i = existing; i < size; ++i) {
>>>>>
>>>>> So how does this possibly make sense?
>>>>
>>>> Although I do not expect VFIO to add capabilities (does not make sense), I 
>>>> still want to double
>>>> check that this space has not been tried to use by someone else.
>>>
>>> i is an int. existing is a uint8_t*.
>>
>>
>> It was there before me. This function already does a loop and this is how it 
>> was coded at the first place.
> 
> Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax...


Well it is still does not make much sense to have "int i" rather than "uint8_t 
i" :)


>>>>>>> +            if (pdev->used[i]) {
>>>>>>> +                return -EFAULT;
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +        memset(pdev->used + offset, 0xFF, size);
>>>>> Why?
>>>>
>>>> Because I am marking the space this capability takes as used.
>>>
>>> But if it already existed (at the same offset), it should be set used 
>>> already, no? Unless size > existing size, in which case you might overwrite 
>>> data in the next chunk, no?
>>
>>
>> No, it does not exist for VFIO - VFIO read the config space from the host 
>> kernel first and then calls msi_init or msix_init or whatever_else_init 
>> depending on what it got from the host kernel. And these xxx_init() 
>> functions eventually call pci_add_capability().
> So why would the function that populates the config space initially not set 
> the used flag correctly?


This is internal kitchen of PCIDevice which I do not want to touch from 
anywhere but pci.c. And
there is no "fixup_capability" or something.


>> Sure we can either implement own msi_init/msix_init (and may be others in 
>> the future) for VFIO (which would do all the same as other QEMU devices 
>> except touching the capabilities)  OR  hack msi_init/msix_init not to touch 
>> capabilities if they exist.
> No, calling the internal functions sounds fine to me. It's the step before 
> that irritates me. VFIO shouldn't differ too much from an emulated device wrt 
> its config space population really.


The last thing we want for a VFIO device is changing its capabilities list.


>>>>>>> +        /* Make capability read-only by default */
>>>>>>> +        memset(pdev->wmask + offset, 0, size);
>>>>> Why?
>>>>
>>>> Because the pci_add_capability() does it for a new capability by default.
>>>
>>> Hrm. So you're copying code? Can't you merge the overwrite and write cases?
>>
>> I am trying to make it as a single chunk which is as small as possible.
> 
> No, you're needlessly duplicating code which is a bad idea :). Please reuse 
> as much of the existing function as possible, unless it really doesn't make 
> sense.

I actually duplicated 4 (four) lines and did it just once. This is too little 
to be called
"duplicating" :) And I get very special case visually separated and easy to 
remove if we find a
better solution later.

But - no problemo, I'll rework it.


[no further comments]



>> If it helps, below is the same patch with extended context to see what is 
>> going on in that function.
>>
>>
>>
>>
>>
>>
>> hw/pci.c |   20 +++++++++++++++++++-
>> 1 files changed, 19 insertions(+), 1 deletions(-)
>>
>> diff --git a/hw/pci.c b/hw/pci.c
>> index 63a8219..7008a42 100644
>> --- a/hw/pci.c
>> +++ b/hw/pci.c
>> @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool 
>> is_default_rom)
>>     ptr = memory_region_get_ram_ptr(&pdev->rom);
>>     load_image(path, ptr);
>>     g_free(path);
>>
>>     if (is_default_rom) {
>>         /* Only the default rom images will be patched (if needed). */
>>         pci_patch_ids(pdev, ptr, size);
>>     }
>>
>>     qemu_put_ram_ptr(ptr);
>>
>>     pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
>>
>>     return 0;
>> }
>>
>> static void pci_del_option_rom(PCIDevice *pdev)
>> {
>>     if (!pdev->has_rom)
>>         return;
>>
>>     vmstate_unregister_ram(&pdev->rom, &pdev->qdev);
>>     memory_region_destroy(&pdev->rom);
>>     pdev->has_rom = false;
>> }
>>
>> /*
>>  * if !offset
>>  * Reserve space and add capability to the linked list in pci config space
>>  *
>>  * if offset = 0,
>>  * Find and reserve space and add capability to the linked list
>>  * in pci config space */
>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>                        uint8_t offset, uint8_t size)
>> {
>> -    uint8_t *config;
>> +    uint8_t *config, existing;
>>     int i, overlapping_cap;
>>
>> +    existing = pci_find_capability(pdev, cap_id);
>> +    if (existing) {
>> +        if (offset && (existing != offset)) {
>> +            return -EEXIST;
>> +        }
>> +        for (i = existing; i < size; ++i) {
>> +            if (pdev->used[i]) {
>> +                return -EFAULT;
>> +            }
>> +        }
> 
> }
> 
>> +        memset(pdev->used + offset, 0xFF, size);
>> +        /* Make capability read-only by default */
>> +        memset(pdev->wmask + offset, 0, size);
>> +        /* Check capability by default */
>> +        memset(pdev->cmask + offset, 0xFF, size);
>> +        return existing;
>> +    }
>> +
>>     if (!offset) {
> 
> && !existing maybe?
> 
>>         offset = pci_find_space(pdev, size);
>>         if (!offset) {
>>             return -ENOSPC;
>>         }
>>     } else {
>>         /* Verify that capabilities don't overlap.  Note: device assignment
>>          * depends on this check to verify that the device is not broken.
>>          * Should never trigger for emulated devices, but it's helpful
>>          * for debugging these. */
>>         for (i = offset; i < offset + size; i++) {
>>             overlapping_cap = pci_find_capability_at_offset(pdev, i);
>>             if (overlapping_cap) {
>>                 fprintf(stderr, "ERROR: %04x:%02x:%02x.%x "
>>                         "Attempt to add PCI capability %x at offset "
>>                         "%x overlaps existing capability %x at offset %x\n",
>>                         pci_find_domain(pdev->bus), pci_bus_num(pdev->bus),
>>                         PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
>>                         cap_id, offset, overlapping_cap, i);
>>                 return -EINVAL;
>>             }
>>         }
>>     }
>>
> 
> If (!existing) {
> 
>>     config = pdev->config + offset;
>>     config[PCI_CAP_LIST_ID] = cap_id;
>>     config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>     pdev->config[PCI_CAPABILITY_LIST] = offset;
>>     pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> 
> }
> 
> which poses the question why the above wouldn't apply for the existing case. 
> It would work just as well to leave that in, no?
> 
> Alex
> 
>>     memset(pdev->used + offset, 0xFF, size);
>>     /* Make capability read-only by default */
>>     memset(pdev->wmask + offset, 0, size);
>>     /* Check capability by default */
>>     memset(pdev->cmask + offset, 0xFF, size);
>>     return offset;
>> }
>>
>>
>>
>>
>>>>>>> +        /* Check capability by default */
>>>>>>> +        memset(pdev->cmask + offset, 0xFF, size);
>>>>>
>>>>> I don't understand this part either.
>>>>
>>>> The pci_add_capability() does it for a new capability by default.
>>>>
>>>>
>>>>
>>>>>
>>>>> Alex
>>>>>
>>>>>>> +        return existing;
>>>>>>> +    }
>>>>>>> +
>>>>>>>   if (!offset) {
>>>>>>>       offset = pci_find_space(pdev, size);
>>>>>>>       if (!offset) {
>>>>>>>           return -ENOSPC;
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote:
>>>>>>>> On 12/05/12 00:13, Alexander Graf wrote:
>>>>>>>>>
>>>>>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
>>>>>>>>>
>>>>>>>>>> 11.05.2012 20:52, Alexander Graf =0?8A0;:
>>>>>>>>>>>
>>>>>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Normally the pci_add_capability is called on devices to add new
>>>>>>>>>>>> capability. This is ok for emulated devices which capabilities list
>>>>>>>>>>>> is being built by QEMU.
>>>>>>>>>>>>
>>>>>>>>>>>> In the case of VFIO the capability may already exist and adding new
>>>>>>>>>>>> capability into the beginning of the linked list may create a loop.
>>>>>>>>>>>>
>>>>>>>>>>>> For example, the old code destroys the following config
>>>>>>>>>>>> of PCIe Intel E1000E:
>>>>>>>>>>>>
>>>>>>>>>>>> before adding PCI_CAP_ID_MSI (0x05):
>>>>>>>>>>>> 0x34: 0xC8
>>>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>>>> 0xD0: 0x05 0xE0
>>>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>>>
>>>>>>>>>>>> after:
>>>>>>>>>>>> 0x34: 0xD0
>>>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>>>> 0xD0: 0x05 0xC8
>>>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>>>
>>>>>>>>>>>> As result capabilities 0x01 and 0x05 point to each other.
>>>>>>>>>>>>
>>>>>>>>>>>> The proposed patch does not change capability pointers when
>>>>>>>>>>>> the same type capability is about to add.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>>>>>>>>>> ---
>>>>>>>>>>>> hw/pci.c |   10 ++++++----
>>>>>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c
>>>>>>>>>>>> index aa0c0b8..1f7c924 100644
>>>>>>>>>>>> --- a/hw/pci.c
>>>>>>>>>>>> +++ b/hw/pci.c
>>>>>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, 
>>>>>>>>>>>> uint8_t cap_id,
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> config = pdev->config + offset;
>>>>>>>>>>>> -    config[PCI_CAP_LIST_ID] = cap_id;
>>>>>>>>>>>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>>>>>>>>>>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>>>>>>>>>>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>>>>>>>>>>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>>>>>>>>>>>
>>>>>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to 
>>>>>>>>>>> do a loop through all capabilities, check if the one you want to 
>>>>>>>>>>> add is there already and if so either
>>>>>>>>>>> * replace the existing one or
>>>>>>>>>>> * drop out and not write the new one in.
>>>>>>>>>
>>>>>>>>> * hw_error :)
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'm not sure which way would be more natural.
>>>>>>>>>>
>>>>>>>>>> There is a third option - add another function, lets call it
>>>>>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() 
>>>>>>>>>> does
>>>>>>>>>> but won't touch list pointers.
>>>>>>>>>
>>>>>>>>> What good is a function that breaks internal consistency?
>>>>>>>>
>>>>>>>>
>>>>>>>> It is broken already by having PCIDevice.used field. Normally 
>>>>>>>> pci_add_capability() would go through
>>>>>>>> the whole list and add a capability if it does not exist. Emulated 
>>>>>>>> devices which care about having a
>>>>>>>> capability at some fixed offset would have initialized their config 
>>>>>>>> space before calling this
>>>>>>>> capabilities API (as VFIO does).
>>>>>>>>
>>>>>>>> If we really want to support emulated devices which want some 
>>>>>>>> capabilities be at fixed offset and
>>>>>>>> others at random offsets (strange, but ok), I do not see how it is bad 
>>>>>>>> to restore this consistency
>>>>>>>> by special function (pci_fixup_capability()) to avoid its rewriting at 
>>>>>>>> different location as a guest
>>>>>>>> driver may care about its offset.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> When vfio, pci_add_capability() is called from the code which knows
>>>>>>>>>> exactly that the capability exists and where it is and it calls
>>>>>>>>>> pci_add_capability() based on this knowledge so doing additional 
>>>>>>>>>> loops
>>>>>>>>>> just for imaginery scalability is a bit weird, no?
>>>>>>>>>
>>>>>>>>> Not sure I understand your proposal. The more generic a framework is, 
>>>>>>>>> the better, no? In this code path we don't care about speed. We only 
>>>>>>>>> care about consistency and reliability.




-- 
Alexey



reply via email to

[Prev in Thread] Current Thread [Next in Thread]