
From: Alexey Kardashevskiy
Subject: Re: [Qemu-devel] [PATCH] memory: make ram device read/write endian sensitive
Date: Mon, 27 Feb 2017 14:25:11 +1100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0

On 27/02/17 13:25, Michael Roth wrote:
> Quoting Alexey Kardashevskiy (2017-02-22 22:20:25)
>> On 21/02/17 17:46, Yongji Xie wrote:
>>> At the moment the ram device's memory regions are NATIVE_ENDIAN. This
>>> does not work on PPC64 because a VFIO PCI device is little endian but
>>> PPC64 always defines the static macro TARGET_WORDS_BIGENDIAN.
>>>
>>> This fixes endianness for the ram device the same way as it was done
>>> for the VFIO region in commit 6758008e2c4e79fb6bd04fe8e7a41665fa583965.
>>>
>>> Signed-off-by: Yongji Xie <address@hidden>
>>> ---
>>>  memory.c |   14 +++++++-------
>>>  1 files changed, 7 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/memory.c b/memory.c
>>> index 6c58373..1ccb99f 100644
>>> --- a/memory.c
>>> +++ b/memory.c
>>> @@ -1139,13 +1139,13 @@ static uint64_t memory_region_ram_device_read(void *opaque,
>>>          data = *(uint8_t *)(mr->ram_block->host + addr);
>>>          break;
>>>      case 2:
>>> -        data = *(uint16_t *)(mr->ram_block->host + addr);
>>> +        data = le16_to_cpu(*(uint16_t *)(mr->ram_block->host + addr));
>>>          break;
>>>      case 4:
>>> -        data = *(uint32_t *)(mr->ram_block->host + addr);
>>> +        data = le32_to_cpu(*(uint32_t *)(mr->ram_block->host + addr));
>>>          break;
>>>      case 8:
>>> -        data = *(uint64_t *)(mr->ram_block->host + addr);
>>> +        data = le64_to_cpu(*(uint64_t *)(mr->ram_block->host + addr));
>>>          break;
>>>      }
>>>  
>>> @@ -1166,13 +1166,13 @@ static void memory_region_ram_device_write(void *opaque, hwaddr addr,
>>>          *(uint8_t *)(mr->ram_block->host + addr) = (uint8_t)data;
>>>          break;
>>>      case 2:
>>> -        *(uint16_t *)(mr->ram_block->host + addr) = (uint16_t)data;
>>> +        *(uint16_t *)(mr->ram_block->host + addr) = cpu_to_le16((uint16_t)data);
>>>          break;
>>>      case 4:
>>> -        *(uint32_t *)(mr->ram_block->host + addr) = (uint32_t)data;
>>> +        *(uint32_t *)(mr->ram_block->host + addr) = cpu_to_le32((uint32_t)data);
>>>          break;
>>>      case 8:
>>> -        *(uint64_t *)(mr->ram_block->host + addr) = data;
>>> +        *(uint64_t *)(mr->ram_block->host + addr) = cpu_to_le64(data);
>>>          break;
>>>      }
>>>  }
>>> @@ -1180,7 +1180,7 @@ static void memory_region_ram_device_write(void *opaque, hwaddr addr,
>>>  static const MemoryRegionOps ram_device_mem_ops = {
>>>      .read = memory_region_ram_device_read,
>>>      .write = memory_region_ram_device_write,
>>> -    .endianness = DEVICE_NATIVE_ENDIAN,
>>> +    .endianness = DEVICE_LITTLE_ENDIAN,
>>>      .valid = {
>>>          .min_access_size = 1,
>>>          .max_access_size = 8,
>>>
>>         
>>
>> I did some debugging today.
>>
>> First, Paolo is right: ram_device_mem_ops::endianness should be
>> host-endian, which happens to be little endian in our test case (ppc64le),
>> so the changes to .read/.write are effectively a no-op (I believe so,
>> but have not checked).
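>>
>> A host-endian setting could be expressed as below; this is only a
>> sketch, and DEVICE_HOST_ENDIAN is a name I am making up here:
>>
>>   /* Declare the region to match host endianness so that the memory
>>    * core's byte swapping becomes a no-op for ram device accesses;
>>    * HOST_WORDS_BIGENDIAN is QEMU's config-time host-endianness macro. */
>>   #ifdef HOST_WORDS_BIGENDIAN
>>   #define DEVICE_HOST_ENDIAN DEVICE_BIG_ENDIAN
>>   #else
>>   #define DEVICE_HOST_ENDIAN DEVICE_LITTLE_ENDIAN
>>   #endif
>>
>>   static const MemoryRegionOps ram_device_mem_ops = {
>>       .read = memory_region_ram_device_read,
>>       .write = memory_region_ram_device_write,
>>       .endianness = DEVICE_HOST_ENDIAN,
>>       /* .valid/.impl fields as before */
>>   };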
>>
>>
>> But I was wondering why this gets executed at all.
>>
>> The test case is:
>>
>> qemu-system-ppc64 ... \
>> -device vfio-pci,id=vfio0001_03_00_0,host=0001:03:00.0 \
>> -drive id=DRIVE0,if=none,file=./test.qcow2,format=qcow2 \
>> -device virtio-blk-pci,id=vblk0,drive=DRIVE0
>>
>> The host kernel is v4.10, ppc64le (little endian), 64K system page size,
>> QEMU is v2.8.0.
>>
>> When this boots, lspci shows:
>>
>> 00:00.0 Ethernet controller: Chelsio Communications Inc T310 10GbE Single Port Adapter
>>         Subsystem: IBM Device 038c
>>         Flags: bus master, fast devsel, latency 0, IRQ 18
>>         Memory at 210000004000 (64-bit, non-prefetchable) [size=4K]
>>         Memory at 210000800000 (64-bit, non-prefetchable) [size=8M]
>>         Memory at 210001000000 (64-bit, non-prefetchable) [size=4K]
>>         Expansion ROM at 2000c0080000 [disabled] [size=512K]
>>         Capabilities: [40] Power Management version 3
>>         Capabilities: [48] MSI: Enable- Count=1/32 Maskable- 64bit+
>>         Capabilities: [58] Express Endpoint, MSI 00
>>         Capabilities: [94] Vital Product Data
>>         Capabilities: [9c] MSI-X: Enable+ Count=32 Masked-
>>         Kernel driver in use: cxgb3
>>
>> 00:01.0 SCSI storage controller: Red Hat, Inc Virtio block device
>>         Subsystem: Red Hat, Inc Device 0002
>>         Physical Slot: C16
>>         Flags: bus master, fast devsel, latency 0, IRQ 17
>>         I/O ports at 0040 [size=64]
>>         Memory at 2000c0000000 (32-bit, non-prefetchable) [size=4K]
>>         Memory at 210000000000 (64-bit, prefetchable) [size=16K]
>>         Capabilities: [98] MSI-X: Enable+ Count=2 Masked-
>>         Capabilities: [84] Vendor Specific Information: Len=14 <?>
>>         Capabilities: [70] Vendor Specific Information: Len=14 <?>
>>         Capabilities: [60] Vendor Specific Information: Len=10 <?>
>>         Capabilities: [50] Vendor Specific Information: Len=10 <?>
>>         Capabilities: [40] Vendor Specific Information: Len=10 <?>
>>         Kernel driver in use: virtio-pci
>>
>>
>> As we can see, BAR0 of the Chelsio adapter is at 210000004000, which is
>> not 64K-aligned (it should have been aligned but is not; that is another
>> bug, in QEMU). Normally
> 
> I think SLOF is the culprit in this case. The patch below enforces
> 64K alignment for MMIO/memory regions at boot time. I'm not sure whether
> this should be done for all devices or limited specifically to VFIO
> (controlled perhaps via a per-device property, or at least at the
> machine level):


I was sure we had this in SLOF, and now I see that we don't. Hm :)
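
The round-up itself is the usual power-of-two trick; in C terms (SLOF is
Forth, so this is only an illustration of what Michael's patch enforces):

    #include <stdint.h>

    /* Round addr up to the next multiple of align (a power of two).
     * With align = 0x10000 this yields the 64K boundary that a 64K-page
     * host requires before KVM can map a BAR directly. */
    static inline uint64_t align_up(uint64_t addr, uint64_t align)
    {
        return (addr + align - 1) & ~(align - 1);
    }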



> https://github.com/mdroth/SLOF/commit/6ff6827740aba39d8db9bebcc3ae069bdf934d06
> 
> I've only sniff-tested it with virtio so far. Does this fix things
> for Chelsio?
> 
> aligned (enabled):
>   Bus  0, device   2, function 0:
>     Ethernet controller: PCI device 1af4:1000
>       IRQ 0.
>       BAR0: I/O at 0x0040 [0x005f].
>       BAR1: 32 bit memory at 0xc0080000 [0xc0080fff].
>       BAR4: 64 bit prefetchable memory at 0x210000010000 [0x210000013fff].
>       BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
>       id ""
>   Bus  0, device   3, function 0:
>     Ethernet controller: PCI device 1af4:1000
>       IRQ 0.
>       BAR0: I/O at 0x0020 [0x003f].
>       BAR1: 32 bit memory at 0xc0000000 [0xc0000fff].
>       BAR4: 64 bit prefetchable memory at 0x210000000000 [0x210000003fff].
>       BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
>       id ""
> 
> aligned (disabled):
>   Bus  0, device   2, function 0:
>     Ethernet controller: PCI device 1af4:1000
>       IRQ 0.
>       BAR0: I/O at 0x0040 [0x005f].
>       BAR1: 32 bit memory at 0xc0080000 [0xc0080fff].
>       BAR4: 64 bit prefetchable memory at 0x210000004000 [0x210000007fff].
>       BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
>       id ""
>   Bus  0, device   3, function 0:
>     Ethernet controller: PCI device 1af4:1000
>       IRQ 0.
>       BAR0: I/O at 0x0020 [0x003f].
>       BAR1: 32 bit memory at 0xc0000000 [0xc0000fff].
>       BAR4: 64 bit prefetchable memory at 0x210000000000 [0x210000003fff].
>       BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
>       id ""
> 
> upstream:
>   Bus  0, device   2, function 0:
>     Ethernet controller: PCI device 1af4:1000
>       IRQ 0.
>       BAR0: I/O at 0x0040 [0x005f].
>       BAR1: 32 bit memory at 0xc0080000 [0xc0080fff].
>       BAR4: 64 bit prefetchable memory at 0x210000004000 [0x210000007fff].
>       BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
>       id ""
>   Bus  0, device   3, function 0:
>     Ethernet controller: PCI device 1af4:1000
>       IRQ 0.
>       BAR0: I/O at 0x0020 [0x003f].
>       BAR1: 32 bit memory at 0xc0000000 [0xc0000fff].
>       BAR4: 64 bit prefetchable memory at 0x210000000000 [0x210000003fff].
>       BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
>       id ""
> 
> 
>> such a BAR would be emulated by VFIO. However, since 95251725e335 "vfio:
>> Add support for mmapping sub-page MMIO BARs" we mmap such BARs into QEMU
>> in the hope that they will later be registered via
>> KVM_SET_USER_MEMORY_REGION - and this fails because the guest address is
>> not host-page-size aligned.
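>>
>> For reference, KVM rejects such slots up front; the check in the
>> kernel's __kvm_set_memory_region() is essentially the following
>> (paraphrased from virt/kvm/kvm_main.c):
>>
>>   /* Both the slot size and the guest physical address must be host
>>    * page aligned, so 0x210000004000 fails on a 64K-page host
>>    * (0x210000004000 & 0xffff == 0x4000). */
>>   if (mem->memory_size & (PAGE_SIZE - 1))
>>       return -EINVAL;
>>   if (mem->guest_phys_addr & (PAGE_SIZE - 1))
>>       return -EINVAL;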
>>
>> So we end up having the following memory tree:
>>
>> memory-region: address@hidden
>>   0000000000000000-ffffffffffffffff (prio 0, RW): address@hidden
>>     00000000c0000000-00000000c0000fff (prio 1, RW): virtio-blk-pci-msix
>>       00000000c0000000-00000000c000001f (prio 0, RW): msix-table
>>       00000000c0000800-00000000c0000807 (prio 0, RW): msix-pba
>>     0000210000000000-0000210000003fff (prio 1, RW): virtio-pci
>>       0000210000000000-0000210000000fff (prio 0, RW): virtio-pci-common
>>       0000210000001000-0000210000001fff (prio 0, RW): virtio-pci-isr
>>       0000210000002000-0000210000002fff (prio 0, RW): virtio-pci-device
>>       0000210000003000-0000210000003fff (prio 0, RW): virtio-pci-notify
>>     0000210000004000-0000210000004fff (prio 1, RW): 0001:03:00.0 BAR 0
>>       0000210000004000-0000210000004fff (prio 0, RW): 0001:03:00.0 BAR 0 mmaps[0]
>>     0000210000800000-0000210000ffffff (prio 1, RW): 0001:03:00.0 BAR 2
>>       0000210000800000-0000210000ffffff (prio 0, RW): 0001:03:00.0 BAR 2 mmaps[0]
>>     0000210001000000-0000210001000fff (prio 1, RW): 0001:03:00.0 BAR 4
>>       0000210001000000-00002100010001ff (prio 0, RW): msix-table
>>       0000210001000800-0000210001000807 (prio 0, RW): msix-pba [disabled]
>>
>> The problem region that this patch is fixing is "0001:03:00.0 BAR 0
>> mmaps[0]". It is mmap'd into QEMU but not registered with KVM, so when
>> the guest accesses this BAR, we trap into KVM, KVM passes the access to
>> QEMU, and QEMU reads/writes this memory region.
>>
>> A simple hack like the one below fixes it: it removes the mmap'd memory
>> region from the tree, and MMIO is then handled by the parent MR,
>> "0001:03:00.0 BAR 0" (which is little endian - vfio_region_ops).
>>
>>
>> I am now wondering: what would be the correct approach here?
>>
>> Add/remove mmap'd MRs once we detect aligned/unaligned BARs?
>>
>> Keep things where they are in the VFIO department and just fix
>> ram_device_mem_ops::endianness?
> 
> VFIO tries to report similar scenarios in realize():
> 
>   vfio_bar_setup:
>     if (vfio_region_mmap(&bar->region)) {
>       error_report("Failed to mmap %s BAR %d. Performance may be slow",
>                    vdev->vbasedev.name, nr);
>     }
> 
> maybe we should at least report it in this case as well? In our
> case we'd see it on every reset/hotplug though, so maybe a tracepoint
> makes more sense.


A tracepoint should be enough, IMHO.
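
Something along QEMU's usual pattern, i.e. an entry in hw/vfio/trace-events
plus a trace_*() call at the fallback site; the names below are made up:

    # hw/vfio/trace-events (hypothetical entry)
    vfio_sub_page_bar_unaligned(const char *name, int bar, uint64_t addr) "%s BAR %d not mmap'd: 0x%"PRIx64" is not host page aligned"

    /* hypothetical call site in vfio_sub_page_bar_update_mapping() */
    trace_vfio_sub_page_bar_unaligned(vdev->vbasedev.name, bar, bar_addr);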


> Even with the patch for SLOF this would continue to be an issue for
> hotplug until BAR assignment is moved to QEMU for pseries, so it would
> be good to have a simple way to check for it. I can send a patch if
> that makes sense.

Can BAR allocation be done by QEMU when hotplugging?


> 
>>
>> Thanks.
>>
>>
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index d7dbe0e3e0..0657a27623 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -1109,7 +1109,11 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
>>      memory_region_transaction_begin();
>>
>>      memory_region_set_size(mr, size);
>> -    memory_region_set_size(mmap_mr, size);
>> +    if (bar_addr & ~qemu_real_host_page_mask) {
>> +        memory_region_del_subregion(mr, mmap_mr);
>> +    } else {
>> +        memory_region_set_size(mmap_mr, size);
>> +    }
>>      if (size != region->size && memory_region_is_mapped(mr)) {
>>          memory_region_del_subregion(r->address_space, mr);
>>          memory_region_add_subregion_overlap(r->address_space,
>>
>>
>>
>> -- 
>> Alexey
>>
> 


-- 
Alexey


