qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/1] pcie: Allow atomic completion on PCIE root port


From: Michael S. Tsirkin
Subject: Re: [PATCH 0/1] pcie: Allow atomic completion on PCIE root port
Date: Fri, 21 Apr 2023 04:22:23 -0400

On Thu, Apr 20, 2023 at 05:38:39PM +0200, robin@streamhpc.com wrote:
> From: Robin Voetter <robin@streamhpc.com>
> 
> The ROCm driver for Linux uses PCIe atomics to schedule work and
> generally communicate between the host and the device.  This does not
> currently work in QEMU with regular vfio-pci passthrough, because the
> pcie-root-port does not advertise the PCIe atomic completer
> capabilities.  When initializing the GPU from the Linux driver, it
> queries whether the PCIe connection from the CPU to GPU supports the
> required capabilities[1] in the pci_enable_atomic_ops_to_root
> function[2].  Currently the only part where this fails is checking the
> atomic completer capabilities (32 and 64 bits) on the root port[3].  In
> this case, the driver determines that PCIe atomics are not supported at
> all, and this causes ROCm programs to misbehave.  (While AMD advertises
> that there is some support for ROCm without PCIe atomics, I have never
> actually gotten that working...)
> 
> This patch allows ROCm to properly function by introducing an
> additional experimental property to the pcie-root-port,
> x-atomic-completion.

so what exactly makes it experimental? from this description
it looks like it actually has to be enabled for things to work?
Also pls CC alex on whether this is a correct way to do it.

>  Setting this option makes the port report
> support for the PCI_EXP_DEVCAP2_ATOMIC_COMP32 and COMP64
> capabilities.  This then makes the check from [3] pass, and
> everything seems to work appropriately after that.
> 
> To verify that the capabilities are reported correctly, one can use
> lspci to check the capabilities of the root port: lspci -vvv -s <root
> port id> should show 32bit+ and 64bit+ capabilities in DevCap2 when
> x-atomic-completion is enabled.  For example:
> 
>     -device pcie-root-port,x-atomic-completion=true,id.pcie.1
> 
> The output of lspci should include the following for the pcie root port:
> 
>     AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
> 
> To verify that ROCm works, the following HIP program should be
> sufficient.  The work is scheduled to the GPU by signaling a semaphore
> using atomic operations from the CPU side, which is completed on the
> GPU, and the GPU-side printf works by signaling a semaphore from the GPU
> that is completed on the CPU.  It can be compiled using hipcc with
> 'hipcc -otest test.hip':
> 
>     #include <hip/hip_runtime.h>
>     __global__ void test() {
>         printf("hello, world\n");
>     }
>     int main() {
>         test<<<dim3(1), dim3(1)>>>();
>         hipDeviceSynchronize();
>     }
> 
> Previously, or when x-atomic-completion is set to false, this program
> would simply hang.  Additionally, a message along the lines of the
> following is printed to dmesg during boot if the GPU driver determines
> that atomics are not supported:
> 
>     amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
> 
> When atomics are properly supported, the above program works as
> intended, and the previous dmesg message is of course not printed. For
> this I am using a simple machine setup using the following device
> options, with the GPU that im testing with of course on 03:00.0.
> 
>      -device pcie-root-port,x-atomic-completion=true,id=pcie.1
>      -device vfio-pci,host=03:00.0,bus=pcie.1
> 
> This patch does not include any automatic detection whether the root
> port of the host supports the atomic completer capabilities, nor if any
> of the physical PCIe bridges between the CPU and GPU support atomic
> routing.  The intention here is that the user should make sure that the
> host does support atomic completion on the root complex. See also some
> prior discussion[4].  I have run the full test suite of some ROCm
> libraries: rocPRIM, rocRAND, hipRAND, hipCUB and rocThrust.  All of the
> tests pass now, with some minor unrelated changes.
> 
> Kind regards,
> 
> Robin Voetter, Stream HPC
> 
> [1] 
> https://github.com/torvalds/linux/blob/v6.2/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c#L3716
> [2] https://github.com/torvalds/linux/blob/v6.2/drivers/pci/pci.c#L3781
> [3] https://github.com/torvalds/linux/blob/v6.2/drivers/pci/pci.c#L3829
> [4] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01815.html
> ---
> 
> Robin Voetter (1):
>   pcie: Allow generic PCIE root port to enable atomic completion
> 
>  hw/pci-bridge/gen_pcie_root_port.c | 2 ++
>  hw/pci/pcie.c                      | 6 ++++++
>  include/hw/pci/pcie_port.h         | 3 +++
>  3 files changed, 11 insertions(+)
> 
> --
> 2.39.2




reply via email to

[Prev in Thread] Current Thread [Next in Thread]