[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 0/1] pcie: Allow atomic completion on PCIE root port
From: |
Michael S. Tsirkin |
Subject: |
Re: [PATCH 0/1] pcie: Allow atomic completion on PCIE root port |
Date: |
Fri, 21 Apr 2023 04:22:23 -0400 |
On Thu, Apr 20, 2023 at 05:38:39PM +0200, robin@streamhpc.com wrote:
> From: Robin Voetter <robin@streamhpc.com>
>
> The ROCm driver for Linux uses PCIe atomics to schedule work and
> generally communicate between the host and the device. This does not
> currently work in QEMU with regular vfio-pci passthrough, because the
> pcie-root-port does not advertise the PCIe atomic completer
> capabilities. When initializing the GPU from the Linux driver, it
> queries whether the PCIe connection from the CPU to GPU supports the
> required capabilities[1] in the pci_enable_atomic_ops_to_root
> function[2]. Currently the only part where this fails is checking the
> atomic completer capabilities (32 and 64 bits) on the root port[3]. In
> this case, the driver determines that PCIe atomics are not supported at
> all, and this causes ROCm programs to misbehave. (While AMD advertises
> that there is some support for ROCm without PCIe atomics, I have never
> actually gotten that working...)
>
> This patch allows ROCm to properly function by introducing an
> additional experimental property to the pcie-root-port,
> x-atomic-completion.
so what exactly makes it experimental? from this description
it looks like it actually has to be enabled for things to work?
Also pls CC alex on whether this is a correct way to do it.
> Setting this option makes the port report
> support for the PCI_EXP_DEVCAP2_ATOMIC_COMP32 and COMP64
> capabilities. This then makes the check from [3] pass, and
> everything seems to work appropriately after that.
>
> To verify that the capabilities are reported correctly, one can use
> lspci to check the capabilities of the root port: lspci -vvv -s <root
> port id> should show 32bit+ and 64bit+ capabilities in DevCap2 when
> x-atomic-completion is enabled. For example:
>
> -device pcie-root-port,x-atomic-completion=true,id.pcie.1
>
> The output of lspci should include the following for the pcie root port:
>
> AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
>
> To verify that ROCm works, the following HIP program should be
> sufficient. The work is scheduled to the GPU by signaling a semaphore
> using atomic operations from the CPU side, which is completed on the
> GPU, and the GPU-side printf works by signaling a semaphore from the GPU
> that is completed on the CPU. It can be compiled using hipcc with
> 'hipcc -otest test.hip':
>
> #include <hip/hip_runtime.h>
> __global__ void test() {
> printf("hello, world\n");
> }
> int main() {
> test<<<dim3(1), dim3(1)>>>();
> hipDeviceSynchronize();
> }
>
> Previously, or when x-atomic-completion is set to false, this program
> would simply hang. Additionally, a message along the lines of the
> following is printed to dmesg during boot if the GPU driver determines
> that atomics are not supported:
>
> amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
>
> When atomics are properly supported, the above program works as
> intended, and the previous dmesg message is of course not printed. For
> this I am using a simple machine setup using the following device
> options, with the GPU that im testing with of course on 03:00.0.
>
> -device pcie-root-port,x-atomic-completion=true,id=pcie.1
> -device vfio-pci,host=03:00.0,bus=pcie.1
>
> This patch does not include any automatic detection whether the root
> port of the host supports the atomic completer capabilities, nor if any
> of the physical PCIe bridges between the CPU and GPU support atomic
> routing. The intention here is that the user should make sure that the
> host does support atomic completion on the root complex. See also some
> prior discussion[4]. I have run the full test suite of some ROCm
> libraries: rocPRIM, rocRAND, hipRAND, hipCUB and rocThrust. All of the
> tests pass now, with some minor unrelated changes.
>
> Kind regards,
>
> Robin Voetter, Stream HPC
>
> [1]
> https://github.com/torvalds/linux/blob/v6.2/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c#L3716
> [2] https://github.com/torvalds/linux/blob/v6.2/drivers/pci/pci.c#L3781
> [3] https://github.com/torvalds/linux/blob/v6.2/drivers/pci/pci.c#L3829
> [4] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01815.html
> ---
>
> Robin Voetter (1):
> pcie: Allow generic PCIE root port to enable atomic completion
>
> hw/pci-bridge/gen_pcie_root_port.c | 2 ++
> hw/pci/pcie.c | 6 ++++++
> include/hw/pci/pcie_port.h | 3 +++
> 3 files changed, 11 insertions(+)
>
> --
> 2.39.2