qemu-devel

Re: DMAR fault with qemu 7.2 and Ubuntu 22.04 base image


From: Peter Xu
Subject: Re: DMAR fault with qemu 7.2 and Ubuntu 22.04 base image
Date: Mon, 13 Feb 2023 17:21:45 -0500

On Mon, Feb 13, 2023 at 10:15:34PM +0530, Major Saheb wrote:
> Hi All,

Hi, Major,

> 
> I am facing an issue with qemu 7.2 rc2 with nvme. I have a container

Is there any known working QEMU version?  Or should I assume it has always
been failing?

> running Ubuntu 22.04 base image and host is running Ubuntu 22.04
> (Linux node-1 5.15.0-60-generic #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC
> 2023 x86_64 x86_64 x86_64 GNU/Linux), and I am using vfio-pci to
> communicate with nvme devices. I am observing DMAR fault as following
> 
> [ 3761.999996] DMAR: DRHD: handling fault status reg 2
> [ 3762.001609] DMAR: [DMA Read NO_PASID] Request device [0b:00.0]
> fault addr 0x1187a9000 [fault reason 0x06] PTE Read access is not set
> 
> I also enabled vtd_iommu_translate and vtd_dmar_fault traces which is
> showing the following
> 
> 2023-02-13T07:02:37.074397Z qemu-system-x86_64: vtd_iova_to_slpte:
> detected slpte permission error (iova=0x1187a9000, level=0x3,
> slpte=0x0, write=0, pasid=0xffffffff)

I think slpte=0x0 means the device pgtable entry does not exist at all,
rather than an explicit permission issue.
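
To illustrate what I mean, here is a rough sketch of how a second-level PTE
value can be read.  It assumes the usual VT-d legacy layout (bit 0 = read,
bit 1 = write, address in bits 12 and up) and a 48-bit address width as in
your aw-bits=48; the names SL_R, SL_W, ADDR_MASK and decode_slpte are made
up for this example and are not QEMU identifiers.

```python
# Sketch only: assumed VT-d second-level PTE layout, not QEMU code.
SL_R = 1 << 0                   # read permission bit (assumed layout)
SL_W = 1 << 1                   # write permission bit (assumed layout)
ADDR_MASK = 0x0000FFFFFFFFF000  # bits 12..47, assuming aw-bits=48


def decode_slpte(slpte):
    """Split a raw slpte value into permission bits and target address."""
    return {
        # With both R and W clear the entry is treated as not present.
        "present": (slpte & (SL_R | SL_W)) != 0,
        "read": bool(slpte & SL_R),
        "write": bool(slpte & SL_W),
        "addr": slpte & ADDR_MASK,
    }


if __name__ == "__main__":
    # Values taken from the traces in this thread.
    for value in (0x0, 0x1F2556003):
        print(hex(value), decode_slpte(value))
```

With the values from the traces, 0x1f2556003 decodes to a readable and
writable entry pointing at 0x1f2556000 (matching the gpa in the working
trace), while 0x0 has no permission bits and no address at all.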

Is the guest using the generic Intel IOMMU driver?  Could it be possible that,
for some reason, the pgtable update was not flushed to the guest pages when
the driver sent the IOTLB invalidation (probably via the QI interface)?

I saw that you mentioned your driver is using 0x800000000 as the iova base
address, so why is the iova in the fault here 0x1187a9000?  Is there anything
special about the device driver being used?
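
To make the difference between the two addresses concrete, the sketch below
splits an iova into per-level page table indexes.  It assumes 4K pages and 9
index bits per level, which is my understanding of the second-level layout;
level_index, PAGE_SHIFT and LEVEL_BITS are names invented for this example.

```python
# Sketch only: assumes 4K pages and 9 index bits per page table level.
PAGE_SHIFT = 12
LEVEL_BITS = 9


def level_index(iova, level):
    """Return the table index used at the given level (1 = leaf)."""
    shift = PAGE_SHIFT + (level - 1) * LEVEL_BITS
    return (iova >> shift) & ((1 << LEVEL_BITS) - 1)


if __name__ == "__main__":
    # The driver's stated base iova and the faulting iova from the log.
    for iova in (0x800000000, 0x1187A9000):
        indexes = [level_index(iova, lvl) for lvl in (4, 3, 2, 1)]
        print(hex(iova), "levels 4..1 ->", indexes)
```

Under those assumptions the two addresses already diverge at level 3 (index
32 for the base address versus index 4 for the faulting one), which is the
level reported in the vtd_iova_to_slpte error.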

> 236498@1676271757.075075:vtd_dmar_fault sid 0xb00 fault 6 addr
> 0x1187a9000 write 0
> 2023-02-13T07:02:37.075174Z qemu-system-x86_64: vtd_iommu_translate:
> detected translation failure (dev=0b:00:00, iova=0x1187a9000)
> 
> It seems 'vtd_iova_to_slpte()' is returning a 0 slpte, resulting in this
> issue. In our driver code (which is running in a container) we are
> using 0x800000000 as our IOVA base address. We have 10 nvme devices
> that we are initializing, and we start initialization by sending
> identify controller and get log page commands. Observation is
> sometimes the first device is getting DMAR fault, sometimes first is
> successfully completed identify and get log page, and second device is
> getting DMAR fault. Also if I use Ubuntu 20.04 as base image for the
> container, then this issue is not seen, as shown in the following trace
> output
> 
> 278365@1676297375.587477:vtd_dmar_slpte sid 0xb00 slpte 0x1f2556003
> addr 0x800000000 write 0
> 278365@1676297375.587513:vtd_dmar_translate pasid 0xffffffff: dev
> 0b:00.00 iova 0x800000000 -> gpa 0x1f2556000 mask 0xfff
> 278365@1676297375.587527:vtd_dmar_slpte sid 0xb00 slpte 0x1f25fc003
> addr 0x80020a000 write 1
> 278365@1676297375.587532:vtd_dmar_translate pasid 0xffffffff: dev
> 0b:00.00 iova 0x80020a000 -> gpa 0x1f25fc000 mask 0xfff
> 278365@1676297375.587566:vtd_dmar_slpte sid 0xb00 slpte 0x1f2560003
> addr 0x800008000 write 1
> 278365@1676297375.587572:vtd_dmar_translate pasid 0xffffffff: dev
> 0b:00.00 iova 0x800008000 -> gpa 0x1f2560000 mask 0xfff
> 278365@1676297375.587814:vtd_dmar_translate pasid 0xffffffff: dev
> 0b:00.00 iova 0x800000000 -> gpa 0x1f2556000 mask 0xfff
> 278365@1676297375.587850:vtd_dmar_translate pasid 0xffffffff: dev
> 0b:00.00 iova 0x800008000 -> gpa 0x1f2560000 mask 0xfff
> 278365@1676297375.588455:vtd_dmar_translate pasid 0xffffffff: dev
> 0b:00.00 iova 0x800000000 -> gpa 0x1f2556000 mask 0xfff
> 278365@1676297375.588473:vtd_dmar_slpte sid 0xb00 slpte 0x1f25fe003
> addr 0x80020b000 write 1
> 278365@1676297375.588479:vtd_dmar_translate pasid 0xffffffff: dev
> 0b:00.00 iova 0x80020b000 -> gpa 0x1f25fe000 mask 0xfff
> 278365@1676297375.588507:vtd_dmar_translate pasid 0xffffffff: dev
> 0b:00.00 iova 0x800008000 -> gpa 0x1f2560000 mask 0xfff
> 278365@1676297375.588737:vtd_dmar_translate pasid 0xffffffff: dev
> 0b:00.00 iova 0x800000000 -> gpa 0x1f2556000 mask 0xfff
> 
> Following is the partial qemu command line that I am using
> 
> -device intel-iommu,intremap=on,caching-mode=on,eim=on,device-iotlb=on,aw-bits=48
> 
> -device pcie-root-port,id=pcie-root-port0,slot=1 -drive
> file=/home/raghu/lfd/datadir/rs-bdc0/storage/rs-bdc0-0-0,format=qcow2,cache=unsafe,if=none,id=NVME0
> -device nvme,serial=rs-bdc0-0_0,id=NVME0,bus=pcie-root-port0 -device
> nvme-ns,drive=NVME0,eui64=0,uuid=30303030-3030-3030-3030-303030303030
> 
> cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinuz-5.15.0-60-generic
> root=UUID=102c974c-7a1c-49cf-9bdd-a8e4638cf5c4 ro console=ttyS0,115200
> intel_iommu=on iommu=pt
> 
> I have also tried without 'iommu=pt' producing same result in Ubuntu
> 22.04 base image in container, also the host OS version remains same
> in both cases.

Did you mean using iommu=pt on the host, the guest, or both?

IIUC iommu=pt on the host at least won't make a difference, because once the
device is assigned to QEMU the pt setting is ignored for it.

Thanks,

-- 
Peter Xu



