qemu-discuss

Re: [Qemu-discuss] How to resolve "Failed to mmap" error?


From: A223 A223
Subject: Re: [Qemu-discuss] How to resolve "Failed to mmap" error?
Date: Mon, 10 Oct 2016 20:05:02 -0700

I just tried testing the patch.

Initially, it seemed to work fine (log messages below).
However, I am getting a hard lockup of the host machine shortly after
the vfio_region_setup related log lines print to the screen. I would
say that roughly 3 of the 9 VM startups resulted in a hard lock like
this. As far as I can remember, I wasn't getting these hard locks
before applying the patch.

I removed the patch and tried to replicate the hard lock and haven't
been able to, though admittedly I was only able to try a few times.
Unfortunately, repeatedly booting the VM to test things becomes
time-consuming, because once the VM has been through a single
startup-shutdown cycle, qemu refuses to start, printing these
errors:
qemu-system-x86_64: -device
vfio-pci,host=05:00.0,bus=root.1,addr=00.5: vfio: Error: Failed to
setup INTx fd: Device or resource busy
qemu-system-x86_64: -device
vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Device initialization
failed

The host's kernel log contains a ton of these:
[  774.069113] vfio_ecap_init: 0000:05:00.0 hiding ecap address@hidden
followed by these kernel log errors:
[  774.070330] genirq: Flags mismatch irq 16. 00000000
(vfio-intx(0000:05:00.0)) vs. 00000080 (ehci_hcd:usb1)
[  774.085595] vfio-pci 0000:05:00.0: Refused to change power state,
currently in D3
[  774.797671] vfio-pci 0000:05:00.0: timed out waiting for pending
transaction; performing function level reset anyway
[  775.945685] vfio-pci 0000:05:00.0: Failed to return from FLR

This is not a problem with your patch, but it does complicate my
ability to test the patch repeatedly, since a host restart is needed
between every try. If you have any idea what could be going on there,
please do let me know.
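(For anyone hitting the same wall: the "genirq: Flags mismatch irq 16" error above means vfio-intx tried to claim IRQ 16 while ehci_hcd already held it with incompatible sharing flags. A minimal sketch of how to spot which drivers already sit on an IRQ line, using a hypothetical /proc/interrupts excerpt modeled on this log; the parsing helper is my own, not from any tool discussed here:)

```python
# Sketch: find which handlers are registered on each IRQ line by
# parsing /proc/interrupts-style text. SAMPLE is hypothetical,
# modeled on the "Flags mismatch irq 16 ... vs (ehci_hcd:usb1)"
# error in the kernel log above.

SAMPLE = """\
 16:      12345          0   IO-APIC  16-fasteoi   ehci_hcd:usb1
 17:        678          0   IO-APIC  17-fasteoi   snd_hda_intel
"""

def irq_handlers(text):
    """Map IRQ number -> list of handler names on that line."""
    handlers = {}
    for line in text.splitlines():
        head, _, rest = line.partition(":")
        head = head.strip()
        if not head.isdigit():
            continue  # skip non-IRQ rows (NMI, LOC, headers, ...)
        # Drop the purely numeric per-CPU counters; of the remaining
        # fields, the first two are the irq chip and flow handler,
        # and the rest are the registered device handlers.
        names = [f for f in rest.split() if not f.isdigit()]
        handlers[int(head)] = names[2:]
    return handlers

h = irq_handlers(SAMPLE)
print(h[16])  # handlers already holding IRQ 16 before vfio-intx tries to claim it
```

On a real host you would feed this the contents of /proc/interrupts; if the passthrough device's legacy IRQ shows another driver on the same line, that driver has to release it (or the device has to use MSI/MSI-X) before vfio-intx can claim it.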

Here is the vfio_region trace output from qemu when started with your patch:
address@hidden:vfio_region_setup Device 0000:02:00.0, region 0
"0000:02:00.0 BAR 0", flags: 7, offset: 0, size: 1000000
address@hidden:vfio_region_setup Device 0000:02:00.0, region 1
"0000:02:00.0 BAR 1", flags: 7, offset: 10000000000, size: 10000000
address@hidden:vfio_region_setup Device 0000:02:00.0, region 2
"0000:02:00.0 BAR 2", flags: 0, offset: 20000000000, size: 0
address@hidden:vfio_region_setup Device 0000:02:00.0, region 3
"0000:02:00.0 BAR 3", flags: 7, offset: 30000000000, size: 2000000
address@hidden:vfio_region_setup Device 0000:02:00.0, region 4
"0000:02:00.0 BAR 4", flags: 0, offset: 40000000000, size: 0
address@hidden:vfio_region_setup Device 0000:02:00.0, region 5
"0000:02:00.0 BAR 5", flags: 3, offset: 50000000000, size: 80
address@hidden:vfio_region_setup Device 0000:02:00.1, region 0
"0000:02:00.1 BAR 0", flags: 7, offset: 0, size: 4000
address@hidden:vfio_region_setup Device 0000:02:00.1, region 1
"0000:02:00.1 BAR 1", flags: 0, offset: 10000000000, size: 0
address@hidden:vfio_region_setup Device 0000:02:00.1, region 2
"0000:02:00.1 BAR 2", flags: 0, offset: 20000000000, size: 0
address@hidden:vfio_region_setup Device 0000:02:00.1, region 3
"0000:02:00.1 BAR 3", flags: 0, offset: 30000000000, size: 0
address@hidden:vfio_region_setup Device 0000:02:00.1, region 4
"0000:02:00.1 BAR 4", flags: 0, offset: 40000000000, size: 0
address@hidden:vfio_region_setup Device 0000:02:00.1, region 5
"0000:02:00.1 BAR 5", flags: 0, offset: 50000000000, size: 0
address@hidden:vfio_region_setup Device 0000:05:00.0, region 0
"0000:05:00.0 BAR 0", flags: 7, offset: 0, size: 10000
address@hidden:vfio_region_setup Device 0000:05:00.0, region 1
"0000:05:00.0 BAR 1", flags: 0, offset: 10000000000, size: 0
address@hidden:vfio_region_sparse_mmap_header Device
0000:05:00.0 region 2: 1 sparse mmap entries
address@hidden:vfio_region_sparse_mmap_entry sparse entry 0 [0x0 - 0x0]
address@hidden:vfio_region_setup Device 0000:05:00.0, region 2
"0000:05:00.0 BAR 2", flags: f, offset: 20000000000, size: 1000
address@hidden:vfio_region_setup Device 0000:05:00.0, region 3
"0000:05:00.0 BAR 3", flags: 0, offset: 30000000000, size: 0
address@hidden:vfio_region_setup Device 0000:05:00.0, region 4
"0000:05:00.0 BAR 4", flags: 7, offset: 40000000000, size: 1000
address@hidden:vfio_region_setup Device 0000:05:00.0, region 5
"0000:05:00.0 BAR 5", flags: 0, offset: 50000000000, size: 0




On Fri, Oct 7, 2016 at 9:23 AM, Alex Williamson
<address@hidden> wrote:
> On Thu, 6 Oct 2016 23:19:13 -0700
> A223 A223 <address@hidden> wrote:
>
>> On Thu, Oct 6, 2016 at 9:29 PM, Alex Williamson
>> <address@hidden> wrote:
>> > On Thu, 6 Oct 2016 21:07:06 -0700
>> > A223 A223 <address@hidden> wrote:
>> >
>> >> On Thu, Oct 6, 2016 at 8:30 PM, A223 A223 <address@hidden> wrote:
>> >> > On Thu, Oct 6, 2016 at 9:46 AM, Alex Williamson
>> >> > <address@hidden> wrote:
>> >> >> On Wed, 5 Oct 2016 20:30:34 -0700
>> >> >> A223 A223 <address@hidden> wrote:
>> >> >>
>> >> >>> On Wed, Oct 5, 2016 at 9:33 AM, Alex Williamson
>> >> >>> <address@hidden> wrote:
>> >> >>> > On Wed, 5 Oct 2016 02:13:21 -0700
>> >> >>> > A223 A223 <address@hidden> wrote:
>> >> >>> >> How can I go about tracking down the root cause of this error?
>> >> >>> >
>> >> >>> > This often means that the device resources are in use by another,
>> >> >>> > non-pci driver.  Look in /proc/iomem and /proc/ioports to see if any
>> >> >>> > drivers are claiming sub-regions on the device.  You can also turn 
>> >> >>> > on
>> >> >>> > tracing and look for the event vfio_region_mmap_fault, which might 
>> >> >>> > give
>> >> >>> > us more information (see docs/tracing.txt).  Also, please provide 
>> >> >>> > 'sudo
>> >> >>> > lspci -vvvs 5:00.0' so we can see the device resources.  Thanks,
>> >> >>>
>> >> >>> The simpletrace output wouldn't parse for me using the python script
>> >> >>> (I'll paste the trace at the end of this email), so I used the ftrace
>> >> >>> backend instead (
>> >> >>> https://drive.google.com/open?id=0Bwjufq6oAZMfMVpUYTUxODR0NW8 ). It's
>> >> >>> quite large and I'm not sure what I'm looking for, so I didn't look at
>> >> >>> it very carefully.
>> >> >>
>> >> >> That's not the right trace, that's like kvm trace or something.  Try:
>> >> >>
>> >> >> # ./configure --enable-trace-backends=log
>> >> >>
>> >> >> # echo vfio_region_mmap_fault > events.txt
>> >> >>
>> >> >> Then add to your qemu commandline "-trace events=events.txt".  You
>> >> >> should get maybe a couple extra log lines per boot, not gobs of output.
>> >> >
>> >> > Ah, got it. Here are the relevant log lines then:
>> >> >
>> >> > address@hidden:vfio_region_mmap_fault Region 0000:05:00.0 BAR
>> >> > 2 mmaps[0], [20000000000 - 1ffffffffff], fault: -22
>> >> > qemu-system-x86_64: -device
>> >> > vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Failed to mmap
>> >> > 0000:05:00.0 BAR 2. Performance may be slow
>> >>
>> >> I took a look at everything again and noted this in the lspci output
>> >> for the Fresco Logic USB controller:
>> >> Vector table: BAR=2 offset=00000000
>> >> PBA: BAR=4 offset=00000000
>> >>
>> >> Is it normal for both the Vector table and PBA to have a 0 offset?
>> >
>> > Note that they're in different BARs, so it's a little unexpected but
>> > it's not wrong.  The MSI-X table does appear to be the problem though.
>> > Based on that trace line, the region size we're trying to mmap is
>> > zero.  We're probably hitting a corner case of a page sized BAR
>> > containing the MSI-X table, where we cannot mmap the page covering the
>> > table.  We need to figure out where that's coming from.  Please add the
>> > additional trace lines to your events.txt and try again:
>> >
>> > vfio_region_sparse_mmap_header
>> > vfio_region_sparse_mmap_entry
>> > vfio_region_setup
>> > vfio_msix_fixup
>> >
>> > The bad news is that due to the MSI-X table, we're not going to be able
>> > to mmap that BAR, the good news is that it probably doesn't actually
>> > hurt performance.  With the table and PBA in separate BARs, my guess
>> > would be that's the only thing in those BARs.  Thanks,
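
(The corner case described above can be illustrated numerically. A rough sketch, where the helper and its page-alignment arithmetic are my own illustration rather than QEMU code, of why a page-sized BAR whose MSI-X table starts at offset 0 leaves nothing mappable:)

```python
# Sketch: compute which byte ranges of a BAR fall outside the
# page(s) covering the MSI-X table. Those pages cannot be mmapped
# through to the guest, so for a 4KB BAR with the table at offset 0
# the mappable set is empty - which is why the kernel reports a
# zero-sized sparse mmap entry here.

PAGE = 0x1000  # 4KB host pages, as in this thread

def mappable_ranges(bar_size, table_offset, table_size):
    """Return (start, end) ranges of the BAR outside the MSI-X table's pages."""
    start = (table_offset // PAGE) * PAGE                 # align table start down
    end = -(-(table_offset + table_size) // PAGE) * PAGE  # align table end up
    ranges = []
    if start > 0:
        ranges.append((0, start))
    if end < bar_size:
        ranges.append((end, bar_size))
    return ranges

# 0000:05:00.0 BAR 2: size 0x1000, vector table at offset 0 (16 vectors assumed)
print(mappable_ranges(0x1000, 0x0, 16 * 16))  # [] - nothing left to mmap
```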
>>
>> Ah, overall bad news for me, as I now have absolutely no clue what's
>> causing my USB performance/reliability issues in the VM :(
>>
>> Nonetheless, here is the new log output:
>>
>> address@hidden:vfio_region_setup Device 0000:02:00.0, region 0
>> "0000:02:00.0 BAR 0", flags: 7, offset: 0, size: 1000000
>> address@hidden:vfio_region_setup Device 0000:02:00.0, region 1
>> "0000:02:00.0 BAR 1", flags: 7, offset: 10000000000, size: 10000000
>> address@hidden:vfio_region_setup Device 0000:02:00.0, region 2
>> "0000:02:00.0 BAR 2", flags: 0, offset: 20000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:02:00.0, region 3
>> "0000:02:00.0 BAR 3", flags: 7, offset: 30000000000, size: 2000000
>> address@hidden:vfio_region_setup Device 0000:02:00.0, region 4
>> "0000:02:00.0 BAR 4", flags: 0, offset: 40000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:02:00.0, region 5
>> "0000:02:00.0 BAR 5", flags: 3, offset: 50000000000, size: 80
>> address@hidden:vfio_region_setup Device 0000:02:00.1, region 0
>> "0000:02:00.1 BAR 0", flags: 7, offset: 0, size: 4000
>> address@hidden:vfio_region_setup Device 0000:02:00.1, region 1
>> "0000:02:00.1 BAR 1", flags: 0, offset: 10000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:02:00.1, region 2
>> "0000:02:00.1 BAR 2", flags: 0, offset: 20000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:02:00.1, region 3
>> "0000:02:00.1 BAR 3", flags: 0, offset: 30000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:02:00.1, region 4
>> "0000:02:00.1 BAR 4", flags: 0, offset: 40000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:02:00.1, region 5
>> "0000:02:00.1 BAR 5", flags: 0, offset: 50000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:05:00.0, region 0
>> "0000:05:00.0 BAR 0", flags: 7, offset: 0, size: 10000
>> address@hidden:vfio_region_setup Device 0000:05:00.0, region 1
>> "0000:05:00.0 BAR 1", flags: 0, offset: 10000000000, size: 0
>> address@hidden:vfio_region_sparse_mmap_header Device
>> 0000:05:00.0 region 2: 1 sparse mmap entries
>> address@hidden:vfio_region_sparse_mmap_entry sparse entry 0 [0x0 - 0x0]
>> address@hidden:vfio_region_setup Device 0000:05:00.0, region 2
>> "0000:05:00.0 BAR 2", flags: f, offset: 20000000000, size: 1000
>> address@hidden:vfio_region_setup Device 0000:05:00.0, region 3
>> "0000:05:00.0 BAR 3", flags: 0, offset: 30000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:05:00.0, region 4
>> "0000:05:00.0 BAR 4", flags: 7, offset: 40000000000, size: 1000
>> address@hidden:vfio_region_setup Device 0000:05:00.0, region 5
>> "0000:05:00.0 BAR 5", flags: 0, offset: 50000000000, size: 0
>> address@hidden:vfio_region_mmap_fault Region 0000:05:00.0 BAR
>> 2 mmaps[0], [20000000000 - 1ffffffffff], fault: -22
>> qemu-system-x86_64: -device
>> vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Failed to mmap
>> 0000:05:00.0 BAR 2. Performance may be slow
>> SeaBIOS (version rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org)
>
> Hi Andrew,
>
> Would it be possible for you to test the patch below?  I'd be happy to
> add Reported-by and Tested-by credits to the patch if you can provide
> your real name (and preferred email if other than this one).  Thanks,
>
> Alex
>
> commit 4dcabf99c7d893e14171d01f1214b4ec0977fc28
> Author: Alex Williamson <address@hidden>
> Date:   Fri Oct 7 10:14:22 2016 -0600
>
>     vfio: Handle zero-length sparse mmap ranges
>
>     As reported in the link below, user has a PCI device with a 4KB BAR
>     which contains the MSI-X table.  This seems to hit a corner case in
>     the kernel where the region reports being mmap capable, but the sparse
>     mmap information reports a zero sized range.  It's not entirely clear
>     that the kernel is incorrect in doing this, but regardless, we need
>     to handle it.  To do this, fill our mmap array only with non-zero
>     sized sparse mmap entries and add an error return from the function
>     so we can tell the difference between nr_mmaps being zero based on
>     sparse mmap info vs lack of sparse mmap info.
>
>     NB, this doesn't actually change the behavior of the device, it only
>     removes the scary "Failed to mmap ... Performance may be slow" error
>     message.  We cannot currently create an mmap over the MSI-X table.
>
>     Link: http://lists.nongnu.org/archive/html/qemu-discuss/2016-10/msg00009.html
>     Signed-off-by: Alex Williamson <address@hidden>
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9505fb3..ba8d451 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -610,16 +610,16 @@ vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id)
>      return NULL;
>  }
>
> -static void vfio_setup_region_sparse_mmaps(VFIORegion *region,
> -                                           struct vfio_region_info *info)
> +static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
> +                                          struct vfio_region_info *info)
>  {
>      struct vfio_info_cap_header *hdr;
>      struct vfio_region_info_cap_sparse_mmap *sparse;
> -    int i;
> +    int i, j;
>
>      hdr = vfio_get_region_info_cap(info, VFIO_REGION_INFO_CAP_SPARSE_MMAP);
>      if (!hdr) {
> -        return;
> +        return -ENODEV;
>      }
>
>      sparse = container_of(hdr, struct vfio_region_info_cap_sparse_mmap, header);
> @@ -627,16 +627,24 @@ static void vfio_setup_region_sparse_mmaps(VFIORegion *region,
>      trace_vfio_region_sparse_mmap_header(region->vbasedev->name,
>                                           region->nr, sparse->nr_areas);
>
> -    region->nr_mmaps = sparse->nr_areas;
> -    region->mmaps = g_new0(VFIOMmap, region->nr_mmaps);
> +    region->mmaps = g_new0(VFIOMmap, sparse->nr_areas);
>
> -    for (i = 0; i < region->nr_mmaps; i++) {
> -        region->mmaps[i].offset = sparse->areas[i].offset;
> -        region->mmaps[i].size = sparse->areas[i].size;
> -        trace_vfio_region_sparse_mmap_entry(i, region->mmaps[i].offset,
> -                                            region->mmaps[i].offset +
> -                                            region->mmaps[i].size);
> +    for (i = 0, j = 0; i < sparse->nr_areas; i++) {
> +        trace_vfio_region_sparse_mmap_entry(i, sparse->areas[i].offset,
> +                                            sparse->areas[i].offset +
> +                                            sparse->areas[i].size);
> +
> +        if (sparse->areas[i].size) {
> +            region->mmaps[j].offset = sparse->areas[i].offset;
> +            region->mmaps[j].size = sparse->areas[i].size;
> +            j++;
> +        }
>      }
> +
> +    region->nr_mmaps = j;
> +    region->mmaps = g_realloc(region->mmaps, j * sizeof(VFIOMmap));
> +
> +    return 0;
>  }
>
>  int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
> @@ -665,9 +673,9 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
>              region->flags & VFIO_REGION_INFO_FLAG_MMAP &&
>              !(region->size & ~qemu_real_host_page_mask)) {
>
> -            vfio_setup_region_sparse_mmaps(region, info);
> +            ret = vfio_setup_region_sparse_mmaps(region, info);
>
> -            if (!region->nr_mmaps) {
> +            if (ret) {
>                  region->nr_mmaps = 1;
>                  region->mmaps = g_new0(VFIOMmap, region->nr_mmaps);
>                  region->mmaps[0].offset = 0;
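
(For readers following along, the core of the patch — keeping only non-zero-sized sparse mmap entries, with a distinct return for "no sparse mmap capability" — can be sketched in Python. The tuples below stand in for the kernel's sparse area entries; this is an illustration of the logic, not QEMU code:)

```python
# Sketch of vfio_setup_region_sparse_mmaps() after the patch:
# mirror the i/j two-index loop that copies only non-zero-sized
# sparse mmap areas, and distinguish "capability absent" (the
# -ENODEV path) from "capability present but nothing mappable".

def setup_sparse_mmaps(areas):
    """areas: list of (offset, size) tuples, or None if the region
    has no sparse mmap capability.
    Returns (nr_mmaps, mmaps) with zero-length areas dropped, or
    None to signal the caller to fall back to one full-region mmap."""
    if areas is None:
        return None  # no sparse mmap info at all (-ENODEV in the patch)
    mmaps = [(off, size) for off, size in areas if size]
    return len(mmaps), mmaps

# The case from this thread: a single zero-sized sparse entry
# ("sparse entry 0 [0x0 - 0x0]") on the 4KB BAR holding the MSI-X table.
print(setup_sparse_mmaps([(0x0, 0x0)]))  # (0, []) - nothing to mmap, but no error
print(setup_sparse_mmaps(None))          # None    - caller mmaps the whole region
```

The point of returning a status separately from the entry count is exactly the `if (ret)` vs `if (!region->nr_mmaps)` change in vfio_region_setup(): an empty-but-valid sparse list should suppress the mmap (and the scary warning), while a missing capability should still trigger the single full-region mmap fallback.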


