[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-discuss] How to resolve "Failed to mmap" error?
From: |
A223 A223 |
Subject: |
Re: [Qemu-discuss] How to resolve "Failed to mmap" error? |
Date: |
Mon, 10 Oct 2016 20:05:02 -0700 |
I just tried testing the patch.
Initially, it seemed to work fine (log messages below).
However, I am getting hard lockup of the host machine shortly after
the vfio_region_setup related log lines print out to the screen. I
would say that rougly 3 of the 9 VM startups resulted in a hard lock
like this. I wasn't getting these hard locks before applying the patch
that I can remember.
I removed the patch and tried to replicate the hard lock and haven't
been able to, though admittedly I was only able to try a few times.
Unfortunately, repeatedly trying to boot the VM to test things starts
to become time consuming, because once the VM has been through a
single startup-shutdown cycle, qemu refuses to start, printing these
errors:
qemu-system-x86_64: -device
vfio-pci,host=05:00.0,bus=root.1,addr=00.5: vfio: Error: Failed to
setup INTx fd: Device or resource busy
qemu-system-x86_64: -device
vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Device initialization
failed
In the host's kernel log, there are a ton of these
[ 774.069113] vfio_ecap_init: 0000:05:00.0 hiding ecap address@hidden
Followed by these kernel log errors:
[ 774.070330] genirq: Flags mismatch irq 16. 00000000
(vfio-intx(0000:05:00.0)) vs. 00000080 (ehci_hcd:usb1)
[ 774.085595] vfio-pci 0000:05:00.0: Refused to change power state,
currently in D3
[ 774.797671] vfio-pci 0000:05:00.0: timed out waiting for pending
transaction; performing function level reset anyway
[ 775.945685] vfio-pci 0000:05:00.0: Failed to return from FLR
This is not a problem with your patch, but it does complicate my
ability to test the patch repeatedly, since a host restart is needed
between every try. If you have any idea what could be going there,
please do let me know.
Here is the vfio_region log entry output from qemu when started with your patch:
address@hidden:vfio_region_setup Device 0000:02:00.0, region 0
"0000:02:00.0 BAR 0", flags: 7, offset: 0, size: 1000000
address@hidden:vfio_region_setup Device 0000:02:00.0, region 1
"0000:02:00.0 BAR 1", flags: 7, offset: 10000000000, size: 10000000
address@hidden:vfio_region_setup Device 0000:02:00.0, region 2
"0000:02:00.0 BAR 2", flags: 0, offset: 20000000000, size: 0
address@hidden:vfio_region_setup Device 0000:02:00.0, region 3
"0000:02:00.0 BAR 3", flags: 7, offset: 30000000000, size: 2000000
address@hidden:vfio_region_setup Device 0000:02:00.0, region 4
"0000:02:00.0 BAR 4", flags: 0, offset: 40000000000, size: 0
address@hidden:vfio_region_setup Device 0000:02:00.0, region 5
"0000:02:00.0 BAR 5", flags: 3, offset: 50000000000, size: 80
address@hidden:vfio_region_setup Device 0000:02:00.1, region 0
"0000:02:00.1 BAR 0", flags: 7, offset: 0, size: 4000
address@hidden:vfio_region_setup Device 0000:02:00.1, region 1
"0000:02:00.1 BAR 1", flags: 0, offset: 10000000000, size: 0
address@hidden:vfio_region_setup Device 0000:02:00.1, region 2
"0000:02:00.1 BAR 2", flags: 0, offset: 20000000000, size: 0
address@hidden:vfio_region_setup Device 0000:02:00.1, region 3
"0000:02:00.1 BAR 3", flags: 0, offset: 30000000000, size: 0
address@hidden:vfio_region_setup Device 0000:02:00.1, region 4
"0000:02:00.1 BAR 4", flags: 0, offset: 40000000000, size: 0
address@hidden:vfio_region_setup Device 0000:02:00.1, region 5
"0000:02:00.1 BAR 5", flags: 0, offset: 50000000000, size: 0
address@hidden:vfio_region_setup Device 0000:05:00.0, region 0
"0000:05:00.0 BAR 0", flags: 7, offset: 0, size: 10000
address@hidden:vfio_region_setup Device 0000:05:00.0, region 1
"0000:05:00.0 BAR 1", flags: 0, offset: 10000000000, size: 0
address@hidden:vfio_region_sparse_mmap_header Device
0000:05:00.0 region 2: 1 sparse mmap entries
address@hidden:vfio_region_sparse_mmap_entry sparse entry 0 [0x0 - 0x0]
address@hidden:vfio_region_setup Device 0000:05:00.0, region 2
"0000:05:00.0 BAR 2", flags: f, offset: 20000000000, size: 1000
address@hidden:vfio_region_setup Device 0000:05:00.0, region 3
"0000:05:00.0 BAR 3", flags: 0, offset: 30000000000, size: 0
address@hidden:vfio_region_setup Device 0000:05:00.0, region 4
"0000:05:00.0 BAR 4", flags: 7, offset: 40000000000, size: 1000
address@hidden:vfio_region_setup Device 0000:05:00.0, region 5
"0000:05:00.0 BAR 5", flags: 0, offset: 50000000000, size: 0
On Fri, Oct 7, 2016 at 9:23 AM, Alex Williamson
<address@hidden> wrote:
> On Thu, 6 Oct 2016 23:19:13 -0700
> A223 A223 <address@hidden> wrote:
>
>> On Thu, Oct 6, 2016 at 9:29 PM, Alex Williamson
>> <address@hidden> wrote:
>> > On Thu, 6 Oct 2016 21:07:06 -0700
>> > A223 A223 <address@hidden> wrote:
>> >
>> >> On Thu, Oct 6, 2016 at 8:30 PM, A223 A223 <address@hidden> wrote:
>> >> > On Thu, Oct 6, 2016 at 9:46 AM, Alex Williamson
>> >> > <address@hidden> wrote:
>> >> >> On Wed, 5 Oct 2016 20:30:34 -0700
>> >> >> A223 A223 <address@hidden> wrote:
>> >> >>
>> >> >>> On Wed, Oct 5, 2016 at 9:33 AM, Alex Williamson
>> >> >>> <address@hidden> wrote:
>> >> >>> > On Wed, 5 Oct 2016 02:13:21 -0700
>> >> >>> > A223 A223 <address@hidden> wrote:
>> >> >>> >> How can I go about tracking down the root cause of this error?
>> >> >>> >
>> >> >>> > This often means that the device resources are in use by another,
>> >> >>> > non-pci driver. Look in /proc/iomem and /proc/ioport to see if any
>> >> >>> > drivers are claiming sub-regions on the device. You can also turn
>> >> >>> > on
>> >> >>> > tracing and look for the event vfio_region_mmap_fault, which might
>> >> >>> > give
>> >> >>> > us more information (see docs/tracing.txt). Also, please provide
>> >> >>> > 'sudo
>> >> >>> > lspci -vvvs 5:00.0' so we can see the device resources. Thanks,
>> >> >>>
>> >> >>> The simpletrace output wouldn't parse for me using the python script
>> >> >>> (I'll paste the trace at the end of this email), so I used the ftrace
>> >> >>> backend instead (
>> >> >>> https://drive.google.com/open?id=0Bwjufq6oAZMfMVpUYTUxODR0NW8 ). It's
>> >> >>> quite large and I'm not sure what I'm looking for, so I didn't look at
>> >> >>> it very carefully.
>> >> >>
>> >> >> That's not the right trace, that's like kvm trace or something. Try:
>> >> >>
>> >> >> # ./configure --enable-trace-backends=log
>> >> >>
>> >> >> # echo vfio_region_mmap_fault > events.txt
>> >> >>
>> >> >> Then add to your qemu commandline "-trace events=events.txt". You
>> >> >> should get maybe a couple extra log lines per boot, not gobs of output.
>> >> >
>> >> > Ah, got it. Here are the relevant log lines then:
>> >> >
>> >> > address@hidden:vfio_region_mmap_fault Region 0000:05:00.0 BAR
>> >> > 2 mmaps[0], [20000000000 - 1ffffffffff], fault: -22
>> >> > qemu-system-x86_64: -device
>> >> > vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Failed to mmap
>> >> > 0000:05:00.0 BAR 2. Performance may be slow
>> >>
>> >> I took a look at everything again and noted this in the lspci output
>> >> for the Fresco Logic USB controller:
>> >> Vector table: BAR=2 offset=00000000
>> >> PBA: BAR=4 offset=00000000
>> >>
>> >> Is it normal for both the Vector table and PBA to have a 0 offset?
>> >
>> > Note that they're in different BARs, so it's a little unexpected but
>> > it's not wrong. The MSI-X table does appear to be the problem though.
>> > Based on that trace line, the region size we're trying to mmap is
>> > zero. We're probably hitting a corner case of a page sized BAR
>> > containing the MSI-X table, where we cannot mmap the page covering the
>> > table. We need to figure out where that's coming from. Please add the
>> > additional trace lines to your events.txt and try again:
>> >
>> > vfio_region_sparse_mmap_header
>> > vfio_region_sparse_mmap_entry
>> > vfio_region_setup
>> > vfio_msix_fixup
>> >
>> > The bad news is that due to the MSI-X table, we're not going to be able
>> > to mmap that BAR, the good news is that it probably doesn't actually
>> > hurt performance. With the table and PBA in separate BARs, my guess
>> > would be that's the only thing in those BARs. Thanks,
>>
>> Ah, overall bad news for me, as I now have absolutely no clue what's
>> causing my USB performance/reliability issues in the VM :(
>>
>> Nonetheless, here is the new log output:
>>
>> address@hidden:vfio_region_setup Device 0000:02:00.0, region 0
>> "0000:02:00.0 BAR 0", flags: 7, offset: 0, size: 1000000
>> address@hidden:vfio_region_setup Device 0000:02:00.0, region 1
>> "0000:02:00.0 BAR 1", flags: 7, offset: 10000000000, size: 10000000
>> address@hidden:vfio_region_setup Device 0000:02:00.0, region 2
>> "0000:02:00.0 BAR 2", flags: 0, offset: 20000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:02:00.0, region 3
>> "0000:02:00.0 BAR 3", flags: 7, offset: 30000000000, size: 2000000
>> address@hidden:vfio_region_setup Device 0000:02:00.0, region 4
>> "0000:02:00.0 BAR 4", flags: 0, offset: 40000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:02:00.0, region 5
>> "0000:02:00.0 BAR 5", flags: 3, offset: 50000000000, size: 80
>> address@hidden:vfio_region_setup Device 0000:02:00.1, region 0
>> "0000:02:00.1 BAR 0", flags: 7, offset: 0, size: 4000
>> address@hidden:vfio_region_setup Device 0000:02:00.1, region 1
>> "0000:02:00.1 BAR 1", flags: 0, offset: 10000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:02:00.1, region 2
>> "0000:02:00.1 BAR 2", flags: 0, offset: 20000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:02:00.1, region 3
>> "0000:02:00.1 BAR 3", flags: 0, offset: 30000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:02:00.1, region 4
>> "0000:02:00.1 BAR 4", flags: 0, offset: 40000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:02:00.1, region 5
>> "0000:02:00.1 BAR 5", flags: 0, offset: 50000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:05:00.0, region 0
>> "0000:05:00.0 BAR 0", flags: 7, offset: 0, size: 10000
>> address@hidden:vfio_region_setup Device 0000:05:00.0, region 1
>> "0000:05:00.0 BAR 1", flags: 0, offset: 10000000000, size: 0
>> address@hidden:vfio_region_sparse_mmap_header Device
>> 0000:05:00.0 region 2: 1 sparse mmap entries
>> address@hidden:vfio_region_sparse_mmap_entry sparse entry 0 [0x0 - 0x0]
>> address@hidden:vfio_region_setup Device 0000:05:00.0, region 2
>> "0000:05:00.0 BAR 2", flags: f, offset: 20000000000, size: 1000
>> address@hidden:vfio_region_setup Device 0000:05:00.0, region 3
>> "0000:05:00.0 BAR 3", flags: 0, offset: 30000000000, size: 0
>> address@hidden:vfio_region_setup Device 0000:05:00.0, region 4
>> "0000:05:00.0 BAR 4", flags: 7, offset: 40000000000, size: 1000
>> address@hidden:vfio_region_setup Device 0000:05:00.0, region 5
>> "0000:05:00.0 BAR 5", flags: 0, offset: 50000000000, size: 0
>> address@hidden:vfio_region_mmap_fault Region 0000:05:00.0 BAR
>> 2 mmaps[0], [20000000000 - 1ffffffffff], fault: -22
>> qemu-system-x86_64: -device
>> vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Failed to mmap
>> 0000:05:00.0 BAR 2. Performance may be slow
>> SeaBIOS (version rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org)
>
> Hi Andrew,
>
> Would it be possible for you to test the patch below? I'd be happy to
> add Reported-by and Tested-by credits to the patch if you can provide
> your real name (and preferred email if other than this one). Thanks,
>
> Alex
>
> commit 4dcabf99c7d893e14171d01f1214b4ec0977fc28
> Author: Alex Williamson <address@hidden>
> Date: Fri Oct 7 10:14:22 2016 -0600
>
> vfio: Handle zero-length sparse mmap ranges
>
> As reported in the link below, user has a PCI device with a 4KB BAR
> which contains the MSI-X table. This seems to hit a corner case in
> the kernel where the region reports being mmap capable, but the sparse
> mmap information reports a zero sized range. It's not entirely clear
> that the kernel is incorrect in doing this, but regardless, we need
> to handle it. To do this, fill our mmap array only with non-zero
> sized sparse mmap entries and add an error return from the function
> so we can tell the difference between nr_mmaps being zero based on
> sparse mmap info vs lack of sparse mmap info.
>
> NB, this doesn't actually change the behavior of the device, it only
> removes the scary "Failed to mmap ... Performance may be slow" error
> message. We cannot currently create an mmap over the MSI-X table.
>
> Link:
> http://lists.nongnu.org/archive/html/qemu-discuss/2016-10/msg00009.html
> Signed-off-by: Alex Williamson <address@hidden>
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9505fb3..ba8d451 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -610,16 +610,16 @@ vfio_get_region_info_cap(struct vfio_region_info *info,
> uint16_t id)
> return NULL;
> }
>
> -static void vfio_setup_region_sparse_mmaps(VFIORegion *region,
> - struct vfio_region_info *info)
> +static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
> + struct vfio_region_info *info)
> {
> struct vfio_info_cap_header *hdr;
> struct vfio_region_info_cap_sparse_mmap *sparse;
> - int i;
> + int i, j;
>
> hdr = vfio_get_region_info_cap(info, VFIO_REGION_INFO_CAP_SPARSE_MMAP);
> if (!hdr) {
> - return;
> + return -ENODEV;
> }
>
> sparse = container_of(hdr, struct vfio_region_info_cap_sparse_mmap,
> header);
> @@ -627,16 +627,24 @@ static void vfio_setup_region_sparse_mmaps(VFIORegion
> *region,
> trace_vfio_region_sparse_mmap_header(region->vbasedev->name,
> region->nr, sparse->nr_areas);
>
> - region->nr_mmaps = sparse->nr_areas;
> - region->mmaps = g_new0(VFIOMmap, region->nr_mmaps);
> + region->mmaps = g_new0(VFIOMmap, sparse->nr_areas);
>
> - for (i = 0; i < region->nr_mmaps; i++) {
> - region->mmaps[i].offset = sparse->areas[i].offset;
> - region->mmaps[i].size = sparse->areas[i].size;
> - trace_vfio_region_sparse_mmap_entry(i, region->mmaps[i].offset,
> - region->mmaps[i].offset +
> - region->mmaps[i].size);
> + for (i = 0, j = 0; i < sparse->nr_areas; i++) {
> + trace_vfio_region_sparse_mmap_entry(i, sparse->areas[i].offset,
> + sparse->areas[i].offset +
> + sparse->areas[i].size);
> +
> + if (sparse->areas[i].size) {
> + region->mmaps[j].offset = sparse->areas[i].offset;
> + region->mmaps[j].size = sparse->areas[i].size;
> + j++;
> + }
> }
> +
> + region->nr_mmaps = j;
> + region->mmaps = g_realloc(region->mmaps, j * sizeof(VFIOMmap));
> +
> + return 0;
> }
>
> int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
> @@ -665,9 +673,9 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev,
> VFIORegion *region,
> region->flags & VFIO_REGION_INFO_FLAG_MMAP &&
> !(region->size & ~qemu_real_host_page_mask)) {
>
> - vfio_setup_region_sparse_mmaps(region, info);
> + ret = vfio_setup_region_sparse_mmaps(region, info);
>
> - if (!region->nr_mmaps) {
> + if (ret) {
> region->nr_mmaps = 1;
> region->mmaps = g_new0(VFIOMmap, region->nr_mmaps);
> region->mmaps[0].offset = 0;
- [Qemu-discuss] How to resolve "Failed to mmap" error?, A223 A223, 2016/10/05
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, Alex Williamson, 2016/10/05
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, A223 A223, 2016/10/06
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, A223 A223, 2016/10/06
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, Alex Williamson, 2016/10/06
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, A223 A223, 2016/10/06
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, A223 A223, 2016/10/07
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, Alex Williamson, 2016/10/07
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, A223 A223, 2016/10/07
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, Alex Williamson, 2016/10/07
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?,
A223 A223 <=
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, Alex Williamson, 2016/10/14
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, Alex Williamson, 2016/10/14
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, Alex Williamson, 2016/10/18
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, A223 A223, 2016/10/19
- [Qemu-discuss] How to resolve "Failed to mmap" error?, A223 A223, 2016/10/19
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, A223 A223, 2016/10/20
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, A223 A223, 2016/10/20
- Re: [Qemu-discuss] How to resolve "Failed to mmap" error?, Jakob Bohm, 2016/10/22