Re: [Qemu-devel] Qemu and 32 PCIe devices


From: Paolo Bonzini
Subject: Re: [Qemu-devel] Qemu and 32 PCIe devices
Date: Wed, 9 Aug 2017 09:26:11 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

On 09/08/2017 03:06, Laszlo Ersek wrote:
>>   20.14%  qemu-system-x86_64                  [.] render_memory_region
>>   17.14%  qemu-system-x86_64                  [.] subpage_register
>>   10.31%  qemu-system-x86_64                  [.] int128_add
>>    7.86%  qemu-system-x86_64                  [.] addrrange_end
>>    7.30%  qemu-system-x86_64                  [.] int128_ge
>>    4.89%  qemu-system-x86_64                  [.] int128_nz
>>    3.94%  qemu-system-x86_64                  [.] phys_page_compact
>>    2.73%  qemu-system-x86_64                  [.] phys_map_node_alloc

Yes, this is the O(n^3) thing.  An optimized build should be faster
because int128 operations will be inlined and become much more efficient.

> With this patch, I only tested the "93 devices" case, as the slowdown
> became visible to the naked eye from the trace messages while the firmware
> enabled more and more BARs / command registers (and conversely, the
> speedup was perceptible while the firmware disabled more and more BARs /
> command registers).

This is an interesting observation, and it's expected.  Looking at the
O(n^3) complexity in more detail, you have N operations, where the "i"th
operation works on "i" DMA address spaces, each of which has at least "i"
memory regions (at least 1 BAR per device).

So the total cost is sum i=1..N i^2 = N(N+1)(2N+1)/6 = O(n^3).
Expressing it as a sum shows why it gets slower as time progresses: every
additional device contributes a larger term than the one before it.
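
To put rough numbers on it, here is a standalone back-of-the-envelope
check (not QEMU code) using the 93-device case quoted above:

#include <stdio.h>

/* Total work when the "i"th operation touches "i" address spaces with
 * at least "i" regions each: sum i=1..N of i^2 = N(N+1)(2N+1)/6.
 */
int main(void)
{
    unsigned long long n = 93;
    unsigned long long total = n * (n + 1) * (2 * n + 1) / 6;

    printf("N = %llu devices -> %llu units of region-rendering work\n",
           n, total);   /* prints 272459 for N = 93 */
    return 0;
}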

The solution is to note that those "i" address spaces are actually all
the same, so each operation only needs to render one of them; that gets
the total down to sum i=1..N i = N(N+1)/2 = O(n^2).
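
A minimal sketch of that idea (hypothetical type and function names, not
the actual QEMU memory API): when every device's DMA address space is
backed by the same root region, the flattened view can be rendered once
and shared by reference instead of being re-rendered per device.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for QEMU's MemoryRegion / FlatView. */
typedef struct MemRegion { const char *name; } MemRegion;

typedef struct FlatView {
    MemRegion *root;       /* region this view was rendered from */
    int        refcount;
} FlatView;

static FlatView *cached_view;   /* one shared view per identical root */
static int       render_count;  /* how many expensive renders happened */

/* Stub for the expensive flattening walk (render_memory_region & co.). */
static FlatView *render_flat_view(MemRegion *root)
{
    FlatView *fv = calloc(1, sizeof(*fv));
    fv->root = root;
    render_count++;
    return fv;
}

/* Return a shared flat view, re-rendering only when the root changes.
 * With N identical DMA address spaces this turns N expensive renders
 * into one render plus N-1 reference bumps.
 */
static FlatView *get_flat_view(MemRegion *root)
{
    if (!cached_view || cached_view->root != root) {
        cached_view = render_flat_view(root);
    }
    cached_view->refcount++;
    return cached_view;
}

int main(void)
{
    MemRegion sysmem = { "system-memory" };

    for (int i = 0; i < 93; i++) {      /* 93 devices, identical DMA AS */
        get_flat_view(&sysmem);
    }
    printf("renders: %d, references: %d\n",
           render_count, cached_view->refcount);
    return 0;
}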

Thanks,

Paolo


