qemu-devel

From: Vijay Kilari
Subject: Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
Date: Fri, 8 Apr 2016 11:51:29 +0530

Hi Peter,

On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell <address@hidden> wrote:
> On 7 April 2016 at 11:56, Vijay Kilari <address@hidden> wrote:
>> On Thu, Apr 7, 2016 at 3:41 PM, Peter Maydell <address@hidden> wrote:
>>> On 7 April 2016 at 10:58,  <address@hidden> wrote:
>>>> From: Vijaya Kumar K <address@hidden>
>>>>
>>>> utils cannot read target CPU information, which is needed
>>>> to implement CPU-specific features or errata workarounds.
>>>> To get it, parse /proc/cpuinfo and fetch the CPU
>>>> information.
>>>>
>>>> For now, this helper only fetches CPU information
>>>> for ARM architectures.
>>>
>>> As I understand it /proc/cpuinfo is intended only for
>>> humans to read. Please don't write code to parse it;
>>> find a different way to get this information instead
>>> if you really need it.
>
>> Also, unlike x86, there is no cpuid.h from which we can get CPU
>> identification information on arm64.
>
> I'm told there are kernel patches in progress to get this sort
> of information in a maintainable way to userspace, which are
> currently somewhat stalled due to lack of anybody who wants to
> consume it. If you have a use case then you should probably
> flag it up with the kernel devs.

Can you please give references to those patches/discussion?

>
> That said, I think we should probably hold off on this
> discussion until we have clearer benchmarking info that
> demonstrates that doing these prefetches really does make
> a significant difference. I would much prefer to have a


The ThunderX pass2 board does not have hardware prefetch, so explicit
software prefetch instructions are required on this platform.
Here are the benchmark results, with and without prefetch, for
migrating an idle VM with 4 VCPUs and 8 GB of RAM.

Without prefetch, total migration time is 8.2 seconds.
With prefetch, total migration time is 2.7 seconds.

Without prefetch:
------------------------

(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: completed
total time: 8217 milliseconds
downtime: 86 milliseconds
setup: 4 milliseconds
transferred ram: 212624 kbytes
throughput: 212.08 mbps
remaining ram: 0 kbytes
total ram: 8520128 kbytes
duplicate: 2085805 pages
skipped: 0 pages
normal: 48478 pages
normal bytes: 193912 kbytes
dirty sync count: 3

With prefetch:
--------------------

(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: completed
total time: 2744 milliseconds
downtime: 48 milliseconds
setup: 5 milliseconds
transferred ram: 213526 kbytes
throughput: 637.76 mbps
remaining ram: 0 kbytes
total ram: 8520128 kbytes
duplicate: 2085014 pages
skipped: 0 pages
normal: 48705 pages
normal bytes: 194820 kbytes
dirty sync count: 3
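As a quick check, the two runs can be compared directly from the figures above. This assumes that the "duplicate" counter in QEMU's RAM migration code of this period counts pages found to be all zero, which is the case handled by the buffer zero-detection routine.

```latex
\text{speedup} = \frac{8217\ \text{ms}}{2744\ \text{ms}} \approx 2.99,
\qquad
\frac{637.76\ \text{mbps}}{212.08\ \text{mbps}} \approx 3.01

\frac{\text{duplicate}}{\text{duplicate} + \text{normal}}
  = \frac{2085805}{2085805 + 48478} \approx 0.977
```

So about 97.7% of the pages handled in the run without prefetch were zero pages. That is consistent with the zero-page check being the hot path when migrating an idle guest, and with total time dropping by roughly 3x once that scan is prefetched.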

> single aarch64 routine that works for everybody, rather
> than a thunderx-only special case.

I have now found that the existing generic function
buffer_find_nonzero_offset_inner() can be made to work with NEON,
so there is no need to create the arm64-specific function
buffer_find_nonzero_offset_neon() in this patch series.
However, prefetch code still needs to be added for performance
reasons.
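Roughly what "adding prefetch" to a zero-detection loop means can be sketched in portable C. This is not QEMU's buffer_find_nonzero_offset_inner() and does not use NEON intrinsics. The function name find_nonzero_offset_prefetch(), the 64-byte chunk size and the 512-byte PREFETCH_DISTANCE are hypothetical choices for illustration, and a real prefetch distance would need to be tuned by benchmarking on ThunderX. The sketch uses the GCC/Clang __builtin_prefetch() builtin, which on aarch64 is emitted as a PRFM instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knobs, not taken from QEMU:
 * a chunk is 8 x 64-bit words (64 bytes), and the prefetch is issued
 * PREFETCH_DISTANCE bytes ahead of the chunk currently being scanned. */
#define CHUNK_WORDS 8
#define CHUNK_BYTES (CHUNK_WORDS * sizeof(uint64_t))
#define PREFETCH_DISTANCE 512

/* Hypothetical helper: returns the byte offset of the first 64-byte chunk
 * of buf that contains a non-zero byte, or len if the buffer is all zero.
 * Assumes buf is 8-byte aligned and len is a multiple of CHUNK_BYTES. */
static size_t find_nonzero_offset_prefetch(const void *buf, size_t len)
{
    const uint64_t *p = buf;
    size_t off;

    assert((uintptr_t)buf % sizeof(uint64_t) == 0);
    assert(len % CHUNK_BYTES == 0);

    for (off = 0; off < len; off += CHUNK_BYTES, p += CHUNK_WORDS) {
        /* Ask the memory system to start loading data we will scan soon.
         * Only prefetch inside the buffer so the pointer stays valid. */
        if (off + PREFETCH_DISTANCE < len) {
            __builtin_prefetch((const char *)p + PREFETCH_DISTANCE, 0, 0);
        }

        /* OR the words together so one branch tests the whole chunk. */
        uint64_t acc = 0;
        for (size_t i = 0; i < CHUNK_WORDS; i++) {
            acc |= p[i];
        }
        if (acc != 0) {
            return off;
        }
    }
    return len;
}
```

Called on a zeroed 4 KiB buffer, it returns 4096 (the length). If the 64-bit word at byte offset 800 is made non-zero, it returns 768, the start of the 64-byte chunk containing that word. Prefetch arguments 0, 0 request a read with no expected temporal reuse, which suits a single streaming pass over guest RAM.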
