Re: [Qemu-arm] [RFC PATCH v2 1/2] utils: Add helper to read arm MIDR

From:	Richard Henderson
Subject:	Re: [Qemu-arm] [RFC PATCH v2 1/2] utils: Add helper to read arm MIDR_EL1 register
Date:	Fri, 19 Aug 2016 07:57:23 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

On 08/19/2016 02:05 AM, Vijay Kilari wrote:

On Thu, Aug 18, 2016 at 8:26 PM, Peter Maydell <address@hidden> wrote:

On 18 August 2016 at 15:46, Richard Henderson <address@hidden> wrote:

On 08/18/2016 07:14 AM, Peter Maydell wrote:

While we're on the subject, can somebody explain to me why we
use ifuncs at all? I couldn't work out why it would be better than
just using a straightforward function pointer -- when I tried single
stepping through things the ifunc approach still seemed to indirect
through some table or other so it wasn't actually resolving to
a direct function call anyway.

No reason, I suppose.

It's particularly helpful for libraries, where we don't really want the
overhead of the initialization when it's not used.


Ah, I see.

But (1) we don't have many of these and (2) we really don't care *that* much
about startup time.

So a simple function pointer initialized by a constructor has the same
effect.


The cutils code does not have any initialization function that could initialize a function/constructor pointer for the zero_check function.


static void __attribute__((constructor)) init_buffer_find_nonzero(void)
{
   ...
}
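Here is a minimal sketch of the constructor approach. It assumes GCC or Clang, since `__attribute__((constructor))` is a compiler extension. A plain function pointer starts out pointing at a portable implementation and is switched once, before main() runs. The names find_nonzero, find_nonzero_words and host_has_fast_path are hypothetical and are not QEMU's actual code. A real probe would inspect the host CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable fallback: offset of the first nonzero byte, or len if none. */
static size_t find_nonzero_generic(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i;

    for (i = 0; i < len; i++) {
        if (p[i]) {
            break;
        }
    }
    return i;
}

/* Hypothetical faster variant: test 8 bytes at a time. */
static size_t find_nonzero_words(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i = 0;

    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, p + i, sizeof(w));   /* avoids unaligned-access UB */
        if (w) {
            break;
        }
    }
    /* Locate the exact byte in a nonzero word, or scan the tail. */
    return i + find_nonzero_generic(p + i, len - i);
}

/* Plain function pointer; defaults to the safe version so it is never NULL. */
static size_t (*find_nonzero)(const void *, size_t) = find_nonzero_generic;

/* Hypothetical feature probe; a real one would inspect the host CPU. */
static int host_has_fast_path(void)
{
    return 1;
}

/* Runs before main(), like init_buffer_find_nonzero() above. */
static void __attribute__((constructor)) init_find_nonzero(void)
{
    if (host_has_fast_path()) {
        find_nonzero = find_nonzero_words;
    }
}
```

Each call then goes through one indirect branch. That is roughly comparable to the PLT/GOT indirection an ifunc-resolved call also incurs, which matches what single-stepping showed.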

Also, creating a separate function for prefetch that repeats most of the code does not look good.


Why do you say that?

So I suggest putting the check for prefetch outside the for loop and coding the loop both with and without prefetch.

You're duplicating the inner loop either way, so that can't be your objection to creating a separate function.

I profiled and found that a single check inside the loop adds a 100ms delay to an 8GB RAM migration.
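To illustrate the two options being compared, here is a hedged sketch of hoisting a loop-invariant flag out of the loop (loop unswitching). It assumes GCC/Clang for `__builtin_prefetch`. CHUNK, PREFETCH_AHEAD and the function names are illustrative assumptions. Both functions return the offset of the first nonzero CHUNK-byte block, or len rounded down to a multiple of CHUNK if no such block exists. Note that the unswitched form duplicates the loop body either way.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 64            /* bytes examined per iteration (assumed) */
#define PREFETCH_AHEAD 256  /* prefetch distance in bytes (assumed) */

/* Cache hint; integer arithmetic avoids out-of-bounds pointer arithmetic
 * near the end of the buffer, and a data prefetch never faults. */
static inline void prefetch_ahead(const unsigned char *p)
{
    __builtin_prefetch((const void *)((uintptr_t)p + PREFETCH_AHEAD));
}

/* True if any byte of the CHUNK-byte block at p is nonzero. */
static bool chunk_nonzero(const unsigned char *p)
{
    uint64_t acc = 0, w;
    for (size_t j = 0; j < CHUNK; j += sizeof(w)) {
        memcpy(&w, p + j, sizeof(w));
        acc |= w;
    }
    return acc != 0;
}

/* Flag tested on every iteration, inside the hot loop. */
static size_t scan_checked_in_loop(const unsigned char *p, size_t len,
                                   bool prefetch)
{
    size_t i;
    for (i = 0; i + CHUNK <= len; i += CHUNK) {
        if (prefetch) {
            prefetch_ahead(p + i);
        }
        if (chunk_nonzero(p + i)) {
            break;
        }
    }
    return i;
}

/* Flag tested once; each case gets its own copy of the loop. */
static size_t scan_unswitched(const unsigned char *p, size_t len,
                              bool prefetch)
{
    size_t i = 0;
    if (prefetch) {
        for (; i + CHUNK <= len; i += CHUNK) {
            prefetch_ahead(p + i);
            if (chunk_nonzero(p + i)) {
                break;
            }
        }
    } else {
        for (; i + CHUNK <= len; i += CHUNK) {
            if (chunk_nonzero(p + i)) {
                break;
            }
        }
    }
    return i;
}
```

GCC can sometimes perform this transformation on its own (-funswitch-loops, enabled at -O3). Writing the two loops out explicitly removes the dependence on the optimization level.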


That's about what I expected.

Also, if you want to make prefetch common for all arm64 platforms: the ThunderX cache line is 128 bytes, so the prefetch is performed at a 128-byte index. If the platform has a 64-byte cache line, this prefetch will fill only one 64-byte line instead of the 128 bytes required for the loop.


Yes, I had thought of that.

It would make sense to create two versions that prefetch for, and iterate over, cache line sizes of 64 and 128 bytes (I don't know of any other common sizes).
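One way to get both variants without copying the loop by hand is to generate them from a macro parameterized by line size. This sketch assumes GCC/Clang for `__builtin_prefetch`. DEFINE_SCAN, scan_line64 and scan_line128 are hypothetical names. Prefetching exactly one line ahead is a simplification, and real tuning would likely prefetch further ahead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Generate a scanner whose stride and prefetch step both equal LINE bytes,
 * so each prefetch covers exactly the next line the loop will read.
 * Returns the offset of the first nonzero LINE-byte block, or len rounded
 * down to a multiple of LINE. The prefetch address is computed as an
 * integer so running past the buffer end is not out-of-bounds pointer
 * arithmetic; a data prefetch never faults.
 */
#define DEFINE_SCAN(NAME, LINE)                                         \
static size_t NAME(const unsigned char *p, size_t len)                  \
{                                                                       \
    size_t i;                                                           \
    for (i = 0; i + (LINE) <= len; i += (LINE)) {                       \
        uint64_t acc = 0, w;                                            \
        __builtin_prefetch((const void *)((uintptr_t)p + i + (LINE)));  \
        for (size_t j = 0; j < (LINE); j += sizeof(w)) {                \
            memcpy(&w, p + i + j, sizeof(w));                           \
            acc |= w;                                                   \
        }                                                               \
        if (acc) {                                                      \
            break;                                                      \
        }                                                               \
    }                                                                   \
    return i;                                                           \
}

DEFINE_SCAN(scan_line64, 64)
DEFINE_SCAN(scan_line128, 128)
```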

Preferably, we should then use sysconf(_SC_LEVEL1_DCACHE_LINESIZE) within the init function above to choose the appropriate version.
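Here is a sketch of choosing the variant at startup from sysconf(). It assumes GCC/Clang for the constructor attribute. _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension, so the code guards it with #ifdef. The scanner stubs, choose_scan and init_scan are hypothetical. Falling back to the 64-byte variant when the size is unknown (sysconf returning 0 or -1) is an assumption of this sketch.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the real scanners: offset of the first nonzero
 * STRIDE-byte block, or len rounded down to a multiple of STRIDE. */
static size_t scan_stride(const unsigned char *p, size_t len, size_t stride)
{
    size_t i, j;
    for (i = 0; i + stride <= len; i += stride) {
        for (j = 0; j < stride; j++) {
            if (p[i + j]) {
                return i;
            }
        }
    }
    return i;
}

static size_t scan_line64(const unsigned char *p, size_t len)
{
    return scan_stride(p, len, 64);
}

static size_t scan_line128(const unsigned char *p, size_t len)
{
    return scan_stride(p, len, 128);
}

typedef size_t (*scan_fn)(const unsigned char *, size_t);
static scan_fn scan_impl = scan_line64;

/* L1 data cache line size as reported by glibc, or 0 if unavailable. */
static long sysconf_dcache_line(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    return n > 0 ? n : 0;
#else
    return 0;   /* C library without this glibc extension */
#endif
}

static scan_fn choose_scan(long line)
{
    /* 64 bytes is also the assumed default when the size is unknown (0). */
    return line == 128 ? scan_line128 : scan_line64;
}

static void __attribute__((constructor)) init_scan(void)
{
    scan_impl = choose_scan(sysconf_dcache_line());
}
```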

But I see that glibc doesn't currently implement that for aarch64, so we do want to have a fallback. I know that the "official" cache line data isn't (easily) available to userspace, but a close proxy is the size described by dczid_el0. That seems much better than groveling through a file under /sys.
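Here is a sketch of the DCZID_EL0 fallback. In the Arm architecture, bits [3:0] (BS) give log2 of the DC ZVA block size in 4-byte words. Bit 4 (DZP), when set, means DC ZVA is prohibited. The register is readable from EL0. The block size is only a proxy for the cache line size and is not guaranteed to match it. This sketch treats DZP=1 as "no information", which is a conservative assumption. The function names are hypothetical.

```c
#include <stdint.h>

/* Decode DCZID_EL0 into a block size in bytes; 0 means "no usable info". */
static unsigned dczid_block_bytes(uint64_t dczid)
{
    if (dczid & (UINT64_C(1) << 4)) {   /* DZP: DC ZVA prohibited */
        return 0;                       /* assumption: ignore BS then */
    }
    return 4u << (dczid & 0xf);         /* BS = log2(size in 4-byte words) */
}

#if defined(__aarch64__)
/* DCZID_EL0 is readable at EL0, so no kernel or /sys access is needed. */
static uint64_t read_dczid_el0(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(v));
    return v;
}
#endif

/* Prefer the sysconf() value; otherwise fall back to DCZID_EL0 on aarch64. */
static long dcache_line_or_fallback(long sysconf_value)
{
    if (sysconf_value > 0) {
        return sysconf_value;
    }
#if defined(__aarch64__)
    return (long)dczid_block_bytes(read_dczid_el0());
#else
    return 0;   /* unknown on other hosts in this sketch */
#endif
}
```

For example, BS = 4 decodes to 64 bytes and BS = 5 to 128 bytes, the two sizes discussed above.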

r~
