Re: [Libunwind-devel] Another optimisation for x86-64 fast trace

From: Arun Sharma
Subject: Re: [Libunwind-devel] Another optimisation for x86-64 fast trace
Date: Wed, 30 Mar 2011 11:51:16 -0700

On Wed, Mar 30, 2011 at 8:05 AM, Lassi Tuura <address@hidden> wrote:

> For completeness, I should mention that I also tested with ".p2align 2" and
> ".p2align 4" right before ".global _Ux86_64_getcontext_trace". The results
> became slightly noisier, but curiously all the aligned versions were slightly
> but systematically slower than the unaligned one (by ~1-2%).
> The function is definitely unaligned with the patch, at offset 0x4e09 into 
> the shared library in my case.

These effects usually come down to how the x86 decoder works on your CPU.
On the Nehalem/Westmere generation it fetches instructions in 16-byte
blocks and decodes up to three simple instructions and one complex
instruction per cycle. There are many interesting stories about inserting
or removing a single nop in a hot loop changing its throughput
significantly.

