
Re: [Qemu-devel] profiling qemu


From: Laurent Desnogues
Subject: Re: [Qemu-devel] profiling qemu
Date: Tue, 14 Feb 2012 16:30:18 +0100

On Tue, Feb 14, 2012 at 4:15 PM, Artyom Tarasenko <address@hidden> wrote:
> 2012/2/14 Laurent Desnogues <address@hidden>:
>> 2012/2/14 Lluís Vilanova <address@hidden>:
>>> Artyom Tarasenko writes:
>> [...]
>>>> Here it looks like "compute_all_sub" and "compute_all_sub_xcc" are
>>>> good candidates for optimizing: together they take the same amount of
>>>> time as cpu_sparc_exec. I guess both operations would be trivial in
>>>> x86_64 assembler. What would be the best strategy to make TCG take
>>>> advantage of running on an x86_64 host?
>>>
>>> A quick look into the code reveals that these two are called from a TCG helper
>>> (helper_compute_psr), so I see two approaches here applicable to the most
>>> frequently used "sub-operations" in helper_compute_psr:
>>>
>>> * Define new simpler helpers for those sub-operations that can be declared with
>>>   TCG_CALL_CONST and generate the new psr/xcc values in temporary registers.
>>>   You must make sure any other code will still be able to use the new psr/xcc
>>>   values.
>>>
>>> * Reimplement these sub-operations in pure TCG code.
>>>
>>>
>>> But first, make sure you run a proper benchmark to establish where the
>>> hotspots in the QEMU SPARC code are. The problem here is to establish what
>>> a proper benchmark is :)
>>
>> Similar helpers are used in ARM translation, so I'm not surprised
>> they show up (typically sub/flag instructions are used for loops).
>>
>> A good strategy is indeed to generate TCG code and let the
>> NZ/C/etc. flags be global temps like the other CPU registers.  This gains
>> a few percent of speed.
>
> Can you give an example where a global temp would be faster than an
> inline helper? At first sight it's trading a cheap math operation
> (a few cheap math operations in the case of subx) against a memory
> access. Or do you mean using the global flag registers instead of
> CC_SRC{1,2} and always computing them?
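For reference, the lazy scheme the question contrasts against (save the operands in CC_SRC/CC_SRC2-like fields, compute the flags only on demand) can be sketched in plain C. This is a minimal illustration, not QEMU's code: the struct, the enum and all function names are hypothetical, and the bit positions assume the SPARC V8 PSR layout. icc_from_sub is roughly the job compute_all_sub does for the 32-bit icc; compute_all_sub_xcc presumably does the same on 64-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* icc bits in the SPARC V8 PSR: n = 23, z = 22, v = 21, c = 20 */
#define PSR_N (1u << 23)
#define PSR_Z (1u << 22)
#define PSR_V (1u << 21)
#define PSR_C (1u << 20)

/* Hypothetical lazy condition-code state: which operation last set the
 * flags, its operands and result, and a cache of the computed icc bits. */
enum cc_op { CC_OP_FLAGS, CC_OP_SUB };

struct lazy_cc {
    enum cc_op op;
    uint32_t src1, src2, dst; /* operands and result of the last subcc */
    uint32_t icc;             /* meaningful only when op == CC_OP_FLAGS */
};

/* Emulated subcc: one subtraction and a few stores; no flag bits yet. */
static uint32_t lazy_subcc(struct lazy_cc *cc, uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    cc->op = CC_OP_SUB;
    cc->src1 = a;
    cc->src2 = b;
    cc->dst = r;
    return r;
}

/* The compute_all_sub-style step: derive N, Z, V and C from saved values. */
static uint32_t icc_from_sub(uint32_t src1, uint32_t src2, uint32_t dst)
{
    uint32_t icc = 0;
    if (dst & 0x80000000u)
        icc |= PSR_N;
    if (dst == 0)
        icc |= PSR_Z;
    /* signed overflow: operand signs differ and result sign differs from src1 */
    if ((src1 ^ src2) & (src1 ^ dst) & 0x80000000u)
        icc |= PSR_V;
    /* on SPARC the carry of a subtraction is the unsigned borrow */
    if (src1 < src2)
        icc |= PSR_C;
    return icc;
}

/* Called only when something actually reads the flags (a branch, rdpsr...). */
static uint32_t lazy_read_icc(struct lazy_cc *cc)
{
    switch (cc->op) {
    case CC_OP_FLAGS:
        return cc->icc;
    case CC_OP_SUB:
        cc->icc = icc_from_sub(cc->src1, cc->src2, cc->dst);
        cc->op = CC_OP_FLAGS;
        return cc->icc;
    }
    assert(0 && "unknown cc_op");
    return 0;
}
```

The subcc itself stays cheap; the cost shows up on every flag read, which goes through a call plus a dispatch on the recorded operation.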

I mean that for my work on ARM I just declared the NZ/C/V fields
as registers, like the other CPU registers (tcg_global_mem_new), and
let the TCG optimizer do the work.  I can't say if that would play well
with the SPARC front-end, which mimics the way x86 does flag handling.
(I would argue that on x86 you have no choice because most
instructions do touch flags, whereas that's not the case on SPARC, IIRC.)
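The "flags as separate registers" idea can be illustrated the same way, again as a plain-C sketch rather than TCG code: N, Z, C and V live in four separately stored values that every flag-setting instruction updates eagerly. In QEMU these would be globals created with tcg_global_mem_new, and TCG's liveness analysis can then drop flag computations that are overwritten before anything reads them within a translation block; plain C cannot show that step. The struct, the per-field encoding and the function names below are assumptions for illustration, and the flags follow SPARC subcc semantics (carry = unsigned borrow).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical split-flags state (encoding is an assumption):
 *   nf: bit 31 is N          zf: Z is set iff zf == 0
 *   cf: 0 or 1 (the borrow)  vf: bit 31 is V                     */
struct split_flags {
    uint32_t nf, zf, cf, vf;
};

/* SPARC subcc computed eagerly: each flag costs one or two cheap ALU
 * operations, with no helper call and no dispatch on a cc_op. */
static uint32_t subcc(struct split_flags *f, uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    f->nf = r;
    f->zf = r;
    f->cf = a < b;
    f->vf = (a ^ b) & (a ^ r);
    return r;
}

/* Each branch condition reads only the flags it needs. */
static bool cond_e(const struct split_flags *f)  /* be:  Z */
{
    return f->zf == 0;
}

static bool cond_cs(const struct split_flags *f) /* bcs: C */
{
    return f->cf != 0;
}

static bool cond_l(const struct split_flags *f)  /* bl:  N xor V */
{
    return ((f->nf ^ f->vf) >> 31) != 0;
}
```

Compared with the lazy sketch, each subcc does slightly more work up front, but testing a condition is a single comparison with no call; whether enough dead flag stores get eliminated to pay off is exactly what has to be measured.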

I am afraid you'll have to try it yourself and see if you gain
something :-)


Laurent


