qemu-devel



From: Artyom Tarasenko
Subject: Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
Date: Tue, 18 Aug 2015 11:24:30 +0200

On Mon, Aug 3, 2015 at 11:17 AM, Aurelien Jarno <address@hidden> wrote:
> On 2015-08-03 10:31, Artyom Tarasenko wrote:
>> Hi Aurelien,
>>
>> On Fri, Jul 31, 2015 at 5:43 PM, Aurelien Jarno <address@hidden> wrote:
>>
>> >> > It uses a lot of integer functions
>> >> > based on CPU flags, so most of the time is spent computing them in
>> >> > helper_compute_psr.
>> >>
>> >> I wonder if this can be optimized. I guess most RISC CPUs would have a
>> >> similar problem. Unlike on x86, compilers for these targets usually
>> >> choose a flag-setting instruction only when the flags are actually
>> >> used. If there is an instruction modifying flags in the code, the
>> >> flags will certainly be used, so it probably makes little sense to
>> >> postpone the flag computation?
>> >
>> > Indeed. ARM and SH4 use one TCG temp per flag, and they can be computed
>> > one by one using setcond. The optimizer and the liveness analysis then
>> > get rid of the unused computations. However, while it allows intra-TB
>> > optimization, it prevents any other flag optimization. Therefore the
>> > only way to know whether it is a good idea is to implement it and
>> > benchmark it, using something more than a single biased benchmark
>> > like the one from sysbench.
>> >
>> > Also note that the current implementation predates the introduction of
>> > setcond, which is necessary to be able to compute the flags using TCG
>> > code.
>>
>> Thanks for explaining it, the problem is much clearer now.
>> Moving to setcond is definitely worth a shot. I'd like to play with it.
>> What would be the minimal unit to change without reworking the complete
>> TCG translation:
>>  a) one flag for one instruction,
>>  b) all flags for one instruction,
>>  c) one flag for all instructions,
>> or d) all flags for all instructions (gradually moving to setcond is
>> not possible) ?
>
> You should go with option c). You can look at how I did this for SH4,
> starting with commit 5ed9a259c164bb9fd2a6fe8a363a4bda2e4a5461.
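To make the two strategies discussed above concrete, here is a minimal C sketch for a SPARC `cmp`/`subcc` only. It is a simplified model, not QEMU code: the struct and function names are hypothetical, and a real lazy scheme also has to record which operation set the flags. Approach 1 records the operands and rebuilds all four icc flags at once when they are read (the kind of work a helper such as helper_compute_psr does). Approach 2 keeps one value per flag, each produced by a single comparison in the spirit of a TCG `setcond`, so a consumer that reads only Z leaves N, V and C dead.

```c
#include <stdbool.h>
#include <stdint.h>

/* icc flags packed into 4 bits in N, Z, V, C order (illustrative encoding). */
enum { ICC_N = 8, ICC_Z = 4, ICC_V = 2, ICC_C = 1 };

/* Approach 1: lazy flags (hypothetical, simplified).
   cmp/subcc only records its inputs and result. */
struct lazy_cc {
    uint32_t src1, src2, dst;
};

static void lazy_sub(struct lazy_cc *cc, uint32_t a, uint32_t b) {
    cc->src1 = a;
    cc->src2 = b;
    cc->dst = a - b;
}

/* All four flags are rebuilt together when something reads them. */
static unsigned lazy_compute_icc(const struct lazy_cc *cc) {
    unsigned f = 0;
    if (cc->dst >> 31) f |= ICC_N;                 /* sign of result */
    if (cc->dst == 0) f |= ICC_Z;                  /* zero result */
    if (((cc->src1 ^ cc->src2) & (cc->src1 ^ cc->dst)) >> 31)
        f |= ICC_V;                                /* signed overflow */
    if (cc->src1 < cc->src2) f |= ICC_C;           /* unsigned borrow */
    return f;
}

/* Approach 2: one value per flag, each from a setcond-like comparison. */
struct split_flags {
    bool n, z, v, c;
};

static struct split_flags split_sub(uint32_t a, uint32_t b) {
    uint32_t d = a - b;
    struct split_flags f;
    f.n = d >= 0x80000000u;                  /* sign bit set */
    f.z = d == 0;                            /* like setcond EQ d, 0 */
    f.c = a < b;                             /* like setcond LTU a, b */
    f.v = (((a ^ b) & (a ^ d)) >> 31) != 0;  /* needs xor/and/shift */
    return f;
}

/* A "bne" right after the cmp reads only Z; the other three flags are
   dead, and an optimizing C compiler can drop them after inlining, much
   as TCG liveness analysis would drop unused setcond results. */
static bool bne_after_cmp(uint32_t a, uint32_t b) {
    return !split_sub(a, b).z;
}
```

The trade-off Aurelien describes shows up here: approach 2 pays only for the flags that are live inside one TB, but when flags cross a TB boundary every per-flag value must be kept, whereas approach 1 defers all of that work to the (rarer) point where flags are actually read.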

FWIW I tried this for Z and N flags, but the resulting code was slower
than the current implementation.

Actually, the current implementation is already well optimized within a
TB: when a conditional branch/move directly follows a compare
operation, no external helpers are called.
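Why no helper is needed in that case can be sketched in C. This is an illustration only, not QEMU's translator code; `enum cond` and the functions are hypothetical and cover a subset of SPARC branch conditions. When the cmp and the branch are translated together, the condition reduces to one comparison of the original operands. When the flags come from elsewhere, the branch must decode N, Z, V and C instead. Both paths give the same answer.

```c
#include <stdbool.h>
#include <stdint.h>

/* A subset of SPARC integer branch conditions (hypothetical names). */
enum cond { COND_E, COND_NE, COND_L, COND_GE, COND_GU, COND_LEU };

/* Direct path: "cmp a, b" followed by the branch in the same TB. */
static bool cond_from_operands(enum cond c, uint32_t a, uint32_t b) {
    /* Flipping the sign bit maps signed order onto unsigned order. */
    uint32_t sa = a ^ 0x80000000u, sb = b ^ 0x80000000u;
    switch (c) {
    case COND_E:   return a == b;
    case COND_NE:  return a != b;
    case COND_L:   return sa < sb;
    case COND_GE:  return sa >= sb;
    case COND_GU:  return a > b;
    case COND_LEU: return a <= b;
    }
    return false;
}

/* Generic path: flags were materialized earlier and must be decoded. */
static bool cond_from_flags(enum cond c, bool n, bool z, bool v, bool cf) {
    switch (c) {
    case COND_E:   return z;
    case COND_NE:  return !z;
    case COND_L:   return n != v;
    case COND_GE:  return n == v;
    case COND_GU:  return !(cf || z);
    case COND_LEU: return cf || z;
    }
    return false;
}

/* Compute the subcc flags first, then decode them. */
static bool cond_via_flags(enum cond c, uint32_t a, uint32_t b) {
    uint32_t d = a - b;
    bool n = (d >> 31) != 0;
    bool z = d == 0;
    bool v = (((a ^ b) & (a ^ d)) >> 31) != 0;
    bool cf = a < b;
    return cond_from_flags(c, n, z, v, cf);
}
```

The expensive case described next is exactly the one where only the generic path is available, because the compare and the branch that consumes its flags are not adjacent.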

The unoptimized case is a sequence of multiple cmp and branch
operations (likely created by a "case" statement in the original
source code), especially where cmp is in a delay slot of a branch
instruction.

I wonder whether we always have to finish a TB on a conditional jump.
Maybe it would make sense to translate further if a destination of a
Maybe it would make sense to keep translating if the destination of a
jump is not too far from dc->pc? Defining "not too far" is admittedly
tricky.
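One possible reading of "not too far", as a minimal C sketch: the 64-byte limit, the function names and the page-based rule are hypothetical illustrations, not existing QEMU parameters, and the sketch ignores SPARC delay slots and annulled branches. It only accepts short forward branches whose target stays on the same 8 KiB page (the SPARC64 base page size) as the start of the TB; keeping the target on that page is one simple way to avoid complicating code invalidation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning knobs for illustration only. */
#define GUEST_PAGE_BITS 13       /* 8 KiB pages, as on SPARC64 */
#define MAX_FORWARD_SKIP 64u     /* arbitrary limit in bytes */

static bool same_page(uint64_t a, uint64_t b) {
    return (a >> GUEST_PAGE_BITS) == (b >> GUEST_PAGE_BITS);
}

/* Decide whether translation could continue past a conditional branch
   at pc with taken target "target", instead of ending the TB there. */
static bool may_continue_past_branch(uint64_t tb_start, uint64_t pc,
                                     uint64_t target) {
    if (target <= pc)
        return false;            /* backward branch: probably a loop */
    if (target - pc > MAX_FORWARD_SKIP)
        return false;            /* too far to be worth inlining */
    return same_page(tb_start, target);
}
```

A forward skip of this kind is what a short `if` body or a chain of `case` tests tends to produce, which is where the multiple cmp/branch sequences above come from.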

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu
