Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination


From: Laurent Desnogues
Subject: Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination
Date: Sun, 15 May 2011 11:27:17 +0200

On Sun, May 15, 2011 at 9:15 AM, Blue Swirl <address@hidden> wrote:
> On Sun, May 15, 2011 at 1:04 AM, Aurelien Jarno <address@hidden> wrote:
>> On Sun, May 15, 2011 at 12:52:35AM +0300, Blue Swirl wrote:
>>> On Sun, May 15, 2011 at 12:16 AM, Aurelien Jarno <address@hidden> wrote:
>>> > On Sat, May 14, 2011 at 10:35:20PM +0300, Blue Swirl wrote:
[...]
>>> > The env register is used very often (basically for every load/store, but
>>> > also a lot of helpers), so it makes sense to reserve a register for it.
>>> >
>>> > From what I understand of your patch series, you prefer to pass this
>>> > register explicitly to TCG functions. This basically means this TCG
>>> > global will be loaded to a host register as soon as it is used, but also
>>> > regularly, as globals are saved back to their canonical location before
>>> > a helper or a load/store.
>>> >
>>> > So it seems that this patch series will just allow the "env register"
>>> > to change over time, though it will not spare one more register for the
>>> > TCG code, and it will emit longer TCG code to regularly reload the env
>>> > global into a host register.
>>>
>>> But there will be one more register available in some cases. In other
>>
>> Inside the TCG code, it will basically happen very rarely, given
>> load/store are really the most used instructions, and they need to load
>> the env register.
>
> Not exactly, from a sample run with -d op_opt:
> $ egrep -v -e '^$' -v -e 'OP after' -v -e ' end' -v -e 'Search PC'
> /tmp/qemu.log | awk '{print $1}' | sort | uniq -c|sort -rn
> 1673966 movi_i32
>  653931 ld_i32
>  607432 mov_i32
>  428684 st_i32
>  326878 movi_i64
>  308626 add_i32
>  283186 call
>  256817 exit_tb
>  207232 nopn
>  189388 goto_tb
>  122398 and_i32
>  117997 shr_i32
>  89107 qemu_ld32
>  82926 set_label
>  82713 brcond_i32
>  67169 qemu_st32
>  55109 or_i32
>  46536 ext32u_i64
>  44288 xor_i32
>  38103 sub_i32
>  26361 shl_i32
>  23218 shl_i64
>  23218 qemu_st64
>  23218 or_i64
>  20474 shr_i64
>  20445 qemu_ld64
>  11161 qemu_ld8u
>  10409 qemu_st8
>   5013 qemu_ld16u
>   3795 qemu_st16
>   2776 qemu_ld8s
>   1915 sar_i32
>   1414 qemu_ld16s
>    839 not_i32
>    579 setcond_i32
>    213 br
>     42 ext32s_i64
>     30 mul_i64

Unless I missed something, this doesn't show the usage of
ld/st per TB, which is what Aurélien was looking for if I
understood correctly.  All I can say is that you had at
most 256817 TBs (the exit_tb count) and 234507 qemu_ld/st
ops, so about one per TB.
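
For reference, the arithmetic behind those two figures can be
rechecked with a short Python sketch.  The counts are copied
from the quoted histogram; treating the exit_tb count as an
upper bound on the number of TBs is the same assumption as
above, and the variable names are only illustrative.

```python
# Opcode counts copied from the quoted "-d op_opt" histogram.
qemu_ldst = {
    "qemu_ld32": 89107,
    "qemu_st32": 67169,
    "qemu_st64": 23218,
    "qemu_ld64": 20445,
    "qemu_ld8u": 11161,
    "qemu_st8": 10409,
    "qemu_ld16u": 5013,
    "qemu_st16": 3795,
    "qemu_ld8s": 2776,
    "qemu_ld16s": 1414,
}

# Assumption: every TB ends in at least one exit_tb, so this count
# is an upper bound on the number of TBs in the log.
exit_tb = 256817

total_ldst = sum(qemu_ldst.values())
ratio = total_ldst / exit_tb

print(total_ldst)       # 234507
print(round(ratio, 2))  # 0.91
```

This only gives an average over the whole run; it still says
nothing about how the guest loads/stores are distributed
across individual TBs.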

Anyway I must be thick, because I fail to see how
generated code could access guest CPU registers without a
pointer to the CPU env :-)

IIUC the SPARC translator uses ld_i32/st_i32 mainly for
accessing the guest CPU registers, which, due to register
windows, are held in a dedicated global temp.  Is that
correct?  If so this is kind of hiding accesses to the
CPU env;  all other targets read/write registers by using
CPU env (through the use of global temps in most cases).

So I think most (if not almost all) TBs will need a pointer
to CPU env, which is why I think Aurélien's proposal to
keep a dedicated register that'd be loaded in the prologue
is the only way to not degrade performance of the
generated code (I'd add that this dedicated register
should be the one defined by the ABI as holding the first
parameter value, if that's possible;  I'm afraid this is
not necessarily a good idea).
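
To make the trade-off concrete, here is a toy Python model
of the two schemes.  It is only a sketch under stated
assumptions, not how TCG's register allocator actually
works: it assumes, following Aurélien's description, that a
non-fixed env global must be reloaded into a host register
after every helper call or guest load/store, and that a
dedicated register costs no loads inside a TB.  The op
names come from the log above; the function names, the two
op sets and the example TB are made up for illustration.

```python
# Toy assumption: ops after which TCG globals are written back, so a
# non-fixed env global no longer lives in a host register afterwards.
SYNC_OPS = {"call", "qemu_ld32", "qemu_st32"}

# Toy assumption: ops that need the env pointer in a host register.
ENV_USERS = {"ld_i32", "st_i32", "call", "qemu_ld32", "qemu_st32"}


def env_loads_as_global(ops):
    """Count env reloads in one TB when env is an ordinary TCG global."""
    loads = 0
    live = False  # is env currently held in a host register?
    for op in ops:
        if op in ENV_USERS and not live:
            loads += 1
            live = True
        if op in SYNC_OPS:
            live = False  # globals saved back; env must be reloaded
    return loads


def env_loads_fixed_register(ops):
    """With a dedicated register, env is loaded once in the prologue."""
    return 0


# Hypothetical TB, not taken from the log.
tb = ["ld_i32", "add_i32", "st_i32", "qemu_ld32",
      "ld_i32", "call", "st_i32", "exit_tb"]

print(env_loads_as_global(tb))       # 3
print(env_loads_fixed_register(tb))  # 0
```

Under these assumptions every helper call or guest
load/store followed by another env access costs one extra
reload, which is the overhead the dedicated register avoids.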


Laurent


