
From: Emilio G. Cota
Subject: Re: [Qemu-devel] [PATCH v6 09/50] tcg: Use per-temp state data in liveness
Date: Tue, 17 Oct 2017 17:50:03 -0400
User-agent: Mutt/1.5.24 (2015-08-30)

On Mon, Oct 16, 2017 at 10:25:28 -0700, Richard Henderson wrote:
> From: Richard Henderson <address@hidden>
> 
> This avoids having to allocate external memory for each temporary.
> 
> Signed-off-by: Richard Henderson <address@hidden>
> ---

Unfortunately, this patch undoes the small perf gains we made so far in
this series.

We end up running more instructions (about 3.7% more, with about 2.8% more
elapsed time), I guess due to the loops that set the per-temp states
(whereas earlier we just had a memset).
Same aarch64 boot benchmark, 10 runs:

Before:

       7125.400889      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.15% )
            21,654      context-switches          #    0.003 M/sec                    ( +-  0.12% )
                 1      cpu-migrations            #    0.000 K/sec
             8,034      page-faults               #    0.001 M/sec                    ( +-  1.22% )
    30,050,759,263      cycles                    #    4.217 GHz                      ( +-  0.15% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    53,764,201,351      instructions              #    1.79  insns per cycle          ( +-  0.09% )
     9,677,042,191      branches                  # 1358.105 M/sec                    ( +-  0.09% )
       170,903,903      branch-misses             #    1.77% of all branches          ( +-  0.16% )

       7.136617151 seconds time elapsed                                          ( +-  0.17% )

After:

       7326.945822      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.24% )
            21,997      context-switches          #    0.003 M/sec                    ( +-  0.16% )
                 1      cpu-migrations            #    0.000 K/sec
             8,400      page-faults               #    0.001 M/sec                    ( +-  4.63% )
    30,900,509,346      cycles                    #    4.217 GHz                      ( +-  0.23% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    55,736,672,258      instructions              #    1.80  insns per cycle          ( +-  0.16% )
     9,989,723,969      branches                  # 1363.423 M/sec                    ( +-  0.16% )
       179,662,782      branch-misses             #    1.80% of all branches          ( +-  0.16% )

       7.335805286 seconds time elapsed                                          ( +-  0.24% )

I tried merging .state into the bitfield, but that didn't help (the dcache isn't
the issue here).
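
To make the loop-versus-memset point concrete, here is a minimal sketch of
the two layouts. This is not the actual TCG code: the Temp struct, its
fields, the flag value and both function names are made up for illustration.
With an external byte array, one memset resets every state; with the state
embedded in each temp, the reset has to walk the array one element at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag value, for illustration only. */
enum { TS_DEAD = 1 };

/* Hypothetical, simplified temp descriptor; the real struct has more fields. */
typedef struct Temp {
    uint64_t val;    /* stand-in for the other per-temp fields */
    uint8_t state;   /* liveness state stored inline in the temp */
} Temp;

/* External layout: states are contiguous bytes, so one memset resets them. */
static void reset_external(uint8_t *states, size_t nb_temps)
{
    assert(states != NULL);
    memset(states, TS_DEAD, nb_temps);
}

/* Per-temp layout: each state sits inside its struct, so we need a loop. */
static void reset_per_temp(Temp *temps, size_t nb_temps)
{
    assert(temps != NULL);
    for (size_t i = 0; i < nb_temps; i++) {
        temps[i].state = TS_DEAD;
    }
}
```

Both functions leave every state set to TS_DEAD. The second touches one
byte per struct-sized stride instead of a contiguous run, which would be
consistent with the higher instruction and branch counts above, though the
sketch itself does not measure anything.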

Anyway, we use .state_ptr later in this series, so:

Reviewed-by: Emilio G. Cota <address@hidden>

                E.


