qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 00/27] tcg: Simplify temporary usage


From: Emilio Cota
Subject: Re: [PATCH 00/27] tcg: Simplify temporary usage
Date: Fri, 10 Feb 2023 07:35:26 -0500

Hi Richard,

On Mon, Jan 30, 2023 at 10:59:07 -1000, Richard Henderson wrote:
(snip)
> With this, and by not recycling TEMP_LOCAL, we can get identical code
> out of the backend even when changing the front end translators are
> adjusted to use TEMP_LOCAL for everything.
> 
> Benchmarking one test case, qemu-arm linux-test, the new liveness pass
> comes in at about 1.6% on perf, but I can't see any difference in
> wall clock time before and after the patch set.

I ran yesterday linux-user SPEC06 benchmarks from your tcg-life branch.
I do see perf regressions for two workloads (sjeng and xalancbmk).
With perf(1) I see liveness_pass* are at 0.00%, so I wonder: is it
possible that the emitted code isn't quite the same?

Happy to run more tests if helpful. Results below.

Thanks,
                Emilio

- bar chart, png: https://postimg.cc/ZCTkbYS9
- bar chart, txt:

                                                       Speedup of tcg-life 
(de6361f6) over master (ae2b5d83)
                                                           Host: AMD Ryzen 7 
PRO 5850U. Compiler: gcc12
  1.03 
+----------------------------------------------------------------------------------------------------------------------------------------------------+
  1.02 
|-+.............................................................................................|..................................................+-|
       |                                                                        
                       |                                                    |
  1.01 
|-+.............................................................................................|..................................................+-|
     1 
|-+.....**+-+*.....*+-+**......+-+...........................**+-+**..............**+-+**....***|**...............**+-+**..........................+-|
       |       *    *     *+-+ *    **+-+**      +-+       +-+      * +-+ *     
         *     *    *  | *               * +-+ *                +-+         |
  0.99 
|-+.....*....*.....*....*....*.....*....***|**.....**|***....*.....*..............*.....*....*..|.*......+-+......*.....*...............*+-+**.....+-|
  0.98 
|-+.....*....*.....*....*....*.....*....*.+-+*.....*+-+.*....*.....*....**+-+*....*.....*....*..|.*.......|.......*.....*...............*....*.....+-|
       |       *    *     *    *    *     *    *    *     *    *    *     *    
* +-+*    *     *    * +-+*     **|***    *     *               *    *       |
  0.97 
|-+.....*....*.....*....*....*.....*....*....*.....*....*....*.....*....*....*....*.....*....*....*.....*.|..*....*.....*......+-+......*....*.....+-|
  0.96 
|-+.....*....*.....*....*....*.....*....*....*.....*....*....*.....*....*....*....*.....*....*....*.....*+-+.*....*.....*.......|.......*....*.....+-|
       |       *    *     *    *    *     *    *    *     *    *    *     *    
*    *    *     *    *    *     *    *    *     *    ***|**     *    *       |
  0.95 
|-+.....*....*.....*....*....*.....*....*....*.....*....*....*.....*....*....*....*.....*....*....*.....*....*....*.....*....*.+-+*.....*....*.....+-|
  0.94 
+----------------------------------------------------------------------------------------------------------------------------------------------------+
     400.perlbench 401.bzip2    403.gcc    429.mcf 445.gobmk  456.hmmer  
458.s462.libquantum464.h264re471.omnetpp  473.ast483.xalancbmk   geomean

- Raw data for the bar chart:

  + baseline:
# benchmark     mean    stdev   raw
400.perlbench   94.4343747333333        0.331828752549838       
94.131272,94.421923,94.34074,94.747239,94.982504,94.602928,93.743109,94.077325,94.220688,94.505739,94.598781,94.779386,94.177626,94.811701,94.37466
401.bzip2       83.0563643333333        0.270338451882521       
83.603378,82.784967,82.766427,83.703505,83.018864,82.859924,83.128875,83.052816,82.921046,82.809962,83.027326,83.122502,83.099782,83.005817,82.940274
403.gcc 2.8751204       0.0183794528241263      
2.872445,2.886974,2.884226,2.877824,2.871482,2.927202,2.864385,2.86503,2.855154,2.856129,2.86079,2.861818,2.887109,2.867046,2.889192
429.mcf 13.527965       0.0849965442919382      
13.498908,13.494126,13.469952,13.606229,13.604864,13.513806,13.472737,13.572454,13.407602,13.70441,13.487249,13.562176,13.503575,13.39053,13.630857
445.gobmk       279.017610333333        1.91925368167126        
279.808944,278.057813,278.831984,279.388752,276.801944,280.078062,278.675088,277.094009,279.452037,278.832294,278.843473,279.407613,275.879438,284.430909,279.681795
456.hmmer       103.296133533333        0.38166706019324        
103.33233,102.944119,103.083766,103.001765,104.302275,103.329573,103.720265,103.537909,102.931565,103.008669,102.974703,103.5448,103.484963,102.958228,103.287073
458.sjeng       332.387649666667        0.868297133920158       
331.71233,333.413204,333.367836,332.57489,331.818019,331.14369,333.848697,333.135605,332.878587,332.069454,332.003468,332.692292,331.01894,331.426129,332.711604
462.libquantum  4.12260253333333        0.00508688019554322     
4.121422,4.116031,4.131564,4.113532,4.117144,4.124039,4.128896,4.118079,4.121929,4.124027,4.125302,4.124549,4.119102,4.125368,4.128054
464.h264ref     244.092639533333        13.3464074285764        
239.569243,240.187437,240.760271,241.483515,240.772044,241.492141,240.530232,240.449723,240.679955,240.464527,241.3703,292.302111,240.254072,240.490477,240.583545
471.omnetpp     261.340260533333        3.7694119109844 
263.463533,259.640839,260.834291,263.816131,256.877675,259.833289,258.708458,261.868763,260.75424,265.656161,257.900388,265.734187,256.747515,270.004887,258.263551
473.astar       142.966170866667        0.481395129184935       
142.636087,142.675786,141.895549,143.236359,142.892086,142.325069,143.267024,143.910479,143.279771,142.666683,143.11241,143.15343,143.041394,143.391831,143.008605
483.xalancbmk   401.605619866667        3.99007996364547        
401.101824,400.266261,396.474675,406.136427,404.400767,406.339383,397.442574,409.241015,399.084079,399.828507,402.585078,394.89061,404.722299,401.654323,399.916476

  + tcg-life:
# benchmark     mean    stdev   raw
400.perlbench   94.1968828666667        0.352661861692484       
94.726037,94.169276,93.893696,94.224617,94.613626,94.471446,94.198829,94.616742,93.845426,93.435601,94.040449,94.574709,94.105065,94.007179,94.030545
401.bzip2       83.0027554666667        0.214192109333076       
83.181646,83.299212,83.342217,82.848151,82.808142,82.888099,82.942223,82.777883,82.739787,82.770313,83.01728,83.327844,83.201232,82.905666,82.991637
403.gcc 2.87870153333333        0.0304401106926527      
2.860922,2.867219,2.860457,2.888637,2.879031,2.87397,2.882131,2.896422,2.865079,2.870739,2.847357,2.864518,2.901592,2.849287,2.973162
429.mcf 13.6952006666667        0.155876459519191       
13.734646,13.746608,13.528171,13.577692,13.534005,13.65201,13.947822,13.541465,13.710553,13.787918,13.521862,13.997184,13.546621,13.848357,13.753096
445.gobmk       282.1855452     1.68500895181812        
281.715494,282.875207,282.073035,281.660872,281.96679,278.912804,281.078281,283.777396,283.485664,278.564193,283.900278,283.662609,282.781748,284.176339,282.152468
456.hmmer       103.3804904     0.554303069916862       
103.077106,103.013059,103.247046,105.192431,103.221722,102.99502,103.787524,103.086281,103.213953,103.048905,103.042041,103.664296,103.278652,103.445109,103.394211
458.sjeng       339.3596132     3.77963378278808        
341.545293,341.249426,336.87165,343.192545,338.087093,339.691087,337.29754,341.586473,336.838538,345.476397,339.196873,342.773593,337.546389,329.312139,339.729162
462.libquantum  4.1225128       0.00546800475754836     
4.112292,4.119043,4.119803,4.129127,4.117612,4.122837,4.120172,4.121449,4.127452,4.113505,4.129305,4.128303,4.126079,4.127113,4.1236
464.h264ref     243.447219066667        0.924288945630674       
241.71547,242.724405,242.751474,243.730945,243.889673,243.254516,244.328523,244.374465,243.447008,245.45696,243.256098,242.348791,243.440131,242.895642,244.094185
471.omnetpp     268.2971082     5.67916415832786        
271.509491,273.656661,274.294363,266.501929,272.7864,267.868119,271.032049,267.085038,256.124737,270.430985,271.586944,256.427087,268.23723,264.012334,272.903256
473.astar       142.842279266667        0.482819143874435       
142.820726,142.742386,143.237814,143.241978,142.761549,142.026643,143.042933,142.849644,143.035134,142.150158,142.066603,143.086841,143.701693,142.553374,143.316713
483.xalancbmk   420.324755133333        8.22679014442942        
424.925688,433.128404,415.710656,423.156208,428.067657,426.100068,429.6215,412.083569,411.921022,410.749722,407.134107,414.478705,416.110115,430.104758,421.579148

I then ran perf record on xalancbmk before/after:

$ time for suffix in gcc12; do for tag in tcg-life-baseline tcg-life; do perf 
record -o /tmp/$tag-$suffix.perf.data -k 1 taskset -c 2 ./spec06.pl 
--iterations=1 --size=train --config=aarch64 --show-raw run 
~/src/dbt-bench/out/$tag-$suffix/bin/qemu-aarch64 ~/src/spec/spec06-aarch64 
xalancbmk; done; done
483.xalancbmk (#1/1)
run_base_train_aarch64.0068.qemu-aarch64: qemu-aarch64 Xalan_base.aarch64   -v 
allbooks.xml xalanc.xsl:         410.191153s

# benchmark     mean    stdev   raw
483.xalancbmk   410.191153      0       410.191153
[ perf record: Woken up 251 times to write data ]
[ perf record: Captured and wrote 62.629 MB 
/tmp/tcg-life-baseline-gcc12.perf.data (1641030 samples) ]
483.xalancbmk (#1/1)
run_base_train_aarch64.0069.qemu-aarch64: qemu-aarch64 Xalan_base.aarch64   -v 
allbooks.xml xalanc.xsl:         464.428108s

# benchmark     mean    stdev   raw
483.xalancbmk   464.428108      0       464.428108
[ perf record: Woken up 284 times to write data ]
[ perf record: Captured and wrote 70.905 MB /tmp/tcg-life-gcc12.perf.data 
(1857959 samples) ]

real    14m35.863s
user    14m34.897s
sys     0m0.925s

- perf report (baseline):
# Total Lost Samples: 0
#
# Samples: 1M of event 'cycles'
# Event count (approx.): 1797955092780
#
# Overhead  Command       Shared Object            Symbol                       
               
# ........  ............  .......................  
............................................
#
    43.83%  qemu-aarch64  qemu-aarch64             [.] helper_lookup_tb_ptr
     5.56%  qemu-aarch64  qemu-aarch64             [.] cpu_get_tb_cpu_state
     2.23%  qemu-aarch64  qemu-aarch64             [.] qht_lookup_custom
     1.57%  qemu-aarch64  qemu-aarch64             [.] tb_htable_lookup
     1.29%  qemu-aarch64  qemu-aarch64             [.] tb_lookup_cmp
     0.72%  qemu-aarch64  qemu-aarch64             [.] interval_tree_iter_first
     0.28%  qemu-aarch64  qemu-aarch64             [.] helper_vfp_cmpd_a64
     0.27%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f79244b2a43
     0.24%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f792449c058
     0.20%  qemu-aarch64  qemu-aarch64             [.] page_get_flags
     0.20%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f7924349c22
     0.19%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f7924349c40
     0.18%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f7924349203
     0.17%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f7923e09b03
     0.17%  qemu-aarch64  qemu-aarch64             [.] helper_vfp_cmped_a64
     0.15%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f7923e9f965
     0.15%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f7924552f2b
     0.15%  qemu-aarch64  qemu-aarch64             [.] float64_hs_compare
     0.14%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f79244f7003
     0.14%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f7924552a03
     0.14%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f7924349243
     0.14%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f7924546df6
     0.13%  qemu-aarch64  qemu-aarch64             [.] get_page_addr_code_hostp
     0.12%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f792454de7b
     0.12%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f7924555a85
     0.12%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f792454f465
     0.12%  qemu-aarch64  qemu-aarch64             [.] float64_add
     0.12%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f792439af03
     0.11%  qemu-aarch64  [JIT] tid 561758         [.] 0x00007f7924554b43
     [...]
     0.00%  qemu-aarch64  qemu-aarch64             [.] liveness_pass_1

- perf report (tcg-life):
# Total Lost Samples: 0
#
# Samples: 1M of event 'cycles'
# Event count (approx.): 2035140825489
#
# Overhead  Command       Shared Object            Symbol                       
               
# ........  ............  .......................  
............................................
#
    43.00%  qemu-aarch64  qemu-aarch64             [.] helper_lookup_tb_ptr
     5.73%  qemu-aarch64  qemu-aarch64             [.] cpu_get_tb_cpu_state
     2.16%  qemu-aarch64  qemu-aarch64             [.] qht_lookup_custom
     1.58%  qemu-aarch64  qemu-aarch64             [.] tb_htable_lookup
     1.10%  qemu-aarch64  qemu-aarch64             [.] tb_lookup_cmp
     0.40%  qemu-aarch64  qemu-aarch64             [.] interval_tree_iter_first
     0.26%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb37d4018
     0.25%  qemu-aarch64  qemu-aarch64             [.] helper_vfp_cmpd_a64
     0.22%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb387ecb6
     0.21%  qemu-aarch64  qemu-aarch64             [.] page_get_flags
     0.19%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb3141b03
     0.17%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb3681d62
     0.16%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb388ae2b
     0.16%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb37ea9c3
     0.16%  qemu-aarch64  qemu-aarch64             [.] helper_vfp_cmped_a64
     0.15%  qemu-aarch64  qemu-aarch64             [.] get_page_addr_code_hostp
     0.15%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb3887325
     0.15%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb389ddc3
     0.14%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb3681d80
     0.14%  qemu-aarch64  qemu-aarch64             [.] float64_hs_compare
     0.13%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb36d2f83
     0.12%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb3885b65
     0.12%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb37eab43
     0.12%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb388a903
     0.12%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb31d7925
     0.11%  qemu-aarch64  qemu-aarch64             [.] parts64_float_to_sint
     0.11%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb3885d3b
     0.11%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb3681383
     0.11%  qemu-aarch64  [JIT] tid 562312         [.] 0x00007fdeb388a683
     [...]
     0.00%  qemu-aarch64  qemu-aarch64             [.] liveness_pass_1
     0.00%  qemu-aarch64  qemu-aarch64             [.] liveness_pass_0




reply via email to

[Prev in Thread] Current Thread [Next in Thread]