From: Emilio G. Cota
Subject: [Qemu-arm] [PATCH 10/10] tb-hash: improve tb_jmp_cache hash function in user mode
Date: Tue, 11 Apr 2017 21:17:30 -0400

Optimizations to cross-page chaining and indirect jumps make
performance more sensitive to the hit rate of tb_jmp_cache.
The constraint of reserving some bits for the page number
lowers the achievable quality of the hashing function.

However, user-mode has no such requirement. With this change,
user-mode therefore uses a hashing function that is both faster
and of better quality than the previous one.
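
To illustrate the difference, here is a minimal, self-contained sketch of
the two styles of hash. The constants below (CACHE_BITS, PAGE_BITS,
TGT_PAGE_BITS, etc.) are illustrative stand-ins, not QEMU's actual
TB_JMP_CACHE_BITS / TB_JMP_PAGE_BITS / TARGET_PAGE_BITS definitions:

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative stand-ins; QEMU derives the real values from
     * TB_JMP_CACHE_BITS, TB_JMP_PAGE_BITS and TARGET_PAGE_BITS. */
    #define CACHE_BITS    12
    #define CACHE_SIZE    (1u << CACHE_BITS)
    #define PAGE_BITS     10                    /* index bits that vary within a page */
    #define ADDR_MASK     ((1u << PAGE_BITS) - 1)
    #define PG_MASK       ((CACHE_SIZE - 1) & ~ADDR_MASK)
    #define TGT_PAGE_BITS 12

    /* Softmmu-style hash: the top bits of the index depend only on the page
     * number, so invalidating a page touches a contiguous slice of the
     * cache, at the cost of hash quality. */
    static unsigned int hash_softmmu_style(uint64_t pc)
    {
        uint64_t tmp = pc ^ (pc >> (TGT_PAGE_BITS - PAGE_BITS));

        return ((tmp >> (TGT_PAGE_BITS - PAGE_BITS)) & PG_MASK) | (tmp & ADDR_MASK);
    }

    /* User-mode hash from this patch: no page structure to preserve, so all
     * index bits can mix bits of the pc. */
    static unsigned int hash_user(uint64_t pc)
    {
        return (pc ^ (pc >> CACHE_BITS)) & (CACHE_SIZE - 1);
    }

    int main(void)
    {
        uint64_t pc = 0x400123;

        printf("softmmu-style index: %#x\n", hash_softmmu_style(pc));
        printf("user-mode index:     %#x\n", hash_user(pc));
        return 0;
    }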

Measurements:

-    specINT 2006 (test set), x86_64-linux-user. Host: Intel i7-4790K @ 4.00GHz
                              Y axis: Speedup over 95b31d70

     [ASCII bar chart: speedup over 95b31d70 for jr ($$), jr+xxhash (%%) and
      jr+hash+inline (@@) across astar, bzip2, gcc, gobmk, h264ref, hmmer,
      libquantum, mcf, omnetpp, perlbench, sjeng, xalancbmk and hmean; see
      the png below for the full chart.]
  png: http://imgur.com/RiaBuIi

That is, a 6.45% hmean improvement for this commit. Note that this is the
test set, so some benchmarks take almost no time (and therefore aren't that
sensitive to changes here). See "train" results below.

Note also that hashing quality is not the only factor: xxhash gives
on average the highest hit rates, but the time spent computing the
hash negates the performance gains from the higher hit rate. Given
these results, I dropped xxhash from subsequent experiments.
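
To get a rough feel for the per-call cost argument, here is a sketch of a
micro-benchmark. The "heavy" hash is a generic 64-bit mixer (a splitmix64
finalizer) used as a stand-in for xxhash, not QEMU's tb_hash_func, so treat
the numbers it prints as illustrative only:

    #define _POSIX_C_SOURCE 200809L
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    #define CACHE_BITS 12
    #define CACHE_MASK ((1u << CACHE_BITS) - 1)

    /* Cheap hash: the xor-fold this patch uses for user-mode. */
    static inline unsigned int hash_cheap(uint64_t pc)
    {
        return (pc ^ (pc >> CACHE_BITS)) & CACHE_MASK;
    }

    /* Heavier hash: a splitmix64-style finalizer standing in for xxhash;
     * better bit mixing, but several multiplies and shifts per call. */
    static inline unsigned int hash_heavy(uint64_t pc)
    {
        uint64_t z = pc + 0x9e3779b97f4a7c15ULL;

        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
        z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
        return (z ^ (z >> 31)) & CACHE_MASK;
    }

    static double elapsed(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void)
    {
        enum { N = 100000000 };
        volatile unsigned int sink = 0;
        struct timespec t0, t1, t2;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (uint64_t i = 0; i < N; i++) {
            sink ^= hash_cheap(0x400000 + i * 4);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        for (uint64_t i = 0; i < N; i++) {
            sink ^= hash_heavy(0x400000 + i * 4);
        }
        clock_gettime(CLOCK_MONOTONIC, &t2);

        printf("xor-fold: %.3fs   mixer: %.3fs   (sink=%u)\n",
               elapsed(t0, t1), elapsed(t1, t2), sink);
        return 0;
    }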

-   specINT 2006 (train set), x86_64-linux-user. Host: Intel i7-4790K @ 4.00GHz
                              Y axis: Speedup over 95b31d70

    [ASCII bar chart: speedup over 95b31d70 for jr ($$) and jr+hash (%%)
     across the same SPECint 2006 benchmarks and hmean; see the png below
     for the full chart.]
  png: http://imgur.com/55iJJgD

That is, a 10.19% hmean improvement for jr+hash (this commit).

-               NBench, arm-linux-user. Host: Intel i7-4790K @ 4.00GHz
                              Y axis: Speedup over 95b31d70

     [ASCII bar chart: speedup over 95b31d70 for jr ($$), jr+inline (%%) and
      jr+inline+hash (@@) across ASSIGNMENT, BITFIELD, FOURIER, FP_EMULATION,
      HUFFMAN, LU_DECOMPOSITION, NEURAL NET, NUMERIC SORT, STRING SORT and
      hmean; see the png below for the full chart.]
  png: http://imgur.com/i5e1gdY

That is, an 11% hmean perf gain, which almost doubles the perf gain
from implementing the jr optimization.

-              NBench, x86_64-linux-user. Host: Intel i7-4790K @ 4.00GHz

     [ASCII bar chart: speedup over 95b31d70 for jr ($$), jr+inline (%%) and
      jr+inline+hash (@@) across the same NBench workloads and hmean; see
      the png below for the full chart.]
  png: http://imgur.com/Xu0Owgu

As noted in the previous commit's log, NBench is not very sensitive to
changes here. We get a very slight overall decrease in hmean performance,
although some workloads improve as well. Note that there are no error
bars: NBench re-runs itself until confidence in the stability of the
average is >= 95%, and it does not report the resulting stddev.

Signed-off-by: Emilio G. Cota <address@hidden>
---
 include/exec/tb-hash.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 2c27490..b1fe2d0 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -22,6 +22,8 @@
 
 #include "exec/tb-hash-xx.h"
 
+#ifdef CONFIG_SOFTMMU
+
 /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
    addresses on the same page.  The top bits are the same.  This allows
    TLB invalidation to quickly clear a subset of the hash table.  */
@@ -45,6 +47,16 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
            | (tmp & TB_JMP_ADDR_MASK));
 }
 
+#else
+
+/* In user-mode we can get better hashing because we do not have a TLB */
+static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
+{
+    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
+}
+
+#endif /* CONFIG_SOFTMMU */
+
 static inline
 uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags)
 {
-- 
2.7.4
