Re: [Qemu-devel] [PATCH v4 07/11] target/arm: optimize indirect branches
From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH v4 07/11] target/arm: optimize indirect branches
Date: Thu, 27 Apr 2017 11:36:39 +0200
User-agent: NeoMutt/20170113 (1.7.2)
On 2017-04-26 23:29, Emilio G. Cota wrote:
> Speed up indirect branches by jumping to the target if it is valid.
>
> Softmmu measurements (see later commit for user-mode results):
>
> Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.
>
> - Impact on Boot time
>
> | setup | ARM debian jessie boot+shutdown time | stddev |
> |--------+--------------------------------------+--------|
> | v2.9.0 | 8.84 | 0.07 |
> | +cross | 8.85 | 0.03 |
> | +jr | 8.83 | 0.06 |
>
> - NBench, arm-softmmu (debian jessie guest). Host:
> Intel i7-4790K @ 4.00GHz
>
> [ASCII bar chart: NBench speedup over v2.9.0 for the 'cross' and
> '+cross+jr' setups, per benchmark and harmonic mean (hmean); y-axis
> from 0.95x to 1.3x. See the PNG below.]
> png: http://imgur.com/eOLmZNR
>
> NB. 'cross' represents the previous commit.
>
> Signed-off-by: Emilio G. Cota <address@hidden>
> ---
> target/arm/translate.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 02cad96..d46a576 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -65,6 +65,7 @@ static TCGv_i32 cpu_R[16];
> TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
> TCGv_i64 cpu_exclusive_addr;
> TCGv_i64 cpu_exclusive_val;
> +static bool gen_jr;
>
> /* FIXME: These should be removed. */
> static TCGv_i32 cpu_F0s, cpu_F1s;
> @@ -221,6 +222,7 @@ static void store_reg(DisasContext *s, int reg, TCGv_i32 var)
> */
> tcg_gen_andi_i32(var, var, s->thumb ? ~1 : ~3);
> s->is_jmp = DISAS_JUMP;
> + gen_jr = true;
> }
> tcg_gen_mov_i32(cpu_R[reg], var);
> tcg_temp_free_i32(var);
> @@ -893,6 +895,7 @@ static inline void gen_bx_im(DisasContext *s, uint32_t addr)
> tcg_temp_free_i32(tmp);
> }
> tcg_gen_movi_i32(cpu_R[15], addr & ~1);
> + gen_jr = true;
> }
>
> /* Set PC and Thumb state from var. var is marked as dead. */
> @@ -902,6 +905,7 @@ static inline void gen_bx(DisasContext *s, TCGv_i32 var)
> tcg_gen_andi_i32(cpu_R[15], var, ~1);
> tcg_gen_andi_i32(var, var, 1);
> store_cpu_field(var, thumb);
> + gen_jr = true;
> }
>
> /* Variant of store_reg which uses branch&exchange logic when storing
> @@ -12034,6 +12038,20 @@ void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
> gen_set_pc_im(dc, dc->pc);
> /* fall through */
> case DISAS_JUMP:
> + /*
> + * gen_jr is not set on every DISAS_JUMP because for some of those
> + * we do want to exit to the exec loop.
> + */
What would be the reason for that? IIUC the lookup_tb_ptr helper calls
cpu_get_tb_cpu_state to get, from the current CPU state, the flags used
to look up the new TB. This means it can, for example, handle a
transition from user to privileged mode. Also, the exit_req flag or its
new equivalent is tested at the beginning of each TB in case an
interrupt is pending. It therefore seems to me that we can replace all
calls to tcg_gen_exit_tb with tcg_gen_lookup_and_goto_ptr, passing the
program counter as the argument.
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
address@hidden http://www.aurel32.net