[Qemu-arm] [PATCH v4 10/11] target/i386: optimize indirect branches
From: Emilio G. Cota
Subject: [Qemu-arm] [PATCH v4 10/11] target/i386: optimize indirect branches
Date: Wed, 26 Apr 2017 23:29:23 -0400
Speed up indirect branches by jumping to the target if it is valid.
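To illustrate the idea, here is a minimal sketch in plain C of what "jumping to the target if it is valid" can look like: look up a translated block by the guest target address, chain to it on a hit, and otherwise fall back to leaving the generated code. Every name below (ToyTB, toy_cache_insert, toy_lookup_and_goto, TOY_EXIT_TO_MAIN_LOOP) is hypothetical and is not QEMU's API. The validity check is reduced to a pc comparison, which is an assumption of this model; a real lookup has to match more CPU state than the pc alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of indirect-branch chaining. Not QEMU code. */
#define TOY_CACHE_BITS 4
#define TOY_CACHE_SIZE (1u << TOY_CACHE_BITS)

/* Returned when no valid translation exists for the target. */
#define TOY_EXIT_TO_MAIN_LOOP (-1)

typedef struct ToyTB {
    uint64_t pc;       /* guest address this block was translated from */
    int valid;         /* nonzero once the slot holds a translation */
    int host_code_id;  /* stands in for a pointer to generated host code */
} ToyTB;

/* Direct-mapped cache indexed by the low bits of the guest pc. */
static ToyTB toy_cache[TOY_CACHE_SIZE];

static unsigned toy_hash(uint64_t pc)
{
    return (unsigned)(pc & (TOY_CACHE_SIZE - 1));
}

/* Record that guest address `pc` has been translated. */
static void toy_cache_insert(uint64_t pc, int host_code_id)
{
    assert(host_code_id != TOY_EXIT_TO_MAIN_LOOP);
    ToyTB *tb = &toy_cache[toy_hash(pc)];
    tb->pc = pc;
    tb->valid = 1;
    tb->host_code_id = host_code_id;
}

/*
 * Resolve an indirect branch: on a hit, return the block to jump to
 * directly; on a miss, return TOY_EXIT_TO_MAIN_LOOP so the caller
 * leaves the generated code, as an unconditional end-of-block would.
 */
static int toy_lookup_and_goto(uint64_t target_pc)
{
    const ToyTB *tb = &toy_cache[toy_hash(target_pc)];
    if (tb->valid && tb->pc == target_pc) {
        return tb->host_code_id;
    }
    return TOY_EXIT_TO_MAIN_LOOP;
}
```

In this model, after toy_cache_insert(0x1000, 7), a branch to 0x1000 resolves to block 7 without leaving the fast path, while a branch to an untranslated address, or to one that merely shares a cache slot, takes the slow exit.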
Softmmu measurements (see later commit for user-mode numbers):
Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.
- SPECint06 (test set), x86_64-softmmu (Ubuntu 16.04 guest).
Host: Intel i7-4790K @ 4.00GHz
[ASCII bar chart not recoverable from the archive; see the png link below.
 Y axis: speedup over baseline, 0.8x to 2.4x.
 Series: cross, cross+jr.
 X axis: astar, bzip2, gcc, gobmk, h264ref, hmmer, libquantum, mcf,
 omnetpp, perlbench, sjeng, xalancbmk, hmean.]
png: http://imgur.com/DU36YFU
NB. 'cross' represents the previous commit.
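For readers unfamiliar with the summary column, here is a short sketch of how a harmonic mean over per-benchmark speedups is computed, assuming that "hmean" in the chart denotes the harmonic mean. The function name harmonic_mean is hypothetical, and any input values used with it are illustrative only; they are not measurements from this patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Harmonic mean of n positive speedup factors:
 *   n / (1/s[0] + 1/s[1] + ... + 1/s[n-1])
 * Hypothetical helper for illustration; not part of the patch.
 */
static double harmonic_mean(const double *speedups, size_t n)
{
    assert(n > 0);
    double inv_sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(speedups[i] > 0.0);  /* speedups must be positive */
        inv_sum += 1.0 / speedups[i];
    }
    return (double)n / inv_sum;
}
```

The harmonic mean is dominated by the smallest values, so a single large speedup on one benchmark moves it less than it would move an arithmetic mean.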
Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
---
target/i386/translate.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/target/i386/translate.c b/target/i386/translate.c
index ea113fe..674ec96 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4996,7 +4996,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
gen_push_v(s, cpu_T1);
gen_op_jmp_v(cpu_T0);
gen_bnd_jmp(s);
- gen_eob(s);
+ gen_jr(s, cpu_T0);
break;
case 3: /* lcall Ev */
gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
@@ -5014,7 +5014,8 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
tcg_const_i32(dflag - 1),
tcg_const_i32(s->pc - s->cs_base));
}
- gen_eob(s);
+ tcg_gen_ld_tl(cpu_tmp4, cpu_env, offsetof(CPUX86State, eip));
+ gen_jr(s, cpu_tmp4);
break;
case 4: /* jmp Ev */
if (dflag == MO_16) {
@@ -5022,7 +5023,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
}
gen_op_jmp_v(cpu_T0);
gen_bnd_jmp(s);
- gen_eob(s);
+ gen_jr(s, cpu_T0);
break;
case 5: /* ljmp Ev */
gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
@@ -5037,7 +5038,8 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
gen_op_movl_seg_T0_vm(R_CS);
gen_op_jmp_v(cpu_T1);
}
- gen_eob(s);
+ tcg_gen_ld_tl(cpu_tmp4, cpu_env, offsetof(CPUX86State, eip));
+ gen_jr(s, cpu_tmp4);
break;
case 6: /* push Ev */
gen_push_v(s, cpu_T0);
@@ -6417,7 +6419,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
/* Note that gen_pop_T0 uses a zero-extending load. */
gen_op_jmp_v(cpu_T0);
gen_bnd_jmp(s);
- gen_eob(s);
+ gen_jr(s, cpu_T0);
break;
case 0xc3: /* ret */
ot = gen_pop_T0(s);
@@ -6425,7 +6427,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
/* Note that gen_pop_T0 uses a zero-extending load. */
gen_op_jmp_v(cpu_T0);
gen_bnd_jmp(s);
- gen_eob(s);
+ gen_jr(s, cpu_T0);
break;
case 0xca: /* lret im */
val = cpu_ldsw_code(env, s->pc);
--
2.7.4