[Qemu-devel] [PATCH v2 12/13] target/i386: optimize indirect branches
From: Emilio G. Cota
Subject: [Qemu-devel] [PATCH v2 12/13] target/i386: optimize indirect branches
Date: Tue, 25 Apr 2017 03:53:58 -0400
The appended patch minimizes exits to the exec loop for indirect branches.
By using the gen_jr helper, we can remain in TCG-generated code as long
as the indirect branch target is found in tb_jmp_cache.

This should improve performance for workloads with a high tb_jmp_cache
hit rate.
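A minimal C model of the idea, assuming a simplified direct-mapped cache keyed only by guest PC. All names (Block, cache_lookup, branch_eob, branch_jr, count_exits) are hypothetical and do not correspond to QEMU's tb_jmp_cache, TranslationBlock or its lookup helper, which also match on cs_base and flags. The sketch counts how often an indirect branch leaves generated code: on every branch (gen_eob-style) versus only on a cache miss (gen_jr-style).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a per-CPU jump cache. */
#define JMP_CACHE_BITS 12
#define JMP_CACHE_SIZE (1u << JMP_CACHE_BITS)
#define MAX_BLOCKS 64

typedef struct Block {
    uint64_t pc;                          /* guest PC the block was translated from */
} Block;

static Block *jmp_cache[JMP_CACHE_SIZE];  /* direct-mapped: one slot per hash value */
static Block pool[MAX_BLOCKS];            /* stand-in for the translated-block store */
static size_t pool_used;
static unsigned long exits_to_loop;

static unsigned cache_hash(uint64_t pc)
{
    return (unsigned)((pc ^ (pc >> JMP_CACHE_BITS)) & (JMP_CACHE_SIZE - 1));
}

/* Fast path: return the cached block for pc, or NULL on a miss. */
static Block *cache_lookup(uint64_t pc)
{
    Block *b = jmp_cache[cache_hash(pc)];
    return (b != NULL && b->pc == pc) ? b : NULL;
}

/* Slow path: leave generated code; the "exec loop" finds or creates the
 * block and refills the cache slot. */
static Block *exit_to_exec_loop(uint64_t pc)
{
    exits_to_loop++;
    for (size_t i = 0; i < pool_used; i++) {
        if (pool[i].pc == pc) {
            jmp_cache[cache_hash(pc)] = &pool[i];
            return &pool[i];
        }
    }
    assert(pool_used < MAX_BLOCKS);
    pool[pool_used].pc = pc;
    jmp_cache[cache_hash(pc)] = &pool[pool_used];
    return &pool[pool_used++];
}

/* gen_eob-style: every indirect branch ends the block and exits. */
static Block *branch_eob(uint64_t pc)
{
    return exit_to_exec_loop(pc);
}

/* gen_jr-style: look the target up first; exit only on a miss. */
static Block *branch_jr(uint64_t pc)
{
    Block *b = cache_lookup(pc);
    return b != NULL ? b : exit_to_exec_loop(pc);
}

/* Simulate `iterations` rounds of returns to three distinct call sites
 * and report how many exits to the exec loop were needed. */
static unsigned long count_exits(int use_jr, int iterations)
{
    static const uint64_t targets[] = { 0x400100, 0x400200, 0x400300 };
    const size_t n = sizeof targets / sizeof targets[0];

    memset(jmp_cache, 0, sizeof jmp_cache);
    pool_used = 0;
    exits_to_loop = 0;
    for (int i = 0; i < iterations; i++) {
        for (size_t t = 0; t < n; t++) {
            Block *b = use_jr ? branch_jr(targets[t]) : branch_eob(targets[t]);
            assert(b->pc == targets[t]);
        }
    }
    return exits_to_loop;
}
```

With three targets that hash to distinct slots, count_exits(1, 1000) returns 3 (one miss per target) while count_exits(0, 1000) returns 3000. In a real guest, collisions and invalidations cause further misses, which is why the benefit depends on the tb_jmp_cache hit rate.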
Softmmu measurements (see the user-mode measurements in a later commit):
Note: the baseline (i.e. speedup == 1x) is QEMU v2.9.0.
- SPECint06 (test set), x86_64-softmmu (Ubuntu 16.04 guest).
  Host: Intel i7-4790K @ 4.00GHz
[Chart: speedup over QEMU v2.9.0 (y-axis, 0.8x to 2.2x) for each SPECint06 benchmark (astar, bzip2, gcc, gobmk, h264ref, hmmer, libquantum, mcf, omnetpp, perlbench, sjeng, xalancbmk) and hmean; two series: cross+inline and cross+jr+inline. See the png below.]
png: http://imgur.com/aSXm0qh
NB. 'cross' represents the previous commit.
Signed-off-by: Emilio G. Cota <address@hidden>
---
target/i386/translate.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/target/i386/translate.c b/target/i386/translate.c
index 9982a2d..0b4e1e1 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4991,7 +4991,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
gen_push_v(s, cpu_T1);
gen_op_jmp_v(cpu_T0);
gen_bnd_jmp(s);
- gen_eob(s);
+ gen_jr(s, cpu_T0);
break;
case 3: /* lcall Ev */
gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
@@ -5009,7 +5009,8 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
tcg_const_i32(dflag - 1),
tcg_const_i32(s->pc - s->cs_base));
}
- gen_eob(s);
+ tcg_gen_ld_tl(cpu_tmp4, cpu_env, offsetof(CPUX86State, eip));
+ gen_jr(s, cpu_tmp4);
break;
case 4: /* jmp Ev */
if (dflag == MO_16) {
@@ -5017,7 +5018,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
}
gen_op_jmp_v(cpu_T0);
gen_bnd_jmp(s);
- gen_eob(s);
+ gen_jr(s, cpu_T0);
break;
case 5: /* ljmp Ev */
gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
@@ -5032,7 +5033,8 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
gen_op_movl_seg_T0_vm(R_CS);
gen_op_jmp_v(cpu_T1);
}
- gen_eob(s);
+ tcg_gen_ld_tl(cpu_tmp4, cpu_env, offsetof(CPUX86State, eip));
+ gen_jr(s, cpu_tmp4);
break;
case 6: /* push Ev */
gen_push_v(s, cpu_T0);
@@ -6412,7 +6414,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
/* Note that gen_pop_T0 uses a zero-extending load. */
gen_op_jmp_v(cpu_T0);
gen_bnd_jmp(s);
- gen_eob(s);
+ gen_jr(s, cpu_T0);
break;
case 0xc3: /* ret */
ot = gen_pop_T0(s);
@@ -6420,7 +6422,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
/* Note that gen_pop_T0 uses a zero-extending load. */
gen_op_jmp_v(cpu_T0);
gen_bnd_jmp(s);
- gen_eob(s);
+ gen_jr(s, cpu_T0);
break;
case 0xca: /* lret im */
val = cpu_ldsw_code(env, s->pc);
--
2.7.4
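In the lcall and ljmp hunks the jump target is not held in a TCG temp: the new EIP is produced at run time (for instance by a protected-mode helper that may go through a call gate) and stored in the CPU state, so the patch reloads it with tcg_gen_ld_tl(cpu_tmp4, cpu_env, offsetof(CPUX86State, eip)) before calling gen_jr. The plain-C sketch below models only that ordering; CPUModel, far_jump_helper, load_tl and ljmp_target are hypothetical stand-ins, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, stripped-down CPU state. */
typedef struct CPUModel {
    uint64_t regs[16];
    uint64_t eip;      /* next guest instruction pointer */
    uint16_t cs_sel;   /* code segment selector */
} CPUModel;

/* Stand-in for a far-jump helper: the new CS:EIP is decided at run time,
 * so the helper writes it into the CPU state instead of returning it. */
static void far_jump_helper(CPUModel *env, uint16_t selector, uint64_t offset)
{
    env->cs_sel = selector;
    env->eip = offset;
}

/* Mirrors the effect of tcg_gen_ld_tl(dst, cpu_env, off): load a
 * target-long field from the CPU state at a byte offset. */
static uint64_t load_tl(const CPUModel *env, size_t off)
{
    uint64_t v;
    memcpy(&v, (const char *)env + off, sizeof v);
    return v;
}

/* Order of operations in the patched ljmp path: run the helper first,
 * then reload EIP from the state and use it as the indirect-jump target. */
static uint64_t ljmp_target(CPUModel *env, uint16_t selector, uint64_t offset)
{
    far_jump_helper(env, selector, offset);
    return load_tl(env, offsetof(CPUModel, eip));
}
```

Reading the target from the state after the helper runs is what lets one gen_jr call cover both the helper path and the real/vm86 path, where gen_op_jmp_v also leaves the new EIP in the state.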