From: Yeongkyoon Lee
Subject: [Qemu-devel] [RFC][PATCH v2 0/4] tcg: enhance code generation quality for qemu_ld/st IRs
Date: Thu, 05 Jul 2012 22:23:35 +0900

Hi, all.

I think the code generated from the qemu_ld/st IRs is relatively heavy: up to
12 instructions for the TLB hit case on an i386 host.
This patch series improves the code quality of the TCG qemu_ld/st IRs by
reducing jumps and improving locality.
The main idea is simple and is already described in the comments in
tcg-target.c: separate the slow path (the TLB miss case) and generate it at
the end of the TB.

For example, the code generated from qemu_ld changes as follows.
Before:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (5)
(3) TLB hit case: Load value from host memory
(4) Jump to next code (6)
(5) TLB miss case: call MMU helper
(6) ... (next code)

After:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (7)
(3) TLB hit case: Load value from host memory
(4) ... (next code)
...
(7) TLB miss case: call MMU helper
(8) Return to next code (4)

The following performance results were measured on QEMU 1.0.
Although the measurements are subject to some error, the improvements are
not negligible.

* EEMBC CoreMark (before -> after)
  - Guest: i386, Linux (Tizen platform)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results: 1135.6 -> 1179.9 (+3.9%)

* nbench (before -> after)
  - Guest: i386, Linux (linux-0.2.img included in QEMU source)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results
    . MEMORY INDEX: 1.6782 -> 1.6818 (+0.2%)
    . INTEGER INDEX: 1.8258 -> 1.877 (+2.8%)
    . FLOATING-POINT INDEX: 0.5944 -> 0.5954 (+0.2%)

The changes are summarized as follows.
 - All the changes are wrapped in the macro "CONFIG_QEMU_LDST_OPTIMIZATION"
and are disabled by default.
 - They are enabled by "configure --enable-ldst-optimization" and require
CONFIG_SOFTMMU.
 - They do not work with CONFIG_TCG_PASS_AREG0, because it seems better to
apply them after the areg0 code has stabilized.
 - Currently, they support only x86 and x86-64 hosts and have been tested
with x86 and ARM Linux targets on x86/x86-64 host platforms.
 - Build tests have been done for all targets.

In addition, I have tried to remove the generated code that calls the MMU
helpers for the TLB miss case from the end of the TB, but have not found a
good solution yet. In my opinion, TLB hit performance could degrade if that
calling code were removed, because the runtime parameters (such as data, MMU
index, and return address) would then have to be set in registers or on the
stack even though they are not used in the TLB hit case.
This remains an open issue.

Yeongkyoon Lee (4):
  tcg: add declarations and templates of extended MMU helpers
  tcg: add extended MMU helpers to softmmu targets
  tcg: add optimized TCG qemu_ld/st generation
  configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st
    optimization

 configure                     |   15 ++
 softmmu_defs.h                |   13 ++
 softmmu_template.h            |   51 +++++--
 target-alpha/mem_helper.c     |   22 +++
 target-arm/op_helper.c        |   23 +++
 target-cris/op_helper.c       |   22 +++
 target-i386/mem_helper.c      |   22 +++
 target-lm32/op_helper.c       |   23 +++-
 target-m68k/op_helper.c       |   22 +++
 target-microblaze/op_helper.c |   22 +++
 target-mips/op_helper.c       |   22 +++
 target-ppc/mem_helper.c       |   22 +++
 target-s390x/op_helper.c      |   22 +++
 target-sh4/op_helper.c        |   22 +++
 target-sparc/ldst_helper.c    |   23 +++
 target-xtensa/op_helper.c     |   22 +++
 tcg/i386/tcg-target.c         |  328 +++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c                     |   12 ++
 tcg/tcg.h                     |   35 +++++
 19 files changed, 732 insertions(+), 11 deletions(-)

-- 
1.7.4.1