Re: [Qemu-devel] [RFC][PATCH v4 3/3] tcg: Optimize qemu

From:	Yeongkyoon Lee
Subject:	Re: [Qemu-devel] [RFC][PATCH v4 3/3] tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
Date:	Sun, 29 Jul 2012 00:39:01 +0900
User-agent:	Mozilla/5.0 (X11; Linux i686; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 2012-07-25 23:00, Richard Henderson wrote:

On 07/25/2012 12:35 AM, Yeongkyoon Lee wrote:

+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
+/* Macros/structures for qemu_ld/st IR code optimization:
+   TCG_MAX_HELPER_LABELS is defined as same as OPC_BUF_SIZE in exec-all.h. */
+#define TCG_MAX_QEMU_LDST       640

Why statically size this ...

This just follows the existing TCG code style, namely the allocation of the "labels" member of "TCGContext" in tcg.c.
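The pattern under discussion (a compile-time capacity combined with an array that is allocated once at setup) can be sketched roughly as follows. This is only an illustration: `Ctx`, `LdstLabel`, `MAX_LDST_LABELS`, and the three functions are hypothetical names, not the identifiers used in tcg.c or in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical capacity; the patch uses TCG_MAX_QEMU_LDST = 640. */
#define MAX_LDST_LABELS 640

/* Hypothetical stand-in for one recorded qemu_ld/st slow-path label. */
typedef struct LdstLabel {
    int is_ld;        /* nonzero for a load, zero for a store */
    uintptr_t raddr;  /* host address to resume at after the slow path */
} LdstLabel;

/* Hypothetical stand-in for the code generation context. */
typedef struct Ctx {
    LdstLabel *labels;  /* allocated once, capacity fixed at MAX_LDST_LABELS */
    int nb_labels;      /* number of entries currently in use */
} Ctx;

/* Allocate the fixed-capacity array once; returns 0 on success, -1 on failure. */
static int ctx_init(Ctx *c)
{
    c->labels = malloc(sizeof(LdstLabel) * MAX_LDST_LABELS);
    c->nb_labels = 0;
    return c->labels ? 0 : -1;
}

/* Append one label; returns -1 instead of overflowing when the array is full. */
static int ctx_add_label(Ctx *c, int is_ld, uintptr_t raddr)
{
    assert(c->labels != NULL);
    if (c->nb_labels >= MAX_LDST_LABELS) {
        return -1;
    }
    LdstLabel *l = &c->labels[c->nb_labels++];
    l->is_ld = is_ld;
    l->raddr = raddr;
    return 0;
}

/* Release the array and reset the context. */
static void ctx_free(Ctx *c)
{
    free(c->labels);
    c->labels = NULL;
    c->nb_labels = 0;
}
```

The capacity is still a compile-time constant, so the heap allocation buys no flexibility by itself; it only keeps the context structure small, which is presumably the point of the review question.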

+    /* labels info for qemu_ld/st IRs
+       The labels help to generate TLB miss case codes at the end of TB */
+    TCGLabelQemuLdst *qemu_ldst_labels;

... and then allocate the array dynamically?


ditto.

+    /* jne slow_path */
+    /* XXX: How to avoid using OPC_JCC_long for peephole optimization? */
+    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);

You can't, not and maintain the code-generate-until-address-reached
exception invariant.

+#ifndef CONFIG_QEMU_LDST_OPTIMIZATION
  uint8_t __ldb_mmu(target_ulong addr, int mmu_idx);
  void __stb_mmu(target_ulong addr, uint8_t val, int mmu_idx);
  uint16_t __ldw_mmu(target_ulong addr, int mmu_idx);
@@ -28,6 +30,30 @@ void __stl_cmmu(target_ulong addr, uint32_t val, int mmu_idx);
  uint64_t __ldq_cmmu(target_ulong addr, int mmu_idx);
  void __stq_cmmu(target_ulong addr, uint64_t val, int mmu_idx);
  #else
+/* Extended versions of MMU helpers for qemu_ld/st optimization.
+   The additional argument is a host code address accessing guest memory */
+uint8_t ext_ldb_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);

Don't tie LDST_OPTIMIZATION directly to the extended function calls.

For a host supporting predication, like ARM, the best code sequence
may look like

        (1) TLB check
        (2) If hit, load value from memory
        (3) If miss, call miss case (5)
        (4) ... next code
        ...
        (5) Load call parameters
        (6) Tail call (aka jump) to MMU helper

so that (a) we need not explicitly load the address of (3) by hand
for your RA parameter and (b) the mmu helper returns directly to (4).


r~

The difference between current HEAD and the code sequence you describe is, I think, code locality. My LDST_OPTIMIZATION patches improve code locality and also remove one jump. They show about a 4% improvement in CoreMark performance on an x86 host, which supports predication like ARM.
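To make the layout concrete, here is a toy model of generating slow paths at the end of a block. It is a sketch under stated assumptions, not the patch's code: the "instructions" are entries in an array rather than host machine code, and `Block`, `SlowLabel`, `emit`, `emit_qemu_ld`, and `emit_slow_paths` are hypothetical names.

```c
#include <assert.h>

/* Toy opcodes standing in for host instructions. */
enum { OP_TLB_CHECK, OP_JNE, OP_LOAD, OP_CALL_HELPER, OP_JMP, OP_OTHER };

typedef struct Insn {
    int op;
    int target;  /* branch destination as an index into code[], or -1 */
} Insn;

/* One pending slow path: which branch to patch and where to resume. */
typedef struct SlowLabel {
    int branch_idx;  /* index of the JNE emitted in the fast path */
    int return_idx;  /* index of the instruction after the fast path */
} SlowLabel;

typedef struct Block {
    Insn code[256];
    int n;
    SlowLabel labels[64];
    int nb_labels;
} Block;

/* Append one toy instruction and return its index. */
static int emit(Block *b, int op, int target)
{
    assert(b->n < 256);
    b->code[b->n].op = op;
    b->code[b->n].target = target;
    return b->n++;
}

/* Fast path only: TLB check, branch on miss (patched later), load on hit. */
static void emit_qemu_ld(Block *b)
{
    assert(b->nb_labels < 64);
    emit(b, OP_TLB_CHECK, -1);
    int br = emit(b, OP_JNE, -1);
    emit(b, OP_LOAD, -1);
    b->labels[b->nb_labels].branch_idx = br;
    b->labels[b->nb_labels].return_idx = b->n;
    b->nb_labels++;
}

/* At the end of the block, emit every recorded slow path and patch its branch. */
static void emit_slow_paths(Block *b)
{
    for (int i = 0; i < b->nb_labels; i++) {
        SlowLabel *l = &b->labels[i];
        b->code[l->branch_idx].target = b->n;  /* JNE now points at the slow path */
        emit(b, OP_CALL_HELPER, -1);           /* TLB miss: call the MMU helper */
        emit(b, OP_JMP, l->return_idx);        /* resume after the fast path */
    }
    b->nb_labels = 0;
}
```

In this model the hit case executes straight through with a single untaken conditional branch, and all miss handling sits after the last ordinary instruction of the block, which is the locality effect described above.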

The performance improvement for the AREG0 cases will probably be even larger.

I'm not yet sure where the performance improvement comes from, and I'll check it with some tests later.

In my humble opinion, there is nothing to lose with LDST_OPTIMIZATION except for implicitly adding one argument to the MMU helpers, which doesn't look so critical.
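As a rough illustration of what that one extra argument amounts to, the sketch below mirrors the shape of the `ext_ldb_mmu` declaration from the patch. Everything else is an assumption made for the example: the `toy_` names, the 32-bit `target_ulong`, the fake guest RAM array, and the wrapper that passes 0 as the return address.

```c
#include <stdint.h>

typedef uint32_t target_ulong;  /* assumption: a 32-bit guest */

static uint8_t toy_guest_ram[256];  /* hypothetical stand-in for guest memory */
static uintptr_t toy_last_ra;       /* hypothetical: last host address passed in */

/* Same parameter list as ext_ldb_mmu in the patch: the third argument is the
   host code address of the guest memory access. The body is a toy. */
static uint8_t toy_ext_ldb_mmu(target_ulong addr, int mmu_idx, uintptr_t ra)
{
    (void)mmu_idx;     /* unused in this toy model */
    toy_last_ra = ra;  /* a real helper would use this for fault handling */
    return toy_guest_ram[addr % sizeof(toy_guest_ram)];
}

/* Two-argument form, modeled here as a wrapper that supplies no address. */
static uint8_t toy_ldb_mmu(target_ulong addr, int mmu_idx)
{
    return toy_ext_ldb_mmu(addr, mmu_idx, 0);
}
```

The cost visible at the call site is a single additional register or stack argument per helper call, and only on the slow path.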

What do you think?

Thanks.
