
Re: [Qemu-devel] [PATCH] Huge TLB performance improvement


From: Daniel Jacobowitz
Subject: Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
Date: Sat, 11 Nov 2006 20:10:35 -0500
User-agent: Mutt/1.5.13 (2006-08-11)

On Sun, Nov 05, 2006 at 10:38:20AM -0500, Daniel Jacobowitz wrote:
> On Mon, Mar 06, 2006 at 02:59:29PM +0000, Thiemo Seufer wrote:
> > Hello All,
> > 
> > this patch vastly improves TLB performance on MIPS, and probably also
> > on other architectures. I measured a Linux boot-shutdown cycle,
> > including userland init.
> 
> Quoting the whole message since this is from March...
> 
> I don't remember seeing any followup discussion of this patch, but I
> may have missed it.  Thiemo's definitely right about "vastly".  Is this
> patch appropriate, or would anyone care to suggest a more
> sophisticated data structure to avoid the full cache invalidate?

This patch is an even nicer alternative, I think.  I benchmarked four
approaches (several times each):

Straight qemu with my previously posted MIPS patches takes 6:13 to
start and reboot a MIPS userspace (through init, so lots of fork/exec).

Thiemo's patch, which flushes the whole jump cache (tb_jmp_cache), cuts
it to 1:40.

A patch which more efficiently finds just the entries that need to be
flushed cuts it to 1:21.

A patch which indiscriminately flushes up to 1/32nd of the jump cache
cuts it to 1:11-1:13.

Here's that last patch.  It changes the hash function so that entries
from a particular page are always grouped together in tb_jmp_cache,
then finds the (up to) two affected ranges and memsets them clear; two
ranges because a tb may straddle a page boundary.  Thoughts?  Is this
acceptable, and where else should it be tested besides MIPS?  I haven't
fine-tuned the numbers; it currently allows for at most 64 cached jump
targets per target page, but that could be made higher or lower.

-- 
Daniel Jacobowitz
CodeSourcery

---
 cpu-defs.h |    5 +++++
 exec-all.h |   12 +++++++++++-
 exec.c     |   15 +++++++--------
 3 files changed, 23 insertions(+), 9 deletions(-)

Index: qemu/cpu-defs.h
===================================================================
--- qemu.orig/cpu-defs.h        2006-11-11 15:12:26.000000000 -0500
+++ qemu/cpu-defs.h     2006-11-11 15:12:33.000000000 -0500
@@ -80,6 +80,11 @@ typedef unsigned long ram_addr_t;
 #define TB_JMP_CACHE_BITS 12
 #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
 
+#define TB_JMP_PAGE_BITS (TB_JMP_CACHE_BITS / 2)
+#define TB_JMP_PAGE_SIZE (1 << TB_JMP_PAGE_BITS)
+#define TB_JMP_ADDR_MASK (TB_JMP_PAGE_SIZE - 1)
+#define TB_JMP_PAGE_MASK (TB_JMP_ADDR_MASK << TB_JMP_PAGE_BITS)
+
 #define CPU_TLB_BITS 8
 #define CPU_TLB_SIZE (1 << CPU_TLB_BITS)
 
Index: qemu/exec-all.h
===================================================================
--- qemu.orig/exec-all.h        2006-11-11 15:12:26.000000000 -0500
+++ qemu/exec-all.h     2006-11-11 19:56:36.000000000 -0500
@@ -196,9 +196,19 @@ typedef struct TranslationBlock {
     struct TranslationBlock *jmp_first;
 } TranslationBlock;
 
+static inline unsigned int tb_jmp_cache_hash_page(target_ulong pc)
+{
+    target_ulong tmp;
+    tmp = pc ^ (pc >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS));
+    return (tmp >> TB_JMP_PAGE_BITS) & TB_JMP_PAGE_MASK;
+}
+
 static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
 {
-    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
+    target_ulong tmp;
+    tmp = pc ^ (pc >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS));
+    return (((tmp >> TB_JMP_PAGE_BITS) & TB_JMP_PAGE_MASK) |
+           (tmp & TB_JMP_ADDR_MASK));
 }
 
 static inline unsigned int tb_phys_hash_func(unsigned long pc)
Index: qemu/exec.c
===================================================================
--- qemu.orig/exec.c    2006-11-11 15:12:26.000000000 -0500
+++ qemu/exec.c 2006-11-11 19:39:45.000000000 -0500
@@ -1299,14 +1299,13 @@ void tlb_flush_page(CPUState *env, targe
     tlb_flush_entry(&env->tlb_table[0][i], addr);
     tlb_flush_entry(&env->tlb_table[1][i], addr);
 
-    for(i = 0; i < TB_JMP_CACHE_SIZE; i++) {
-        tb = env->tb_jmp_cache[i];
-        if (tb && 
-            ((tb->pc & TARGET_PAGE_MASK) == addr ||
-             ((tb->pc + tb->size - 1) & TARGET_PAGE_MASK) == addr)) {
-            env->tb_jmp_cache[i] = NULL;
-        }
-    }
+    /* Discard jump cache entries for any tb which might potentially
+       overlap the flushed page.  */
+    i = tb_jmp_cache_hash_page(addr - TARGET_PAGE_SIZE);
+    memset (&env->tb_jmp_cache[i], 0, TB_JMP_PAGE_SIZE * sizeof(tb));
+
+    i = tb_jmp_cache_hash_page(addr);
+    memset (&env->tb_jmp_cache[i], 0, TB_JMP_PAGE_SIZE * sizeof(tb));
 
 #if !defined(CONFIG_SOFTMMU)
     if (addr < MMAP_AREA_END)



