[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [patch] make qemu work with GCC 4

From: Michael Matz
Subject: Re: [Qemu-devel] [patch] make qemu work with GCC 4
Date: Wed, 29 Aug 2007 19:29:36 +0200 (CEST)


On Wed, 29 Aug 2007, Paul Brook wrote:

> >        I solved that by placing one of the T[012] operands into memory
> >        for HOST_I386, thereby freeing one reg.  Here's some justification
> >        of why that doesn't really cost performance: with three free regs
> >        GCC is already spilling like mad in the snippets, we just trade one
> >        of those memory accesses (to stack) with one other mem access to
> >        the cpu_state structure, which will be in cache.
> Do you have any evidence to support this claim?

Not really, only an apple and orange comparison.  A 10000 iteration 
tests/sha1 run in the same Linux image, with -no-kqemu, on host and target 
i386:  time ./sha1

with qemu-0.8.2 (compiled by gcc 3.3-hammer): 7.92 seconds
with qemu-0.9.0-cvs (gcc4.1 compiled, with the patch): 8.15 seconds

I'll try to get a better comparison.  Note, though, that this is the only 
easy solution.  Any other solution would either involve improving reload 
pretty much, or rewriting many of the op.c patterns (for all targets) to 
never require more than three registers, and ensure that gcc doesn't 
become clever and mashes some insns together again (and in trying to do 
that probably slow down the snippets again).  Or doing away with dyngen at 
all.  All three solutions are fairly involved, so I'm personally fine with 
the above slowdown (for i386 targets you won't normally get any noticable 
slowdown as you'd use kqemu).

> Last time I did this it caused a significant performance hit. I'd guess 
> that most common ops are simple enough that we don't need more than 3 
> registers.

For i386 target I'm redefining only T1 (as I figured that A0 and T0 are 
used a bit more), it might be that some of the code could be generated in 
a way to not make use of T1 too much, I haven't really poked at that.

> > --- qemu-0.9.0.cvs.orig/softmmu_header.h
> > -                  : "%eax", "%ecx", "%edx", "memory", "cc");
> > +                  : "%eax", "%edx", "memory", "cc");
> This change is wrong. The inline asm calls C code which clobbers %ecx.

Indeed, must have been blind.  Okay these are too many clobbers for poor 
gcc4 nevertheless, so a push/pop %ecx around the call needs to be done.  
Courious that this didn't lead to fire all over the place.  See patch 


Index: softmmu_header.h
RCS file: /sources/qemu/qemu/softmmu_header.h,v
retrieving revision 1.15
diff -u -p -r1.15 softmmu_header.h
--- softmmu_header.h    23 May 2007 19:58:10 -0000      1.15
+++ softmmu_header.h    29 Aug 2007 17:27:37 -0000
@@ -230,9 +230,11 @@ static inline void glue(glue(st, SUFFIX)
 #error unsupported size
+                 "pushl %%ecx\n"
                   "pushl %6\n"
                   "call %7\n"
                   "popl %%eax\n"
+                  "popl %%ecx\n"
                   "jmp 2f\n"
                   "addl 8(%%edx), %%eax\n"
@@ -250,14 +252,18 @@ static inline void glue(glue(st, SUFFIX)
                   : "r" (ptr), 
 /* NOTE: 'q' would be needed as constraint, but we could not use it
    with T1 ! */
+#if DATA_SIZE == 1 || DATA_SIZE == 2
+                 "q" (v),
                   "r" (v), 
                   "i" ((CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS), 
                   "i" (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS), 
                   "i" (TARGET_PAGE_MASK | (DATA_SIZE - 1)),
                   "m" (*(uint32_t *)offsetof(CPUState, 
                   "i" (CPU_MEM_INDEX),
                   "m" (*(uint8_t *)&glue(glue(__st, SUFFIX), MMUSUFFIX))
-                  : "%eax", "%ecx", "%edx", "memory", "cc");
+                  : "%eax", "%edx", "memory", "cc");

reply via email to

[Prev in Thread] Current Thread [Next in Thread]