From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
Date: Tue, 9 Oct 2012 18:19:56 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Oct 09, 2012 at 04:26:10PM +0200, Aurelien Jarno wrote:
> On Tue, Oct 09, 2012 at 09:37:29PM +0900, Yeongkyoon Lee wrote:
> > Hi, all.
> > 
> > Here is the 5th version of the series optimizing TCG qemu_ld/st code 
> > generation.
> > 
> > v5:
> >   - Remove RFC tag
> > 
> > v4:
> >   - Remove CONFIG_SOFTMMU pre-condition from configure
> >   - Instead, add some CONFIG_SOFTMMU condition to TCG sources
> >   - Remove some unnecessary comments
> > 
> > v3:
> >   - Support CONFIG_TCG_PASS_AREG0
> >     (expected to get more performance enhancement than others)
> >   - Remove the configure option "--enable-ldst-optimization"
> >   - Make the optimization the default on i386 and x86_64 hosts
> >   - Fix some mistyping and apply checkpatch.pl before committing
> >   - Test i386, arm and sparc softmmu targets on i386 and x86_64 hosts
> >   - Test linux-user-test-0.3
> > 
> > v2:
> >   - Follow the submit rule of qemu
> > 
> > v1:
> >   - Initial commit request
> > 
> > I think the code generated from qemu_ld/st IRs is relatively heavy: up to
> > 12 instructions for the TLB hit case on an i386 host.
> > This patch series enhances the code quality of TCG qemu_ld/st IRs by
> > reducing jumps and improving locality.
> > The main idea is simple and has already been described in the comments in
> > tcg-target.c: separate the slow path (TLB miss case) and generate it at the
> > end of the TB.
> > 
> > For example, the generated code from qemu_ld changes as follows.
> > Before:
> > (1) TLB check
> > (2) If hit fall through, else jump to TLB miss case (5)
> > (3) TLB hit case: Load value from host memory
> > (4) Jump to next code (6)
> > (5) TLB miss case: call MMU helper
> > (6) ... (next code)
> > 
> > After:
> > (1) TLB check
> > (2) If hit fall through, else jump to TLB miss case (7)
> > (3) TLB hit case: Load value from host memory
> > (4) ... (next code)
> > ...
> > (7) TLB miss case: call MMU helper
> > (8) Return to next code (4)
> > 
> 
> Instead of calling the MMU helper with an additional argument (7), and
> then jumping back (8) to the next code (4), what about pushing the address
> of the next code (4) on the stack and using a jmp instead of the call? In
> that case you don't need the extra argument to the helpers.
> 

Maybe it wasn't very clear. This is based on the fact that call is
basically push %rip + jmp. Therefore we can fake the return address by
putting the value we want, here the address of the next code. This means
that we don't need to pass the extra argument to the helper for the 
return address, as GET_PC() would work correctly (it basically reads the
return address on the stack).
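
To make the "call is push + jmp" point concrete, here is a toy model in C.
It is only a sketch, not QEMU code: the stack is a small array, the "code
addresses" are the step numbers (4) and (8) from the cover letter, and
model_get_pc(), model_mmu_helper() and the other model_* names are
hypothetical stand-ins I made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only, not QEMU code: "code addresses" are the step numbers
 * used in the cover letter, and the host stack is a small array. */
enum {
    NEXT_CODE      = 4,  /* (4) code following the qemu_ld */
    SLOW_PATH_TAIL = 8   /* (8) instruction right after the call in the slow path */
};

static uintptr_t stack[8];
static int sp;

static void push(uintptr_t v) { assert(sp < 8); stack[sp++] = v; }
static uintptr_t pop(void)    { assert(sp > 0); return stack[--sp]; }

/* Hypothetical stand-in for GET_PC(): peek at the return address on the stack. */
static uintptr_t model_get_pc(void) { assert(sp > 0); return stack[sp - 1]; }

static uintptr_t pc_seen_by_helper;

/* Hypothetical MMU helper: records what GET_PC() would see, then "ret",
 * i.e. pops the return address and reports where execution resumes. */
static uintptr_t model_mmu_helper(void)
{
    pc_seen_by_helper = model_get_pc();
    return pop();
}

/* A regular call: the address after the call is pushed, then we jump. */
static uintptr_t model_call_helper(void)
{
    push(SLOW_PATH_TAIL);
    return model_mmu_helper();
}

/* The proposal: push the address we want by hand, then use a plain jmp. */
static uintptr_t model_push_jmp_helper(void)
{
    push(NEXT_CODE);
    return model_mmu_helper();
}
```

In this model model_call_helper() resumes at (8) and model_push_jmp_helper()
resumes at (4), and in both cases the helper finds its return address on the
stack without being passed an extra argument.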

For other architectures, it might not be a push, but rather a move to the
link register; basically, put the return address wherever the calling
convention expects it.

OTOH I just realized it only works if the end of the slow path (moving
the helper's return value to the correct register) is handled too, since
the helper would then return straight to the next code. It might be
something doable.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
address@hidden                 http://www.aurel32.net
