
Re: [Qemu-devel] [PATCH 15/15] tcg: use ext op for deposit


From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH 15/15] tcg: use ext op for deposit
Date: Sun, 10 Apr 2011 22:28:17 +0200
User-agent: Mutt/1.5.20 (2009-06-14)

On Sun, Apr 10, 2011 at 10:17:26PM +0200, Alexander Graf wrote:
> 
> On 10.04.2011, at 22:08, Aurelien Jarno wrote:
> 
> > On Sun, Apr 10, 2011 at 09:25:33PM +0200, Alexander Graf wrote:
> >> 
> >> On 10.04.2011, at 21:23, Aurelien Jarno wrote:
> >> 
> >>> On Tue, Apr 05, 2011 at 09:55:09AM +0200, Alexander Graf wrote:
> >>>> 
> >>>> On 05.04.2011, at 06:54, Aurelien Jarno wrote:
> >>>> 
> >>>>> On Mon, Apr 04, 2011 at 04:32:24PM +0200, Alexander Graf wrote:
> >>>>>> With the s390x target we use the deposit instruction to store 32-bit values
> >>>>>> into 64-bit registers without clobbering the upper 32 bits.
> >>>>>> 
> >>>>>> This specific operation can be optimized slightly by using the ext operation
> >>>>>> instead of an explicit AND in the deposit instruction. This patch adds that
> >>>>>> special case to the generic deposit implementation.
> >>>>>> 
> >>>>>> Signed-off-by: Alexander Graf <address@hidden>
> >>>>>> ---
> >>>>>> tcg/tcg-op.h |    6 +++++-
> >>>>>> 1 files changed, 5 insertions(+), 1 deletions(-)
> >>>>> 
> >>>>> Have you really measured a difference here? This should already be
> >>>>> handled, at least on x86, by this code:
> >>>>> 
> >>>>>      if (TCG_TARGET_REG_BITS == 64) {
> >>>>>          if (val == 0xffffffffu) {
> >>>>>              tcg_out_ext32u(s, r0, r0);
> >>>>>              return;
> >>>>>          }
> >>>>>          if (val == (uint32_t)val) {
> >>>>>              /* AND with no high bits set can use a 32-bit operation.  */
> >>>>>              rexw = 0;
> >>>>>          }
> >>>>>      }
> >>>> 
> >>>> I've certainly looked at the -d op logs and seen that instead of creating
> >>>> a const tcg variable plus an AND there was now an extu opcode issued, yes.
> >>>> No idea why the case up there didn't trigger.
> >>>> 
> >>> 
> >>> The question is what -d out_asm shows. The output should be the same in
> >>> the end, as the code I pasted above is from tcg/i386/tcg-target.c.
> >> 
> >> Yes. I was trying to optimize for maximum op length. TCG defines a maximum
> >> number of tcg ops to be issued by each target instruction. Since s390 is
> >> very CISCy, there are instructions that translate into lots of microops,
> >> but are still faster than a C call (register save/restore mostly).
> >> 
> >> Without this patch, there are some places where we hit that number :).
> > 
> > Is it on 32-bit or on 64-bit? If we reach this number, it's probably
> > better to either implement this instruction with a helper, or maybe
> > increase the maximum number of ops. What is this instruction?
> 
> This was on x86_64. I hit limits with LMH and LM, but reduced them to fit
> into the picture with this optimization :). If you like, I can give you a
> statically linked binary that could exceed the limits.
> 

Yeah, from what I see, the loop is unrolled there. Not sure that is the
best thing to do. Also, if the limit is exceeded on 64-bit, it is for sure
exceeded on 32-bit hosts.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
address@hidden                 http://www.aurel32.net


