Re: [Qemu-devel] [PATCH 05/21] tcg-i386: Tidy bswap operations.

qemu-devel



From:	Richard Henderson
Subject:	Re: [Qemu-devel] [PATCH 05/21] tcg-i386: Tidy bswap operations.
Date:	Mon, 19 Apr 2010 08:56:44 -0500
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100330 Fedora/3.0.4-1.fc12 Thunderbird/3.0.4

On 04/18/2010 05:13 PM, Aurelien Jarno wrote:
> On Tue, Apr 13, 2010 at 04:33:59PM -0700, Richard Henderson wrote:
>> Define OPC_BSWAP.  Factor opcode emission to separate functions.
>> Use bswap+shift to implement 16-bit swap instead of a rolw; this
>> gets the proper zero-extension required by INDEX_op_bswap16_i32.
> 
> This is not required by INDEX_op_bswap16_i32. What is needed is that the
> value in the input register has its upper 16 bits set to 0.

Ah.
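To make that precondition concrete, here is a minimal C model of the two 16-bit swap sequences. The function names are hypothetical (not QEMU code), and the model assumes a 32-bit host register and captures only the data result, not flags or timing.

```c
#include <stdint.h>

/* Model of "rolw $8, %r16": only the low 16 bits are rotated;
   bits 16..31 of the 32-bit register are left untouched. */
static uint32_t model_rolw8(uint32_t x)
{
    uint16_t lo = (uint16_t)x;
    lo = (uint16_t)((lo << 8) | (lo >> 8));
    return (x & 0xffff0000u) | lo;
}

/* Model of "bswap %r32; shr $16, %r32": all four bytes are reversed,
   then the swapped low halfword is shifted down with zero fill. */
static uint32_t model_bswap_shr16(uint32_t x)
{
    uint32_t b = ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8)
               | ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
    return b >> 16;
}

/* Returns 1 if both sequences give the same result for every input
   whose upper 16 bits are zero, i.e. under the stated precondition. */
static int sequences_agree_on_zero_extended_inputs(void)
{
    for (uint32_t v = 0; v <= 0xffffu; v++) {
        if (model_rolw8(v) != model_bswap_shr16(v))
            return 0;
    }
    return 1;
}
```

The two only diverge when the upper half is nonzero: model_rolw8(0x1234abcd) gives 0x1234cdab, while model_bswap_shr16(0x1234abcd) gives 0x0000cdab. If the input is already zero-extended, either sequence is a valid bswap16.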

> Considering
> that, the rolw instruction is faster than bswap + shift.

Well, no, it isn't.

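 /* Time "movzwl; rolw $8": 16-bit swap in place.  The final addl
    reads the whole 32-bit register right after the 16-bit write,
    which is where a partial-register stall would show up. */
 static inline int test_rolw(unsigned short *s)
 {
   int i, start, end;
   /* start and i are early-clobber (&) because they are written
      before the memory operand %3 is read. */
   asm volatile("rdtsc\n\t"
                "movl %%eax, %1\n\t"   /* start = low half of the TSC */
                "movzwl %3,%2\n\t"     /* i = zero-extended *s */
                "rolw $8, %w2\n\t"     /* swap the two low bytes of i */
                "addl $1,%2\n\t"       /* consume the full 32-bit i */
                "rdtsc"                /* end = low half of the TSC */
                : "=&a"(end), "=&r"(start), "=&r"(i) : "m"(*s) : "edx");
   (void)i;
   return end - start;
 }
 
 /* Time "movzwl; bswap; shr $16": the same swap using only
    full-width 32-bit operations. */
 static inline int test_bswap(unsigned short *s)
 {
   int i, start, end;
   asm volatile("rdtsc\n\t"
                "movl %%eax, %1\n\t"
                "movzwl %3,%2\n\t"     /* i = 0x0000ABCD */
                "bswap %2\n\t"         /* i = 0xCDAB0000 */
                "shr $16,%2\n\t"       /* i = 0x0000CDAB */
                "addl $1,%2\n\t"
                "rdtsc"
                : "=&a"(end), "=&r"(start), "=&r"(i) : "m"(*s) : "edx");
   (void)i;
   return end - start;
 }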


Each row gives end - start, in TSC ticks, for ten samples:

model name      : Intel(R) Core(TM)2 Duo CPU     T7700  @ 2.40GHz
 rolw      60   60   72   60   60   72   60   60   72   60
 bswap     60   60   60   60   60   60   60   60   60   60

model name      : Dual-Core AMD Opteron(tm) Processor 1210
 rolw       9   10    9    9    8    8    8    8    8    8
 bswap      9    9    8    8    8    8    8    8    8    8

The rolw sequence is never faster, and its timing is less stable,
likely due to the partial-register stall I mentioned.
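One way to put numbers on the instability is to summarize each row of samples by its minimum, maximum and mean. The helper below is a hypothetical sketch for doing that, not part of the patch or of QEMU.

```c
#include <stddef.h>

/* Summary of one row of rdtsc samples. */
struct sample_stats {
    int min;
    int max;
    double mean;
};

/* Hypothetical helper: min, max and mean of n samples (requires n > 0). */
static struct sample_stats summarize(const int *v, size_t n)
{
    struct sample_stats st = { v[0], v[0], 0.0 };
    long sum = 0;

    for (size_t k = 0; k < n; k++) {
        if (v[k] < st.min)
            st.min = v[k];
        if (v[k] > st.max)
            st.max = v[k];
        sum += v[k];
    }
    st.mean = (double)sum / (double)n;
    return st;
}
```

Applied to the rows above, the T7700 gives 60 to 72 ticks (mean 63.6) for rolw against a flat 60 for bswap, and the Opteron gives 8 to 10 (mean 8.5) for rolw against 8 to 9 (mean 8.2) for bswap.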

I will grant that the rolw sequence is smaller in code size, and I can
adjust this patch to use that sequence if you wish.
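The size difference follows from the x86 encodings: rolw $8 is 66 C1 /0 ib, bswap r32 is 0F C8+r, and shr r32,imm8 is C1 /5 ib. The sketch below writes both sequences into a byte buffer; the emit_* helpers are hypothetical and are not the tcg-i386 emission functions from this series. They only handle the eight legacy registers (numbers 0..7, no REX prefix).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emitter for "rolw $8, %reg16"; reg must be 0..7.
   Returns the number of bytes written. */
static size_t emit_rolw8(uint8_t *p, unsigned reg)
{
    p[0] = 0x66;                            /* operand-size prefix: 16-bit op */
    p[1] = 0xc1;                            /* group 2: shift/rotate r/m, imm8 */
    p[2] = (uint8_t)(0xc0 | (0u << 3) | reg); /* ModRM: mod=11, /0 = ROL */
    p[3] = 8;                               /* rotate count */
    return 4;
}

/* Hypothetical emitter for "bswap %reg32; shr $16, %reg32"; reg must be 0..7.
   Returns the number of bytes written. */
static size_t emit_bswap_shr16(uint8_t *p, unsigned reg)
{
    p[0] = 0x0f;                            /* two-byte opcode escape */
    p[1] = (uint8_t)(0xc8 + reg);           /* BSWAP r32 */
    p[2] = 0xc1;                            /* group 2: shift/rotate r/m, imm8 */
    p[3] = (uint8_t)(0xc0 | (5u << 3) | reg); /* ModRM: mod=11, /5 = SHR */
    p[4] = 16;                              /* shift count */
    return 5;
}
```

For %eax (register 0) this yields 66 c1 c0 08 for rolw versus 0f c8 c1 e8 10 for bswap + shr, so the rolw form saves one byte per 16-bit swap.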


r~


[Qemu-devel] [PATCH 00/21] tcg-i386 cleanup and improvement, Richard Henderson, 2010/04/14
- [Qemu-devel] [PATCH 01/21] tcg-i386: Allocate call-saved registers first., Richard Henderson, 2010/04/14
- [Qemu-devel] [PATCH 03/21] tcg-i386: Tidy ext8u and ext16u operations., Richard Henderson, 2010/04/14
- [Qemu-devel] [PATCH 02/21] tcg-i386: Tidy initialization of tcg_target_call_clobber_regs., Richard Henderson, 2010/04/14
- [Qemu-devel] [PATCH 04/21] tcg-i386: Tidy ext8s and ext16s operations., Richard Henderson, 2010/04/14
- [Qemu-devel] [PATCH 05/21] tcg-i386: Tidy bswap operations., Richard Henderson, 2010/04/14
  - Re: [Qemu-devel] [PATCH 05/21] tcg-i386: Tidy bswap operations., Aurelien Jarno, 2010/04/18
    - Re: [Qemu-devel] [PATCH 05/21] tcg-i386: Tidy bswap operations., Richard Henderson <=
    - Re: [Qemu-devel] [PATCH 05/21] tcg-i386: Tidy bswap operations., malc, 2010/04/19
    - Re: [Qemu-devel] [PATCH 05/21] tcg-i386: Tidy bswap operations., Richard Henderson, 2010/04/19
- [Qemu-devel] [PATCH 10/21] tcg-i386: Tidy immediate arithmetic operations., Richard Henderson, 2010/04/14
- [Qemu-devel] [PATCH 06/21] tcg-i386: Tidy shift operations., Richard Henderson, 2010/04/14
- [Qemu-devel] [PATCH 11/21] tcg-i386: Tidy non-immediate arithmetic operations., Richard Henderson, 2010/04/14
- [Qemu-devel] [PATCH 09/21] tcg-i386: Tidy jumps., Richard Henderson, 2010/04/14
- [Qemu-devel] [PATCH 08/21] tcg-i386: Eliminate extra move from qemu_ld64., Richard Henderson, 2010/04/14
- [Qemu-devel] [PATCH 07/21] tcg-i386: Tidy move operations., Richard Henderson, 2010/04/14
- [Qemu-devel] [PATCH 13/21] tcg-i386: Tidy push/pop., Richard Henderson, 2010/04/14
- [Qemu-devel] [PATCH 12/21] tcg-i386: Tidy movi., Richard Henderson, 2010/04/14

Prev by Date: Re: [Qemu-devel] [PATCH v2 2/3] vmdk: Clean up backing file handling
Next by Date: [Qemu-devel] Re: [PATCH 2/2] block: Cache total_sectors to reduce bdrv_getlength calls
Previous by thread: Re: [Qemu-devel] [PATCH 05/21] tcg-i386: Tidy bswap operations.
Next by thread: Re: [Qemu-devel] [PATCH 05/21] tcg-i386: Tidy bswap operations.
Index(es):
- Date
- Thread