Re: [Qemu-devel] [Qemu-ppc] [PATCH v3 2/8] target/ppc: rework vmrg{l, h}

qemu-devel

From:	Mark Cave-Ayland
Subject:	Re: [Qemu-devel] [Qemu-ppc] [PATCH v3 2/8] target/ppc: rework vmrg{l, h}{b, h, w} instructions to use Vsr* macros
Date:	Tue, 29 Jan 2019 18:49:18 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

On 27/01/2019 18:07, Richard Henderson wrote:

> On 1/27/19 9:45 AM, Mark Cave-Ayland wrote:
>>> I would expect the i < n/2 loop to be faster, because the assignments are
>>> unconditional.  FWIW.
>>
>> Do you have any idea as to how much faster? Is it something that would show
>> up as significant within the context of QEMU?
> 
> I don't have any numbers on that, no.
> 
>> As well as eliminating the HI_IDX/LO_IDX constants I do find the updated
>> version much easier to read, so I would prefer to keep it if possible.
>> What about unrolling the loop into 2 separate ones...
> 
> I doubt that would be helpful.
> 
> I would think that
> 
> #define VMRG_DO(name, access, ofs)
> ...
>     int i, half = ARRAY_SIZE(r->access(0)) / 2;
> ...
>     for (i = 0; i < half; i++) {
>         result.access(2 * i + 0) = a->access(i + ofs);
>         result.access(2 * i + 1) = b->access(i + ofs);
>     }
> 
> where OFS = 0 for HI and half for LO is best.  I find it quite readable, and it
> avoids duplicating code between LO and HI as you're currently doing.

Attached is my test program, which benchmarks the different approaches over
0x8000000 iterations. Compiled with gcc -O2, it gives the following sample
output on an i7:

$ ./mcatest
Benchmark 1 - existing merge high
Elapsed time: 1434735 us

Benchmark 2 - v3 merge high
Elapsed time: 2603553 us

Benchmark 3 - 2 loops merge high
Elapsed time: 2395434 us

Benchmark 4 - Richard's merge high
Elapsed time: 1318369 us


These indicate that the proposed v3 merge algorithm is nearly 50% slower than the
existing implementation: it takes about 1.8 times as long (2.60 s vs 1.43 s), i.e.
it achieves only about 55% of the throughput. It wasn't noticeable during
emulation, but in a benchmark the additional overhead is clearly visible.
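To make the comparison concrete, the ratios below are computed directly from the sample output above (elapsed times in microseconds over 0x8000000 = 134,217,728 iterations). They describe a single run on one machine and are only indicative.

```latex
\begin{aligned}
\frac{t_{\text{v3}}}{t_{\text{existing}}} &= \frac{2\,603\,553}{1\,434\,735} \approx 1.81
  \quad\Rightarrow\quad \text{throughput ratio} \approx \tfrac{1}{1.81} \approx 0.55 \\
\frac{t_{\text{2 loops}}}{t_{\text{existing}}} &= \frac{2\,395\,434}{1\,434\,735} \approx 1.67 \\
\frac{t_{\text{Richard}}}{t_{\text{existing}}} &= \frac{1\,318\,369}{1\,434\,735} \approx 0.92 \\
\frac{t_{\text{existing}}}{134\,217\,728} &= \frac{1.434735\ \text{s}}{134\,217\,728} \approx 10.7\ \text{ns per iteration}
  \quad (\text{v3} \approx 19.4\ \text{ns},\ \text{Richard} \approx 9.8\ \text{ns})
\end{aligned}
```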

TLDR: after playing around with the different approaches, Richard's proposed
algorithm is the fastest, and is actually slightly quicker (about 8%) than the
current implementation. Please go to the foyer after class, where you can collect
your prize :)

On this basis I'll redo v3 using Richard's algorithm and post a v4 later,
assuming that it passes my local tests again.


ATB,

Mark.

Attachment: mcatest.c (text data)
