[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementatio
From: |
Nikunj A Dadhania |
Subject: |
Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation |
Date: |
Tue, 20 Sep 2016 22:40:03 +0530 |
User-agent: |
Notmuch/0.21 (https://notmuchmail.org) Emacs/25.0.94.1 (x86_64-redhat-linux-gnu) |
David Gibson <address@hidden> writes:
> [ Unknown signature status ]
> On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote:
>> David Gibson <address@hidden> writes:
>> > [ Unknown signature status ]
>> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote:
>> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
>> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c
>> >> > b/target-ppc/translate/vsx-impl.inc.c
>> >> > index eee6052..df278df 100644
>> >> > --- a/target-ppc/translate/vsx-impl.inc.c
>> >> > +++ b/target-ppc/translate/vsx-impl.inc.c
>> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
>> >> > static void gen_lxvw4x(DisasContext *ctx)
>> >> > {
>> >> > TCGv EA;
>> >> > - TCGv_i64 tmp;
>> >> > TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
>> >> > TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
>> >> > if (unlikely(!ctx->vsx_enabled)) {
>> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
>> >> > }
>> >> > gen_set_access_type(ctx, ACCESS_INT);
>> >> > EA = tcg_temp_new();
>> >> > - tmp = tcg_temp_new_i64();
>> >> >
>> >> > gen_addr_reg_index(ctx, EA);
>> >> > - gen_qemu_ld32u_i64(ctx, tmp, EA);
>> >> > - tcg_gen_addi_tl(EA, EA, 4);
>> >> > - gen_qemu_ld32u_i64(ctx, xth, EA);
>> >> > - tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
>> >> > -
>> >> > - tcg_gen_addi_tl(EA, EA, 4);
>> >> > - gen_qemu_ld32u_i64(ctx, tmp, EA);
>> >> > - tcg_gen_addi_tl(EA, EA, 4);
>> >> > - gen_qemu_ld32u_i64(ctx, xtl, EA);
>> >> > - tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
>> >> > -
>> >> > + tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
>> >> > + gen_helper_deposit32x2(xth, xth);
>> >> > + tcg_gen_addi_tl(EA, EA, 8);
>> >> > + tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
>> >> > + gen_helper_deposit32x2(xtl, xtl);
>> >
>> > ..and I think this is wrong for BE mode. The deposit32x2 will get the
>> > words in the right order, but the bytes within each word will be wrong
>> > because of the LE mode load on a BE setup.
>>
>> Since lxvw4x/stxvw4x is available on POWER8. I tried running my test
>> code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware.
>> The order within the word is not changed. Snippet of the test code at
>> the end of email. Can share full code if needed (maybe will do it in
>> kvm-unit-test)
>
> Ugh.. now I'm confused. I would not have expected the results you've
> seen from these tests. But I still can't understand *how* the
> emulation could be correct: IIUC MO_LEQ would mean it loads the 8
> bytes as a single 64-bit LE integer.
For both the case LE/BE we do a LE read ...
> Which should be the same as
> loading one 32-bit LE integer into the low half of the target
> register, then a 32-bit LE integer into the high half ot the target
> register.
.. The 64-bit integer read is not same in these cases. The input itself
would be in the order of the format.
Input rb32[]: 00010203 20212223 30313233 40414243
LE:
helper_deposit32x2: 2021222300010203
helper_deposit32x2: 4041424330313233
BE
helper_deposit32x2: 2322212003020100
helper_deposit32x2: 4342414033323130
>
> As I said above, the deposit32x2 will swap the order of the two ints,
> but it won't byteswap the individual int32s which should have been BE
> in memory.
>
> Can you find the flaw in my reasoning?
One anomaly that I see in BE code generation: it also generates a
stxvw4x after lxvw4x. I am not sure why.
>>>>>>>>>>>>>>>> BE BE BE >>>>>>>>>>>>>>
Input rb32[]: 00010203 20212223 30313233 40414243
gen_lxvw4x: called
helper_deposit32x2: 2322212003020100
helper_deposit32x2: 4342414033323130
gen_stxvw4x: called
helper_deposit32x2: 0302010023222120
helper_deposit32x2: 3332313043424140
Output VRT32: 00010203 20212223 30313233 40414243
>> vsx.h:
>> ======
>> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t))
>>
>> typedef union {
>> __vector uint32_t v;
>> uint32_t a[U32_SIZE];
>> } vuint32_t;
>
> I am a little suspicious that whatever the compiler does to convert
> the vector to an array via this union might be undoing a byte reverse.
>
> I'd be more confident if you used VSX instructions to extract and
> store separately one of the 32-bit subwords of the vector.
I will try to figure those instructions.
Regards
Nikunj
- [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending), Nikunj A Dadhania, 2016/09/16
- [Qemu-devel] [PATCH v3 1/5] target-ppc: implement darn instruction, Nikunj A Dadhania, 2016/09/16
- [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, Nikunj A Dadhania, 2016/09/16
- Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, David Gibson, 2016/09/19
- Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, David Gibson, 2016/09/19
- Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, Nikunj A Dadhania, 2016/09/19
- Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, David Gibson, 2016/09/20
- Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation,
Nikunj A Dadhania <=
- Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, David Gibson, 2016/09/20
- Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, Nikunj A Dadhania, 2016/09/20
- Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, Nikunj A Dadhania, 2016/09/19
[Qemu-devel] [PATCH v3 3/5] target-ppc: improve stxvw4x implementation, Nikunj A Dadhania, 2016/09/16
[Qemu-devel] [PATCH v3 4/5] target-ppc: add lxvh8x and stxvh8x, Nikunj A Dadhania, 2016/09/16
[Qemu-devel] [PATCH v3 5/5] target-ppc: add lxvb16x and stxvb16x, Nikunj A Dadhania, 2016/09/16
Re: [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending), David Gibson, 2016/09/19