From: Richard Henderson
Subject: Re: [Qemu-ppc] [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
Date: Tue, 18 Dec 2018 08:16:49 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1

On 12/18/18 7:26 AM, Mark Cave-Ayland wrote:
> That seems wrong to me. Given that the ppc_avr_t is a union then I'd expect
> it to be in host order? Certainly in the VMX helper macros I've looked at,
> the members are set directly with no byte swapping.

"Host order"? For both words of the vector?

That's certainly going to cause problems wrt VSX and FPU registers. We're
hard-coding that as fpu == vsx.u64[0] (both before and after your patch set).

For vscr, on master we have

void helper_mtvscr(CPUPPCState *env, ppc_avr_t *r)
{
#if defined(HOST_WORDS_BIGENDIAN)
    env->vscr = r->u32[3];
#else
    env->vscr = r->u32[0];
#endif

and

        if (needs_byteswap) {
            vmxregset->avr[i].u64[0] = bswap64(cpu->env.avr[i].u64[1]);
            vmxregset->avr[i].u64[1] = bswap64(cpu->env.avr[i].u64[0]);
        } else {
            vmxregset->avr[i].u64[0] = cpu->env.avr[i].u64[0];
            vmxregset->avr[i].u64[1] = cpu->env.avr[i].u64[1];
        }
    }
    vmxregset->vscr.u32[3] = cpu_to_dump32(s, cpu->env.vscr);

For helper macros that apply the same operation to all lanes, it doesn't matter
in which order the lanes are processed, so of course I would expect them
to be processed in host order.
It's cases that do not apply the same operation, such as merges, where the
problems would arise.
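
To make the distinction concrete, here is a minimal sketch, not code from QEMU. The names vec128, el_idx, add_u32 and merge_high_u32 are hypothetical, and the `reversed` flag stands in for a little-endian host rather than testing HOST_WORDS_BIGENDIAN. The lane-wise add gives the same stored result under either index mapping, while the merge does not:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ppc_avr_t; only the 32-bit view is modeled. */
typedef union {
    uint32_t u32[4];
} vec128;

/*
 * Map element number e to an array index. 'reversed' models a
 * little-endian host, where element 0 would live at index 3.
 */
static int el_idx(int e, int reversed)
{
    assert(e >= 0 && e < 4);
    return reversed ? 3 - e : e;
}

/* Same operation on every lane: the index mapping cancels out. */
static void add_u32(vec128 *r, const vec128 *a, const vec128 *b, int reversed)
{
    for (int e = 0; e < 4; e++) {
        int i = el_idx(e, reversed);
        r->u32[i] = a->u32[i] + b->u32[i];
    }
}

/*
 * Merge-high: interleave elements 0 and 1 of a and b.
 * Here the mapping decides which array slots are read and written.
 */
static void merge_high_u32(vec128 *r, const vec128 *a, const vec128 *b,
                           int reversed)
{
    vec128 t;
    for (int e = 0; e < 2; e++) {
        t.u32[el_idx(2 * e, reversed)]     = a->u32[el_idx(e, reversed)];
        t.u32[el_idx(2 * e + 1, reversed)] = b->u32[el_idx(e, reversed)];
    }
    *r = t;
}
```

With the same array contents as input, add_u32 fills the result array identically for both values of `reversed`, whereas merge_high_u32 picks different source slots and writes them to different destinations.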
There are at least 3 schemes being employed to address this:
#if defined(HOST_WORDS_BIGENDIAN)
#define HI_IDX 0
#define LO_IDX 1
#define AVRB(i) u8[i]
#define AVRW(i) u32[i]
#else
#define HI_IDX 1
#define LO_IDX 0
#define AVRB(i) u8[15-(i)]
#define AVRW(i) u32[3-(i)]
#endif
...
#if defined(HOST_WORDS_BIGENDIAN)
#define EL_IDX(i) (i)
#else
#define EL_IDX(i) (3 - (i))
#endif
...
#if defined(HOST_WORDS_BIGENDIAN)
#define EL_IDX(i) (i)
#else
#define EL_IDX(i) (1 - (i))
#endif
...
#if defined(HOST_WORDS_BIGENDIAN)
    result.u8[i] = a->u8[indexA] ^ b->u8[indexB];
#else
    result.u8[i] = a->u8[15-indexA] ^ b->u8[15-indexB];
#endif
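
All of the excerpts above compute the same thing on a little-endian host: index (nelem - 1 - i) for a view with nelem elements (16 bytes, 4 words, 2 doublewords). As an illustration only, and not something taken from the tree or proposed in this thread, the mapping could be written once. The names vec128, host_is_big_endian, el_idx and the get_* accessors are hypothetical, and endianness is probed at run time here instead of using HOST_WORDS_BIGENDIAN so that the sketch builds outside QEMU:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ppc_avr_t with byte, word and doubleword views. */
typedef union {
    uint8_t  u8[16];
    uint32_t u32[4];
    uint64_t u64[2];
} vec128;

/* Run-time probe used only for this sketch. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    return *(const uint8_t *)&probe == 0x01;
}

/*
 * Map element number i (0 = most significant) to a host array index
 * for a view with nelem elements.
 */
static int el_idx(int i, int nelem)
{
    assert(i >= 0 && i < nelem);
    return host_is_big_endian() ? i : nelem - 1 - i;
}

/* Accessors for the three element sizes, all sharing one mapping. */
static uint8_t get_byte(const vec128 *v, int i)
{
    return v->u8[el_idx(i, 16)];
}

static uint32_t get_word(const vec128 *v, int i)
{
    return v->u32[el_idx(i, 4)];
}

static uint64_t get_dword(const vec128 *v, int i)
{
    return v->u64[el_idx(i, 2)];
}
```

This assumes the whole 128-bit value is kept in host order, which is the layout the little-endian branches of the macros above imply.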
r~