[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 18/42] target/arm: Convert VFP VMLA to decodetre
From: |
Richard Henderson |
Subject: |
Re: [Qemu-devel] [PATCH 18/42] target/arm: Convert VFP VMLA to decodetree |
Date: |
Sat, 8 Jun 2019 09:14:59 -0500 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0 |
On 6/6/19 12:45 PM, Peter Maydell wrote:
> Convert the VFP VMLA instruction to decodetree.
>
> This is the first of the VFP 3-operand data processing instructions,
> so we include in this patch the code which loops over the elements
> for an old-style VFP vector operation. The existing code to do this
> looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since
> we are going to be converting instructions one at a time anyway
> we can take the opportunity to make the new loop use TCG temporaries,
> which means we can do that conversion one operation at a time
> rather than needing to do it all in one go.
>
> We include an UNDEF check which was missing in the old code:
> short-vector operations (with stride or length non-zero) were
> deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec
> field does not indicate that support for short vectors is present
> we UNDEF the operations that would use them. (This is a change
> of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which
> previously were all incorrectly allowing short-vector operations.)
>
> Note that the conversion fixes a bug in the old code for the
> case of VFP short-vector "mixed scalar/vector operations". These
> happen where the destination register is in a vector bank but
> but the second operand is in a scalar bank. For example
> vmla.f64 d10, d1, d16 with length 2 stride 2
> is equivalent to the pair of scalar operations
> vmla.f64 d10, d1, d16
> vmla.f64 d8, d3, d16
> where the destination and first input register cycle through
> their vector but the second input is scalar (d16). In the
> old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d}
> as a temporary output for the multiply, which trashes the
> second input operand. For the fully-scalar case (where we
> never do a second iteration) and the fully-vector case
> (where the loop loads the new second input operand) this
> doesn't matter, but for the mixed scalar/vector case we
> will end up using the wrong value for later loop iterations.
> In the new code we use TCG temporaries and so avoid the bug.
> This bug is present for all the multiply-accumulate insns
> that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS.
>
> Note 2: the expression used to calculate the next register
> number in the vector bank is not in fact correct; we leave
> this behaviour unchanged from the old decoder and will
> fix this bug later in the series.
>
> Signed-off-by: Peter Maydell <address@hidden>
> ---
> target/arm/cpu.h | 5 +
> target/arm/translate-vfp.inc.c | 205 +++++++++++++++++++++++++++++++++
> target/arm/translate.c | 14 ++-
> target/arm/vfp.decode | 6 +
> 4 files changed, 224 insertions(+), 6 deletions(-)
Reviewed-by: Richard Henderson <address@hidden>
r~
- Re: [Qemu-devel] [PATCH 39/42] target/arm: Convert VJCVT to decodetree, (continued)
- [Qemu-devel] [PATCH 32/42] target/arm: Convert VMOV (register) to decodetree, Peter Maydell, 2019/06/06
- [Qemu-devel] [PATCH 35/42] target/arm: Convert the VCVT-to-f16 insns to decodetree, Peter Maydell, 2019/06/06
- [Qemu-devel] [PATCH 27/42] target/arm: Convert VFP fused multiply-add insns to decodetree, Peter Maydell, 2019/06/06
- [Qemu-devel] [PATCH 13/42] target/arm: Convert "single-precision" register moves to decodetree, Peter Maydell, 2019/06/06
- [Qemu-devel] [PATCH 18/42] target/arm: Convert VFP VMLA to decodetree, Peter Maydell, 2019/06/06
- Re: [Qemu-devel] [PATCH 18/42] target/arm: Convert VFP VMLA to decodetree,
Richard Henderson <=
- [Qemu-devel] [PATCH 28/42] target/arm: Convert VMOV (imm) to decodetree, Peter Maydell, 2019/06/06
[Qemu-devel] [PATCH 41/42] target/arm: Convert float-to-integer VCVT insns to decodetree, Peter Maydell, 2019/06/06
[Qemu-devel] [PATCH 42/42] target/arm: Fix short-vector increment behaviour, Peter Maydell, 2019/06/06