Re: [Qemu-devel] [PATCH 18/42] target/arm: Convert VFP VMLA to decodetree

From:	Richard Henderson
Subject:	Re: [Qemu-devel] [PATCH 18/42] target/arm: Convert VFP VMLA to decodetree
Date:	Sat, 8 Jun 2019 09:14:59 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0

On 6/6/19 12:45 PM, Peter Maydell wrote:
> Convert the VFP VMLA instruction to decodetree.
> 
> This is the first of the VFP 3-operand data processing instructions,
> so we include in this patch the code which loops over the elements
> for an old-style VFP vector operation. The existing code to do this
> looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since
> we are going to be converting instructions one at a time anyway
> we can take the opportunity to make the new loop use TCG temporaries,
> which means we can do that conversion one operation at a time
> rather than needing to do it all in one go.
> 
> We include an UNDEF check which was missing in the old code:
> short-vector operations (with stride or length non-zero) were
> deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec
> field does not indicate that support for short vectors is present
> we UNDEF the operations that would use them. (This is a change
> of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which
> previously were all incorrectly allowing short-vector operations.)
> 
> Note that the conversion fixes a bug in the old code for the
> case of VFP short-vector "mixed scalar/vector operations". These
> happen where the destination register is in a vector bank
> but the second operand is in a scalar bank. For example
>   vmla.f64 d10, d1, d16   with length 2 stride 2
> is equivalent to the pair of scalar operations
>   vmla.f64 d10, d1, d16
>   vmla.f64 d8, d3, d16
> where the destination and first input register cycle through
> their vector but the second input is scalar (d16). In the
> old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d}
> as a temporary output for the multiply, which trashes the
> second input operand. For the fully-scalar case (where we
> never do a second iteration) and the fully-vector case
> (where the loop loads the new second input operand) this
> doesn't matter, but for the mixed scalar/vector case we
> will end up using the wrong value for later loop iterations.
> In the new code we use TCG temporaries and so avoid the bug.
> This bug is present for all the multiply-accumulate insns
> that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS.
> 
> Note 2: the expression used to calculate the next register
> number in the vector bank is not in fact correct; we leave
> this behaviour unchanged from the old decoder and will
> fix this bug later in the series.
> 
> Signed-off-by: Peter Maydell <address@hidden>
> ---
>  target/arm/cpu.h               |   5 +
>  target/arm/translate-vfp.inc.c | 205 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c         |  14 ++-
>  target/arm/vfp.decode          |   6 +
>  4 files changed, 224 insertions(+), 6 deletions(-)
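
The mixed scalar/vector bug described in the quoted commit message can be illustrated with a small standalone C model. This is only a sketch, not QEMU code: the names next_in_bank(), vmla_mixed_temps() and vmla_mixed_shared() are hypothetical, the register file is a plain array of doubles, and the bank size of 4 with wrap-around inside the bank is an assumption chosen to match the d10 -> d8 and d1 -> d3 example above. The "shared" variant loosely imitates a scratch register that holds both the scalar operand and the multiply result, which is the pattern the commit message attributes to gen_vfp_F1_mul().

```c
#include <assert.h>

/* Assumed: 4 double-precision registers per short-vector bank. */
#define BANK_SIZE 4

/* Advance a register number by stride, wrapping inside its own bank. */
int next_in_bank(int reg, int stride)
{
    int base = reg & ~(BANK_SIZE - 1);
    return base | ((reg + stride) & (BANK_SIZE - 1));
}

/*
 * Mixed scalar/vector multiply-accumulate using a fresh temporary per
 * iteration: d[vd] += d[vn] * d[vm], where vd and vn step through their
 * banks and vm stays fixed (the scalar operand).
 */
void vmla_mixed_temps(double *d, int vd, int vn, int vm, int len, int stride)
{
    assert(len >= 1 && len <= BANK_SIZE);
    for (int i = 0; i < len; i++) {
        double prod = d[vn] * d[vm];  /* temporary; d[vm] is left untouched */
        d[vd] += prod;
        vd = next_in_bank(vd, stride);
        vn = next_in_bank(vn, stride);
    }
}

/*
 * Same operation, but modelled with two shared scratch values f0/f1.
 * The scalar operand is loaded into f1 once, and the multiply writes its
 * result back into f1, so the second iteration multiplies by the previous
 * product instead of by the scalar operand.
 */
void vmla_mixed_shared(double *d, int vd, int vn, int vm, int len, int stride)
{
    double f0, f1;

    assert(len >= 1 && len <= BANK_SIZE);
    f1 = d[vm];                       /* scalar operand, loaded only once */
    for (int i = 0; i < len; i++) {
        f0 = d[vn];
        f1 = f0 * f1;                 /* product overwrites the scalar operand */
        f0 = d[vd];
        d[vd] = f0 + f1;
        vd = next_in_bank(vd, stride);
        vn = next_in_bank(vn, stride);
    }
}
```

With d10 = 1, d8 = 2, d1 = 3, d3 = 5 and d16 = 10, calling either function with (vd = 10, vn = 1, vm = 16, len = 2, stride = 2) gives the same first element, but the second element differs: the temporaries version multiplies d3 by d16, while the shared-scratch version multiplies d3 by the stale product from the first iteration.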

Reviewed-by: Richard Henderson <address@hidden>


r~
