From: Peter Maydell
Subject: [Qemu-devel] [PATCH v2 00/42] target/arm: Convert VFP decoder to decodetree
Date: Tue, 11 Jun 2019 11:53:09 +0100

This patchset converts the Arm VFP instructions to use decodetree
instead of the current hand-written decode.
v2 has only very minor changes since v1:
* patch 33 (VFP comparisons): added missing TCG frees
* patch 39 (VJCVT): add back missing jscvt feature check
Patch 39 is the only one still in need of review.
Rest of the cover letter from v1 below, for further context:
We gain:
* a more maintainable decoder which doesn't live in one big function
* correct prioritization of UNDEF exceptions against "VFP disabled"
exceptions and "M-profile lazy FP stacking" activity
* significant reduction in the use of the "cpu_F0[sd]" and "cpu_F1[sd]"
TCG globals. These are a relic of a much older translator and
eventually we should try to get rid of them entirely
* more accurate decode, UNDEFing some things we were incorrectly lax on
* a fixed bug for VFP short-vector mixed vector/scalar VMLA/VMLS/VNMLA/VNMLS
insns: we were incorrectly corrupting the scalar input operand
in the process of performing the multiply-accumulate, so every
element after the first was miscalculated
* a fixed bug in the calculation of the next register number to use
when VFP short-vector operations wrapped around the vector bank
* decode which checks ID registers for "do we have D16-D31" rather
than using "is this VFPv3" -- this means that Cortex-M4, -M33 and -R5F
all now correctly give the guest only 16 Dregs rather than 32.
(Note that the old decoder hides this UNDEF handling inside the
VFP_DREG macros...)
* the fused multiply-add insns now correctly UNDEF for attempts to
use them as short-vector operations
* short-vector functionality is only implemented if the ID registers
say it should be (which in practice means "only Cortex-A8 or earlier");
we continue to provide it in -cpu max for compatibility
* VRINTR, VRINTZ and VRINTX are only provided in v8A and above
* VFP related translation code split out into its own source file
* the "is this special register present and accessible" check is
now consistent between read and write
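
Two of the points above (the D16-D31 check and the short-vector
wraparound) can be illustrated with a small sketch. This is plain C
written for this explanation, not the code from the patches: all the
function names are hypothetical. It assumes the architectural layout
where MVFR0 bits [3:0] report the number of 64-bit registers (2 or
more meaning 32 registers) and where a short-vector bank holds 8
single-precision or 4 double-precision registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does MVFR0 advertise D16-D31?
 * Assumes MVFR0[3:0] is 1 for 16 Dregs and 2 (or more) for 32. */
static bool has_d32(uint32_t mvfr0)
{
    return (mvfr0 & 0xf) >= 2;
}

/* Hypothetical helper: an access to D16-D31 is only valid if the
 * ID register says those registers exist; otherwise it should UNDEF. */
static bool dreg_is_valid(uint32_t mvfr0, int dreg)
{
    return dreg < 16 || has_d32(mvfr0);
}

/* Step to the next register of a short vector. The low bits select
 * the register within the bank and wrap around; the high bits select
 * the bank and must not change. Single-precision banks hold 8 regs. */
static int advance_sreg(int reg, int delta)
{
    return ((reg + delta) & 0x7) | (reg & ~0x7);
}

/* As above, but double-precision banks hold 4 registers. */
static int advance_dreg(int reg, int delta)
{
    return ((reg + delta) & 0x3) | (reg & ~0x3);
}
```

With these definitions, stepping s15 by 1 wraps to s8 rather than
moving on to s16, and stepping d7 by 1 wraps to d4.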
There is definitely scope for further cleanup:
* the translate-vfp.inc.c could be further isolated into its
own standalone .c file rather than being #included into translate.c
* cpu_F0* are still used in parts of the Neon decode (and the
iwmmxt code, alas)
* I noticed some places doing a load-and-shift or load-modify-store
sequence to update byte or halfword parts of float registers;
these could be rewritten to do direct byte or halfword loads/stores
* we could remove the remaining uses of tcg_gen_ld/st_f32()
(in the Neon decode)
but at 42 patches this is already a pretty hefty patchset, so
I have deferred those to attack later once this has got in.
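
To illustrate the load-modify-store cleanup mentioned above, here is
a rough sketch in ordinary C rather than TCG ops. The function names
are invented for this example, and the register is modelled as a
plain uint32_t in host memory; the real code would emit the
equivalent TCG loads and stores instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load-modify-store: read all 32 bits, replace the low halfword,
 * and write all 32 bits back. */
static void set_low_half_rmw(uint32_t *reg, uint16_t val)
{
    uint32_t tmp = *reg;
    tmp = (tmp & 0xffff0000u) | val;
    *reg = tmp;
}

/* Direct store: write only the two bytes that hold the low halfword.
 * Which bytes those are depends on host endianness, so probe for it. */
static void set_low_half_direct(uint32_t *reg, uint16_t val)
{
    const uint32_t probe = 1;
    unsigned char first;
    size_t offset;

    memcpy(&first, &probe, 1);
    offset = first ? 0 : 2;   /* little-endian: low half at byte 0 */
    memcpy((unsigned char *)reg + offset, &val, sizeof(val));
}
```

Both functions leave the register with the same contents; the second
one simply avoids the load and the masking.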
On the downside, there are more lines of code here, but some of
them we'll get back when we finish some of the cleanups noted
above, some are just copyright-and-license boilerplate, and I
think the rest are well invested in easier to modify code...
Patch 1 is Richard's recent decodetree script bugfix, which
is needed for the VFP decode to behave correctly.
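
For anyone not familiar with decodetree, the kind of matching it
generates can be sketched by hand as below. This is only an
illustration, assuming the A32 encoding of single-precision VADD
(fixed bits 27:23 = 11100, 21:20 = 11, 11:8 = 1010, bits 6 and 4 = 0);
the struct and function names are hypothetical and the real generated
code is structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical argument struct: single-precision register numbers. */
typedef struct {
    int vd, vn, vm;
} arg_vadd_sp;

/* Fixed bits of the assumed VADD.F32 encoding (condition field and
 * register fields are masked out). */
#define VADD_SP_MASK    0x0fb00f50u
#define VADD_SP_PATTERN 0x0e300a00u

/* Return true and fill in *a if insn is a single-precision VADD.
 * Returning false means "no match", so the caller tries the next
 * pattern or treats the insn as UNDEF. */
static bool decode_vadd_sp(uint32_t insn, arg_vadd_sp *a)
{
    if ((insn & VADD_SP_MASK) != VADD_SP_PATTERN) {
        return false;
    }
    /* Single-precision register numbers are Vx:x (4 bits plus 1 low bit) */
    a->vd = (int)((((insn >> 12) & 0xf) << 1) | ((insn >> 22) & 1));
    a->vn = (int)((((insn >> 16) & 0xf) << 1) | ((insn >> 7) & 1));
    a->vm = (int)(((insn & 0xf) << 1) | ((insn >> 5) & 1));
    return true;
}
```

Under that assumed encoding, 0xee300a81 (VADD.F32 s0, s1, s2) matches
with vd=0, vn=1, vm=2, while 0xee300ac1 (the same operands with bit 6
set, i.e. VSUB) does not match.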
Tested with RISU, comparing both against real Cortex-A7 and
Cortex-A8 hardware and against the old version of QEMU, plus some
smoke-testing of aarch32 system emulation.
thanks
-- PMM
Peter Maydell (41):
target/arm: Add stubs for AArch32 VFP decodetree
target/arm: Factor out VFP access checking code
target/arm: Fix Cortex-R5F MVFR values
target/arm: Explicitly enable VFP short-vectors for aarch32 -cpu max
target/arm: Convert the VSEL instructions to decodetree
target/arm: Convert VMINNM, VMAXNM to decodetree
target/arm: Convert VRINTA/VRINTN/VRINTP/VRINTM to decodetree
target/arm: Convert VCVTA/VCVTN/VCVTP/VCVTM to decodetree
target/arm: Move the VFP trans_* functions to translate-vfp.inc.c
target/arm: Add helpers for VFP register loads and stores
target/arm: Convert "double-precision" register moves to decodetree
target/arm: Convert "single-precision" register moves to decodetree
target/arm: Convert VFP two-register transfer insns to decodetree
target/arm: Convert VFP VLDR and VSTR to decodetree
target/arm: Convert the VFP load/store multiple insns to decodetree
target/arm: Remove VLDR/VSTR/VLDM/VSTM use of cpu_F0s and cpu_F0d
target/arm: Convert VFP VMLA to decodetree
target/arm: Convert VFP VMLS to decodetree
target/arm: Convert VFP VNMLS to decodetree
target/arm: Convert VFP VNMLA to decodetree
target/arm: Convert VMUL to decodetree
target/arm: Convert VNMUL to decodetree
target/arm: Convert VADD to decodetree
target/arm: Convert VSUB to decodetree
target/arm: Convert VDIV to decodetree
target/arm: Convert VFP fused multiply-add insns to decodetree
target/arm: Convert VMOV (imm) to decodetree
target/arm: Convert VABS to decodetree
target/arm: Convert VNEG to decodetree
target/arm: Convert VSQRT to decodetree
target/arm: Convert VMOV (register) to decodetree
target/arm: Convert VFP comparison insns to decodetree
target/arm: Convert the VCVT-from-f16 insns to decodetree
target/arm: Convert the VCVT-to-f16 insns to decodetree
target/arm: Convert VFP round insns to decodetree
target/arm: Convert double-single precision conversion insns to decodetree
target/arm: Convert integer-to-float insns to decodetree
target/arm: Convert VJCVT to decodetree
target/arm: Convert VCVT fp/fixed-point conversion insns to decodetree
target/arm: Convert float-to-integer VCVT insns to decodetree
target/arm: Fix short-vector increment behaviour
Richard Henderson (1):
decodetree: Fix comparison of Field
target/arm/Makefile.objs | 13 +
target/arm/cpu.h | 11 +
target/arm/cpu.c | 6 +
target/arm/translate-vfp.inc.c | 2672 ++++++++++++++++++++++++++++++++
target/arm/translate.c | 1503 +-----------------
scripts/decodetree.py | 2 +-
target/arm/vfp-uncond.decode | 63 +
target/arm/vfp.decode | 242 +++
8 files changed, 3036 insertions(+), 1476 deletions(-)
create mode 100644 target/arm/translate-vfp.inc.c
create mode 100644 target/arm/vfp-uncond.decode
create mode 100644 target/arm/vfp.decode
--
2.20.1