[Qemu-devel] [PATCH v4 22/33] tcg-aarch64: Use MOVN in tcg_out_movi
From: Richard Henderson
Subject: [Qemu-devel] [PATCH v4 22/33] tcg-aarch64: Use MOVN in tcg_out_movi
Date: Sat, 14 Sep 2013 14:54:39 -0700
When profitable, initialize the register with MOVN instead of MOVZ,
before setting the remaining lanes with MOVK.
Signed-off-by: Richard Henderson <address@hidden>
---
tcg/aarch64/tcg-target.c | 88 +++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 75 insertions(+), 13 deletions(-)
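
For illustration only (not part of the patch): a minimal standalone sketch of the
lane-counting heuristic described above, assuming a GCC/Clang host
(__builtin_ctzll) and using made-up names (show_movi; ctz64 is re-implemented
locally).  It merely prints the would-be instruction sequence, and it ignores the
32-bit SF=0 shortcut and the imask bookkeeping that the patch itself performs.

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

/* Count trailing zeros, mapping 0 to 64 the way qemu's ctz64 does.  */
static int ctz64(uint64_t v)
{
    return v ? __builtin_ctzll(v) : 64;
}

/* Print a MOVZ/MOVN + MOVK sequence that builds VALUE in x0.  */
static void show_movi(uint64_t value)
{
    int zeros = 0, ones = 0, shift;

    /* Count the 16-bit lanes that are all-zero vs all-one.  */
    for (int i = 0; i < 64; i += 16) {
        uint64_t lane = (value >> i) & 0xffff;
        zeros += (lane == 0);
        ones += (lane == 0xffff);
    }

    if (ones > zeros) {
        /* More 0xffff lanes: start with MOVN on the lowest lane that is
           not 0xffff, then MOVK the remaining such lanes with their
           non-inverted bits (possibly 0x0000).  */
        uint64_t inv = ~value;
        shift = ctz64(inv) & (63 & -16);
        printf("movn x0, #0x%04" PRIx64 ", lsl #%d\n",
               (inv >> shift) & 0xffff, shift);
        inv &= ~(0xffffull << shift);
        while (inv) {
            shift = ctz64(inv) & (63 & -16);
            printf("movk x0, #0x%04" PRIx64 ", lsl #%d\n",
                   (value >> shift) & 0xffff, shift);
            inv &= ~(0xffffull << shift);
        }
    } else {
        /* Start with MOVZ on the lowest non-zero lane, then MOVK the rest.  */
        shift = ctz64(value) & (63 & -16);
        printf("movz x0, #0x%04" PRIx64 ", lsl #%d\n",
               (value >> shift) & 0xffff, shift);
        value &= ~(0xffffull << shift);
        while (value) {
            shift = ctz64(value) & (63 & -16);
            printf("movk x0, #0x%04" PRIx64 ", lsl #%d\n",
                   (value >> shift) & 0xffff, shift);
            value &= ~(0xffffull << shift);
        }
    }
}

int main(void)
{
    show_movi(0x00000000f100ffffull);  /* movz + movk; the patch's SF=0 trick
                                          reduces this to a single movn */
    show_movi(0xffffffff0000ffffull);  /* one movn instead of three insns
                                          starting with movz */
    return 0;
}

For 0xffffffff0000ffff this prints a single movn, where the movz route would
need three instructions.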
diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index f9319ed..cecda05 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -559,24 +559,86 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
                          tcg_target_long value)
 {
     AArch64Insn insn;
-
-    if (type == TCG_TYPE_I32) {
+    int i, wantinv, shift;
+    tcg_target_long svalue = value;
+    tcg_target_long ivalue, imask;
+
+    /* For 32-bit values, discard potential garbage in value. For 64-bit
+       values within [2**31, 2**32-1], we can create smaller sequences by
+       interpreting this as a negative 32-bit number, while ensuring that
+       the high 32 bits are cleared by setting SF=0. */
+    if (type == TCG_TYPE_I32 || (value & ~0xffffffffull) == 0) {
+        svalue = (int32_t)value;
         value = (uint32_t)value;
+        type = TCG_TYPE_I32;
+    }
+
+    /* Would it take fewer insns to begin with MOVN? For the value and its
+       inverse, count the number of 16-bit lanes that are 0. For the benefit
+       of 32-bit quantities, compare the zero-extended normal value vs the
+       sign-extended inverted value. For example,
+         v   = 0x00000000f100ffff, zeros = 2
+         ~v  = 0xffffffff0eff0000, zeros = 1
+         ~sv = 0x000000000eff0000, zeros = 3
+       By using ~sv we see that 3 > 2, leading us to emit just a single insn
+       "movn ret, 0x0eff, lsl #16". */
+
+    ivalue = ~svalue;
+    imask = 0;
+    wantinv = 0;
+
+    /* ??? This can be done in the simd unit without a loop:
+        // Move value and ivalue into V0 and V1 respectively.
+        mov v0.d[0], value
+        mov v1.d[0], ivalue
+        // Compare each 16-bit lane vs 0, producing -1 for true.
+        cmeq v0.4h, v0.4h, #0
+        cmeq v1.4h, v1.4h, #0
+        mov imask, v1.d[0]
+        // Sum the comparisons, producing 0 to -4.
+        addv h0, v0.4h
+        addv h1, v1.4h
+        // Subtract the two, forming a positive wantinv result.
+        sub v0.4h, v0.4h, v1.4h
+        smov wantinv, v0.h[0]
+    */
+    for (i = 0; i < 64; i += 16) {
+        tcg_target_long mask = 0xffffull << i;
+        if ((value & mask) == 0) {
+            wantinv -= 1;
+        }
+        if ((ivalue & mask) == 0) {
+            wantinv += 1;
+            imask |= mask;
+        }
     }
-    /* count trailing zeros in 16 bit steps, mapping 64 to 0. Emit the
-       first MOVZ with the half-word immediate skipping the zeros, with a shift
-       (LSL) equal to this number. Then all next instructions use MOVKs.
-       Zero the processed half-word in the value, continue until empty.
-       We build the final result 16bits at a time with up to 4 instructions,
-       but do not emit instructions for 16bit zero holes. */
+    /* If we had more 0xffff than 0x0000, invert VALUE and use MOVN. */
     insn = INSN_MOVZ;
-    do {
-        unsigned shift = ctz64(value) & (63 & -16);
-        tcg_fmt_Rd_uimm(s, insn, shift >= 32, rd, value >> shift, shift);
+    if (wantinv > 0) {
+        value = ivalue;
+        insn = INSN_MOVN;
+    }
+
+    /* Find the lowest lane that is not 0x0000. */
+    shift = ctz64(value) & (63 & -16);
+    tcg_fmt_Rd_uimm(s, insn, type, rd, value >> shift, shift);
+
+    if (wantinv > 0) {
+        /* Re-invert the value, so MOVK sees non-inverted bits. */
+        value = ~value;
+        /* Clear out all the 0xffff lanes. */
+        value ^= imask;
+    }
+    /* Clear out the lane that we just set. */
+    value &= ~(0xffffUL << shift);
+
+    /* Iterate until all lanes have been set, and thus cleared from VALUE. */
+    while (value) {
+        shift = ctz64(value) & (63 & -16);
+        tcg_fmt_Rd_uimm(s, INSN_MOVK, type, rd, value >> shift, shift);
         value &= ~(0xffffUL << shift);
-        insn = INSN_MOVK;
-    } while (value);
+    }
 }
 
 static inline void tcg_out_ldst_r(TCGContext *s,
--
1.8.3.1
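
As an aside on the "??? This can be done in the simd unit without a loop"
note in the hunk above: on an AArch64 host the same lane counting could be
written with ACLE NEON intrinsics roughly as below.  This is only an
illustrative sketch (ignoring the sign-extension tweak for 32-bit values);
the name movi_wantinv and the exact intrinsic selection are mine, not
anything the patch introduces.

#include <arm_neon.h>
#include <stdint.h>

/* Compute wantinv and imask for VALUE without a scalar loop, following the
   cmeq/addv outline in the patch's comment.  A positive result means that
   starting with MOVN needs fewer instructions than MOVZ.  */
static int movi_wantinv(uint64_t value, uint64_t *imask)
{
    int16x4_t v = vreinterpret_s16_u64(vdup_n_u64(value));
    int16x4_t iv = vreinterpret_s16_u64(vdup_n_u64(~value));

    /* Compare each 16-bit lane against 0; true lanes become -1 (0xffff). */
    uint16x4_t vz = vceq_s16(v, vdup_n_s16(0));
    uint16x4_t ivz = vceq_s16(iv, vdup_n_s16(0));

    /* 0xffff lanes of VALUE, i.e. zero lanes of ~VALUE. */
    *imask = vget_lane_u64(vreinterpret_u64_u16(ivz), 0);

    /* Horizontal sums are 0..-4; their difference is wantinv. */
    return vaddv_s16(vreinterpret_s16_u16(vz))
           - vaddv_s16(vreinterpret_s16_u16(ivz));
}

A positive return value corresponds to the patch's "wantinv > 0" test, and
*imask matches the mask of 0xffff lanes that the scalar loop accumulates.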