Re: [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 loa

From:	David Hildenbrand
Subject:	Re: [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store
Date:	Tue, 28 May 2019 20:33:12 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 28.05.19 19:28, David Hildenbrand wrote:
> On 23.05.19 00:28, Richard Henderson wrote:
>> This instruction raises #GP, aka SIGSEGV, if the effective address
>> is not aligned to 16-bytes.
>>
>> We have assertions in tcg-op-gvec.c that the offset from ENV is
>> aligned, for vector types <= V128.  But the offset itself does not
>> validate that the final pointer is aligned -- one must also remember
>> to use the QEMU_ALIGNED() attribute on the vector member within ENV.
>>
>> PowerPC Altivec has vector load/store instructions that silently
>> discard the low 4 bits of the address, making alignment mistakes
>> difficult to discover.  Aid that by making the most popular host
>> visibly signal the error.
>>
>> Signed-off-by: Richard Henderson <address@hidden>
>> ---
>>  tcg/i386/tcg-target.inc.c | 24 ++++++++++++++++++++++--
>>  1 file changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
>> index 6ec5e60448..c0443da4af 100644
>> --- a/tcg/i386/tcg-target.inc.c
>> +++ b/tcg/i386/tcg-target.inc.c
>> @@ -1082,14 +1082,24 @@ static void tcg_out_ld(TCGContext *s, TCGType type, 
>> TCGReg ret,
>>          }
>>          /* FALLTHRU */
>>      case TCG_TYPE_V64:
>> +        /* There is no instruction that can validate 8-byte alignment.  */
>>          tcg_debug_assert(ret >= 16);
>>          tcg_out_vex_modrm_offset(s, OPC_MOVQ_VqWq, ret, 0, arg1, arg2);
>>          break;
>>      case TCG_TYPE_V128:
>> +        /*
>> +         * The gvec infrastructure is asserts that v128 vector loads
>> +         * and stores use a 16-byte aligned offset.  Validate that the
>> +         * final pointer is aligned by using an insn that will SIGSEGV.
>> +         */
>>          tcg_debug_assert(ret >= 16);
>> -        tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx, ret, 0, arg1, arg2);
>> +        tcg_out_vex_modrm_offset(s, OPC_MOVDQA_VxWx, ret, 0, arg1, arg2);
>>          break;
>>      case TCG_TYPE_V256:
>> +        /*
>> +         * The gvec infrastructure only requires 16-byte alignment,
>> +         * so here we must use an unaligned load.
>> +         */
>>          tcg_debug_assert(ret >= 16);
>>          tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx | P_VEXL,
>>                                   ret, 0, arg1, arg2);
>> @@ -1117,14 +1127,24 @@ static void tcg_out_st(TCGContext *s, TCGType type, 
>> TCGReg arg,
>>          }
>>          /* FALLTHRU */
>>      case TCG_TYPE_V64:
>> +        /* There is no instruction that can validate 8-byte alignment.  */
>>          tcg_debug_assert(arg >= 16);
>>          tcg_out_vex_modrm_offset(s, OPC_MOVQ_WqVq, arg, 0, arg1, arg2);
>>          break;
>>      case TCG_TYPE_V128:
>> +        /*
>> +         * The gvec infrastructure is asserts that v128 vector loads
>> +         * and stores use a 16-byte aligned offset.  Validate that the
>> +         * final pointer is aligned by using an insn that will SIGSEGV.
>> +         */
>>          tcg_debug_assert(arg >= 16);
>> -        tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx, arg, 0, arg1, arg2);
>> +        tcg_out_vex_modrm_offset(s, OPC_MOVDQA_WxVx, arg, 0, arg1, arg2);
>>          break;
>>      case TCG_TYPE_V256:
>> +        /*
>> +         * The gvec infrastructure only requires 16-byte alignment,
>> +         * so here we must use an unaligned store.
>> +         */
>>          tcg_debug_assert(arg >= 16);
>>          tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx | P_VEXL,
>>                                   arg, 0, arg1, arg2);
>>
> 
> This is the problematic patch. Haven't looked into the details yet, so I
> can't tell what's wrong. Maybe really an alignemnt issue?
> 

Okay, looks like "vregs" in "struct CPUS390XState" is always aligned to
8, but not to 16 bytes.

And that in return is the case, because "CPUS390XState env" is not
aligned to 16 bytes in "struct S390CPU"


!!!!!!!! CPU: 0x55a5e3046ef0
!!!!!!!! ENV: 0x55a5e304f1a8
!!!!!!!! VREGS: 0x55a5e304f228
!!!!!!!! CPU: 0x55a5e3070bb0
!!!!!!!! ENV: 0x55a5e3078e68
!!!!!!!! VREGS: 0x55a5e3078ee8
!!!!!!!! CPU: 0x55a5e3098310
!!!!!!!! ENV: 0x55a5e30a05c8
!!!!!!!! VREGS: 0x55a5e30a0648
!!!!!!!! CPU: 0x55a5e30c0730
!!!!!!!! ENV: 0x55a5e30c89e8
!!!!!!!! VREGS: 0x55a5e30c8a68
!!!!!!!! CPU: 0x55a5e30e7c90
!!!!!!!! ENV: 0x55a5e30eff48
!!!!!!!! VREGS: 0x55a5e30effc8
!!!!!!!! CPU: 0x55a5e310eea0
!!!!!!!! ENV: 0x55a5e3117158
!!!!!!!! VREGS: 0x55a5e31171d8
!!!!!!!! CPU: 0x55a5e31361e0
!!!!!!!! ENV: 0x55a5e313e498
!!!!!!!! VREGS: 0x55a5e313e518
!!!!!!!! CPU: 0x55a5e315d520
!!!!!!!! ENV: 0x55a5e31657d8
!!!!!!!! VREGS: 0x55a5e3165858

vregs is defined as:

CPU_DoubleU vregs[32][2];

We either have to switch to a type that has a natural alignment of 16
bytes, or enforce alignment of "CPUS390XState env" to 16 bytes.

What do you suggest?

-- 

Thanks,

David / dhildenb

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-devel] [PULL 06/16] tcg: Expand vector minmax using cmp+cmpsel, (continued)
- [Qemu-devel] [PULL 06/16] tcg: Expand vector minmax using cmp+cmpsel, Richard Henderson, 2019/05/22
- [Qemu-devel] [PULL 08/16] tcg/i386: Support vector comparison select value, Richard Henderson, 2019/05/22
- [Qemu-devel] [PULL 05/16] tcg: Introduce do_op3_nofail for vector expansion, Richard Henderson, 2019/05/22
- [Qemu-devel] [PULL 07/16] tcg: Add TCG_OPF_NOT_PRESENT if TCG_TARGET_HAS_foo is negative, Richard Henderson, 2019/05/22
- [Qemu-devel] [PULL 14/16] tcg/aarch64: Build vector immediates with two insns, Richard Henderson, 2019/05/22
- [Qemu-devel] [PULL 09/16] tcg/i386: Remove expansion for missing minmax, Richard Henderson, 2019/05/22
- [Qemu-devel] [PULL 10/16] tcg/i386: Use umin/umax in expanding unsigned compare, Richard Henderson, 2019/05/22
- [Qemu-devel] [PULL 13/16] tcg/aarch64: Use MVNI in tcg_out_dupi_vec, Richard Henderson, 2019/05/22
- [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store, Richard Henderson, 2019/05/22
  - Re: [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store, David Hildenbrand, 2019/05/28
    - Re: [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store, David Hildenbrand <=
    - Re: [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store, David Hildenbrand, 2019/05/28
    - Re: [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store, Richard Henderson, 2019/05/28
- [Qemu-devel] [PULL 11/16] tcg/aarch64: Support vector bitwise select value, Richard Henderson, 2019/05/22
- [Qemu-devel] [PULL 12/16] tcg/aarch64: Split up is_fimm, Richard Henderson, 2019/05/22
- [Qemu-devel] [PULL 15/16] tcg/aarch64: Allow immediates for vector ORR and BIC, Richard Henderson, 2019/05/22
- Re: [Qemu-devel] [PULL 00/16] tcg queued patches, Aleksandar Markovic, 2019/05/23
  - Re: [Qemu-devel] [PULL 00/16] tcg queued patches, Richard Henderson, 2019/05/23
- Re: [Qemu-devel] [PULL 00/16] tcg queued patches, Peter Maydell, 2019/05/24
- Re: [Qemu-devel] [PULL 00/16] tcg queued patches, David Hildenbrand, 2019/05/28

Prev by Date: Re: [Qemu-devel] [PATCH for 4.1 v2] target/riscv: Expose time CSRs when allowed by [m|s]counteren
Next by Date: Re: [Qemu-devel] Make target check-report.tap is broken
Previous by thread: Re: [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store
Next by thread: Re: [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store
Index(es):
- Date
- Thread