|
From: | LIU Zhiwei |
Subject: | Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1 |
Date: | Wed, 25 Dec 2019 17:36:06 +0800 |
User-agent: | Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.3.1 |
On 12/18/19 11:11 PM, LIU Zhiwei wrote:I'm sorry that it's really hard to absorb your opinion. I don't know why clang will fail when index beyond the end of vreg[n] into vreg[n+1].I thought sure one of the address sanitizer checks would detect array bounds overrun. But it becomes irrelevantAs Chih-Min Chao said in another part of PATCH V2 thread, VLEN will be a property which can be specified from command line. So the sub-struct maybe defined as struct { union{ uint64_t *u64 ; int64_t *s64; uint32_t *u32; int32_t *s32; uint16_t *u16; int16_t *s16; uint8_t *u8; int8_t *s8; } mem; target_ulong vxrm; target_ulong vxsat; target_ulong vl; target_ulong vstart; target_ulong vtype; } vext; Will that be OK?Pointers have consequences. It can be done, but I don't think it is ideal.The (ill, lmul, sew ) of vtype will be placed within tb_flags, also the bit of (VSTART == 0 && VL == VLMAX). So it will take 8 bits of tb flags for vector extension at least.Good.
However, I have one problem to support both command line VLEN and vreg_ofs. As in SVE, vreg ofs is the offset from cpu_env. If the structure of vector extension (to support command line VLEN) is struct { union{ uint64_t *u64 ; int64_t *s64; uint32_t *u32; int32_t *s32; uint16_t *u16; int16_t *s16; uint8_t *u8; int8_t *s8; } mem; target_ulong vxrm; target_ulong vxsat; target_ulong vl; target_ulong vstart; target_ulong vtype; } vext I can't find the way to get the direct offset of vreg from cpu_env. Maybe I should specify a max VLEN like the way of SVE?I think a maximum vlen is best. A command-line option to adjust vlen is all well and good, but there's no reason to have to support vlen=(1<<29). Oh, and you probably need a minimum vlen of 16 bytes as well, otherwise you will run afoul of the assert in tcg-op-gvec.c that requires gvec operations to be aligned mod 16. I think that all you need is uint64_t vreg[32 * MAX_VLEN / 8] QEMU_ALIGNED(16); which gives us uint32_t vreg_ofs(DisasContext *ctx, int reg) { return offsetof(CPURISCVState, vreg) + reg * ctx->vlen; }
struct {
uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);Is it OK?
I don't see the point of a union for vreg. I don't think you'll find that you actually use it at all.
I think I can move most of execution check to translate time like SVE now. However, there are still some differences from SVE.
1)cpu_env must be used as a parameter for helper function.
The helpers need use env->vext.vl and env->vext.vstart. Thus it will be difficult to use out of line tcg_gen_gvec_ool.
void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs,
uint32_t oprsz, uint32_t maxsz, int32_t data,2)simd_desc is not proper.
I also need to transfer some members of DisasContext to
helpers.
(Data, Vlmax, Mlen) is my current choice. Vlmax is the num of elements of this operation, so it will defined as ctx->lmul * ctx->vlen / ctx->sew;
Data is reserved to expand. Mlen is mask length for one elment, so it will defined as ctx->sew/ctx->lmul. As with Mlen, a active element will
be selected bystatic inline int vext_elem_mask(void *v0, int mlen, int index)
{
int idx = (index * mlen) / 8;
int pos = (index * mlen) % 8;
return (v0[idx] >> pos) & 0x1;
}
So I may have to implement vext_desc instead of use the
simd_desc, which will be another redundant. Maybe a better way to
mask elements?
You do need to document the element ordering that you're going to use for vreg. I.e. the mapping between the architectural vector register state and the emulation state. You have two choices: (1) all bytes in host endianness (e.g. target/ppc) (2) bytes within each uint64_t in host endianness, but each uint64_t is little-endian (e.g. target/arm). Both require some fixup when running on a big-endian host.
Yes, I will take (2).
Best Regards,
Zhiwei
r~
[Prev in Thread] | Current Thread | [Next in Thread] |