
Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1


From: LIU Zhiwei
Subject: Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
Date: Wed, 25 Dec 2019 17:36:06 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.3.1


On 2019/12/20 4:38, Richard Henderson wrote:
On 12/18/19 11:11 PM, LIU Zhiwei wrote:
I'm sorry, but it's really hard for me to absorb your opinion. I don't know why clang will fail when an index goes beyond the end of vreg[n] into vreg[n+1].
I thought sure one of the address sanitizer checks would detect the array bounds overrun. But it becomes irrelevant.

As Chih-Min Chao said in another part of the PATCH V2 thread, VLEN will be a property which can be specified from the command line. So the sub-struct may be defined as

struct {
    union{
        uint64_t *u64;
        int64_t  *s64;
        uint32_t *u32;
        int32_t  *s32;
        uint16_t *u16;
        int16_t  *s16;
        uint8_t  *u8;
        int8_t   *s8;
    } mem;
    target_ulong vxrm;
    target_ulong vxsat;
    target_ulong vl;
    target_ulong vstart;
    target_ulong vtype;
} vext;

Will that be OK?
Pointers have consequences.  It can be done, but I don't think it is ideal.

The (ill, lmul, sew) fields of vtype will be placed within tb_flags, along with the bit for (VSTART == 0 && VL == VLMAX).

So the vector extension will take at least 8 bits of tb_flags.
Good.
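
A minimal sketch of one way such a packing could look; the bit positions and the helper name below are illustrative assumptions, not the layout used by the patch:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; the real patch may lay these out differently. */
#define TB_FLAGS_VILL_SHIFT         0  /* vtype.vill, 1 bit                 */
#define TB_FLAGS_SEW_SHIFT          1  /* vtype.vsew, 3 bits                */
#define TB_FLAGS_LMUL_SHIFT         4  /* vtype.vlmul, 2 bits               */
#define TB_FLAGS_VL_EQ_VLMAX_SHIFT  6  /* vstart == 0 && vl == vlmax, 1 bit */

static inline uint32_t vext_tb_flags(bool vill, unsigned sew, unsigned lmul,
                                     bool vl_eq_vlmax)
{
    return ((uint32_t)vill        << TB_FLAGS_VILL_SHIFT)
         | ((sew & 0x7)           << TB_FLAGS_SEW_SHIFT)
         | ((lmul & 0x3)          << TB_FLAGS_LMUL_SHIFT)
         | ((uint32_t)vl_eq_vlmax << TB_FLAGS_VL_EQ_VLMAX_SHIFT);
}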

      
However, I have one problem with supporting both a command-line VLEN and vreg_ofs.

As in SVE, vreg_ofs is the offset from cpu_env. If the structure of the vector extension (to support a command-line VLEN) is

struct {
    union{
        uint64_t *u64;
        int64_t  *s64;
        uint32_t *u32;
        int32_t  *s32;
        uint16_t *u16;
        int16_t  *s16;
        uint8_t  *u8;
        int8_t   *s8;
    } mem;
    target_ulong vxrm;
    target_ulong vxsat;
    target_ulong vl;
    target_ulong vstart;
    target_ulong vtype;
} vext;

I can't find a way to get the direct offset of vreg from cpu_env.

Maybe I should specify a maximum VLEN, the way SVE does?
I think a maximum vlen is best.  A command-line option to adjust vlen is all
well and good, but there's no reason to have to support vlen=(1<<29).

Oh, and you probably need a minimum vlen of 16 bytes as well, otherwise you
will run afoul of the assert in tcg-op-gvec.c that requires gvec operations to
be aligned mod 16.

I think that all you need is

    uint64_t vreg[32 * MAX_VLEN / 8] QEMU_ALIGNED(16);

which gives us

uint32_t vreg_ofs(DisasContext *ctx, int reg)
{
    return offsetof(CPURISCVState, vreg) + reg * ctx->vlen;
}

struct {
    uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
    target_ulong vxrm;
    target_ulong vxsat;
    target_ulong vl;
    target_ulong vstart;
    target_ulong vtype;
} vext;

Is it OK?

I don't see the point of a union for vreg.  I don't think you'll find that you
actually use it at all.

I think I can now move most of the execution checks to translation time, like SVE. However, there are still some differences from SVE.

1) cpu_env must be used as a parameter for the helper functions.

    The helpers need to use env->vext.vl and env->vext.vstart. Thus it will be difficult to use the out-of-line tcg_gen_gvec_*_ool expanders.

    void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs,
                            uint32_t oprsz, uint32_t maxsz, int32_t data,
                            gen_helper_gvec_2 *fn)
    {
        ......
        fn(a0, a1, desc);
        ......
    }
    Maybe I will have to write something similar to tcg_gen_gvec_ool in trans_rvv.inc.c, but it would be redundant.
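
    For what it's worth, a hand-rolled expander that also passes cpu_env might look roughly like the sketch below; gen_gvec_2_env is a hypothetical name, and the helper is assumed to take (vd, vs2, env, desc):

static void gen_gvec_2_env(uint32_t dofs, uint32_t aofs,
                           uint32_t oprsz, uint32_t maxsz, int32_t data,
                           gen_helper_gvec_2_ptr *fn)
{
    TCGv_ptr a0 = tcg_temp_new_ptr();
    TCGv_ptr a1 = tcg_temp_new_ptr();
    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));

    /* Point the first two arguments at the destination and source vregs. */
    tcg_gen_addi_ptr(a0, cpu_env, dofs);
    tcg_gen_addi_ptr(a1, cpu_env, aofs);

    /* Pass cpu_env as the extra pointer so the helper can read
     * env->vext.vl and env->vext.vstart.                        */
    fn(a0, a1, cpu_env, desc);

    tcg_temp_free_ptr(a0);
    tcg_temp_free_ptr(a1);
    tcg_temp_free_i32(desc);
}

    This is roughly the shape of the existing tcg_gen_gvec_2_ptr(), which already takes an extra TCGv_ptr that could simply be cpu_env.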

2) simd_desc is not suitable.

    I also need to pass some members of DisasContext to the helpers.

    (Data, Vlmax, Mlen) is my current choice. Vlmax is the number of elements for this operation, so it will be defined as ctx->lmul * ctx->vlen / ctx->sew. Data is reserved for expansion. Mlen is the mask length for one element, so it will be defined as ctx->sew / ctx->lmul. With Mlen, an active element will be selected by

static inline int vext_elem_mask(void *v0, int mlen, int index)
{
    uint8_t *mask = v0;              /* mask register viewed as bytes */
    int idx = (index * mlen) / 8;
    int pos = (index * mlen) % 8;

    return (mask[idx] >> pos) & 0x1;
}

    So I may have to implement a vext_desc instead of using simd_desc, which would be another redundancy. Maybe there is a better way to mask elements?
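
    One possibility, sketched below under my own assumptions about field widths, is to keep simd_desc() and pack (Mlen, Vlmax, Data) into its 32-bit data argument rather than inventing a separate vext_desc:

/* Illustrative field layout for the 32-bit 'data' slot of simd_desc(). */
#define VEXT_MLEN_SHIFT    0   /* bits [7:0]   mask bits per element      */
#define VEXT_VLMAX_SHIFT   8   /* bits [18:8]  elements in this operation */
#define VEXT_DATA_SHIFT   19   /* bits [31:19] reserved for expansion     */

static inline uint32_t vext_pack_data(unsigned mlen, unsigned vlmax,
                                      unsigned data)
{
    return ((mlen  & 0xff)   << VEXT_MLEN_SHIFT)
         | ((vlmax & 0x7ff)  << VEXT_VLMAX_SHIFT)
         | ((data  & 0x1fff) << VEXT_DATA_SHIFT);
}

/* Helpers would unpack these from simd_data(desc). */
static inline unsigned vext_mlen(uint32_t d)
{
    return (d >> VEXT_MLEN_SHIFT) & 0xff;
}

static inline unsigned vext_vlmax(uint32_t d)
{
    return (d >> VEXT_VLMAX_SHIFT) & 0x7ff;
}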

You do need to document the element ordering that you're going to use for vreg.
 I.e. the mapping between the architectural vector register state and the
emulation state.  You have two choices:

(1) all bytes in host endianness (e.g. target/ppc)
(2) bytes within each uint64_t in host endianness,
    but each uint64_t is little-endian (e.g. target/arm).

Both require some fixup when running on a big-endian host.

Yes, I will take (2).
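
For reference, ordering (2) in practice means flipping the element index inside each 8-byte lane on a big-endian host, along the lines of target/arm's H macros; the sketch below is only illustrative:

/* Ordering (2): bytes within each uint64_t follow host endianness, while
 * the uint64_t lanes are ordered little-endian.  On a big-endian host,
 * indices of sub-64-bit elements are therefore flipped within each
 * 8-byte lane, as target/arm does with its H macros.                   */
#ifdef HOST_WORDS_BIGENDIAN
#define H1(x)  ((x) ^ 7)   /* 8-bit elements  */
#define H2(x)  ((x) ^ 3)   /* 16-bit elements */
#define H4(x)  ((x) ^ 1)   /* 32-bit elements */
#define H8(x)  (x)         /* 64-bit elements */
#else
#define H1(x)  (x)
#define H2(x)  (x)
#define H4(x)  (x)
#define H8(x)  (x)
#endif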


Best Regards,

Zhiwei


