[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 05/17] RISC-V: add vector extension load and

From: LIU Zhiwei
Subject: Re: [Qemu-devel] [PATCH v2 05/17] RISC-V: add vector extension load and store instructions
Date: Wed, 8 Jan 2020 09:32:33 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.3.1

Hi Richard,

Sorry to reply so late for this comment.  I will move forward on part 2.
On 2019/9/12 22:23, Richard Henderson wrote:
+static bool  vector_lmul_check_reg(CPURISCVState *env, uint32_t lmul,
+        uint32_t reg, bool widen)
+    int legal = widen ? (lmul * 2) : lmul;
+    if ((lmul != 1 && lmul != 2 && lmul != 4 && lmul != 8) ||
+        (lmul == 8 && widen)) {
+        helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST);
+        return false;
+    }
+    if (reg % legal != 0) {
+        helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST);
+        return false;
+    }
+    return true;
These exceptions will not do the right thing.

You cannot call helper_raise_exception from another helper, or from something
called from another helper, as here.  You need to use riscv_raise_exception, as
you do elsewhere in this patch, with a GETPC() value passed down from the
outermost helper.

Ideally you would check these conditions at translate time.
I've mentioned how to do this in reply to your v1.
As discussed in part1,  I will check these conditions at translate time.
+        } else if (i < vl) {
+            switch (width) {
+            case 8:
+                if (vector_elem_mask(env, vm, width, lmul, i)) {
+                    while (k >= 0) {
+                        read = i * (nf + 1)  + k;
+                        env->vfp.vreg[dest + k * lmul].u8[j] =
+                            cpu_ldub_data(env, env->gpr[rs1] + read);
You must not modify vreg[x] before you've recognized all possible exceptions,
e.g. validating that a subsequent access will not trigger a page fault.
Otherwise you will have a partially modified register value when the exception
handler is entered.
There are two questions here.

1) How to validate access before real access to registers?

As pointed in another comment for patchset v1, 
"instructions that perform more than one host store must probe
      the entire range to be stored before performing any stores.
I didn't see the validation of page in SVE,  for example, sve_st1_r,
which directly use the  helper_ret_*_mmu  that may cause an page fault exception or ovelap a watchpoint,
before probe the entire range to be stored .

2) Why not use the  cpu_ld*  API?

I see in SVE that ld*_p is used to directly access the host memory. And helper_ret_*_mmu
is used to access guest memory. But from the definition of cpu_ld*, it's the combination of
ld*_p and helper_ret_*_mmu.
    entry = tlb_entry(env, mmu_idx, addr);
    if (unlikely(entry->ADDR_READ !=
                 (addr & (TARGET_PAGE_MASK | (DATA_SIZE - 1))))) {
        oi = make_memop_idx(SHIFT, mmu_idx);
        res = glue(glue(helper_ret_ld, URETSUFFIX), MMUSUFFIX)(env, addr,
                                                            oi, retaddr);
    } else {
        uintptr_t hostaddr = addr + entry->addend;
        res = glue(glue(ld, USUFFIX), _p)((uint8_t *)hostaddr);

So I don't know  why not use cpu_ld* API?
Without a stride, and without a predicate mask, this can be done with at most
two calls to probe_access (one per page).  This is the simplification that
makes splitting the helper into two very helpful.

With a stride or with a predicate mask requires either
(1) temporary storage for the loads, and copy back to env at the end, or
(2) use probe_access for each load, and then perform the actual loads directly
into env.

FWIW, ARM SVE uses (1), as probe_access is very new.

+                        k--;
+                    }
+                    env->vfp.vstart++;
+                }
+                break;
+            case 16:
+                if (vector_elem_mask(env, vm, width, lmul, i)) {
+                    while (k >= 0) {
+                        read = i * (nf + 1)  + k;
+                        env->vfp.vreg[dest + k * lmul].u16[j] =
+                            cpu_ldub_data(env, env->gpr[rs1] + read);
I don't see anything in these assignments to vreg[x].uN[y] that take the
endianness of the host into account.

You need to think about how the architecture defines the overlap of elements --
particularly across vlset -- and make adjustments.

I can imagine, if you have explicit tests for this, your tests are passing
because the architecture defines a little-endian based indexing of the register
file, and you have only run tests on a little-endian host, like x86_64.

For ARM, we define the representation as a little-endian indexed array of
host-endian uint64_t.  This means that a big-endian host needs to adjust the
address of any element smaller than 64-bit.  E.g.

#define H1(x)   ((x) ^ 7)
#define H2(x)   ((x) ^ 3)
#define H4(x)   ((x) ^ 1)
#define H1(x)   (x)
#define H2(x)   (x)
#define H4(x)   (x)

    env->vfp.vreg[reg + k * lmul].u16[H2(j)]

I will take it.  However I didn't have  a big-endian host to test the feature.

+        if (base >= abs_off) {
+            return base - abs_off;
+        }
+    } else {
+        if ((target_ulong)((target_ulong)offset + base) >= base) {
+            return (target_ulong)offset + base;
+        }
+    }
Why all the extra casting here?  They are exactly what is implied by C.

+    helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST);
+    return 0;
(1) This exception call won't work, as above,
(2) Where does this condition against wraparound come from?
    I don't see it in the specification.
(3) You certainly cannot detect this after having written a
    previous element to the register file.

[ Skipping lots of functions that are basically the same. ]

+void VECTOR_HELPER(vsxe_v)(CPURISCVState *env, uint32_t nf, uint32_t vm,
+    uint32_t rs1, uint32_t rs2, uint32_t rd)
Pass rs1 by value.

+            case 8:
+                if (vector_elem_mask(env, vm, width, lmul, i)) {
+                    while (k >= 0) {
+                        addr = vector_get_index(env, rs1, src2, j, 1, width, k);
+                        cpu_stb_data(env, addr,
+                            env->vfp.vreg[dest + k * lmul].s8[j]);
Must probe_access all of the memory before any stores.
Unlike loads, you don't have the option of storing into a temporary.
Which suggests a common subroutine to perform the probe(s), rather
than bother with a temporary for loads.

Thanks again for your informative comments.

Best Regards,


reply via email to

[Prev in Thread] Current Thread [Next in Thread]