Re: [PATCH v6 00/25] target/riscv: MSTATUS_SUM + cleanups
From: Wu, Fei
Subject: Re: [PATCH v6 00/25] target/riscv: MSTATUS_SUM + cleanups
Date: Tue, 4 Apr 2023 15:23:59 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.9.1

On 4/4/2023 3:11 PM, LIU Zhiwei wrote:
>
> On 2023/4/4 14:42, Wu, Fei wrote:
>> On 3/25/2023 6:54 PM, Richard Henderson wrote:
>>> This builds on Fei and Zhiwei's SUM and TB_FLAGS changes.
>>>
>>> * Reclaim 5 TB_FLAGS bits, since we nearly ran out.
>>>
>>> * Using cpu_mmu_index(env, true) is insufficient to implement
>>> HLVX properly. While that chooses the correct mmu_idx, it
>>> does not perform the read with execute permission.
>>> I add a new tcg interface to perform a read-for-execute with
>>> an arbitrary mmu_idx. This is still not 100% compliant, but
>>> it's closer.
>>>
>>> * Handle mstatus.MPV in cpu_mmu_index.
>>> * Use vsstatus.SUM when required for MMUIdx_S_SUM.
>>> * Cleanups for get_physical_address.
>>>
>>> While this passes check-avocado, I'm sure that's insufficient.
>>> Please have a close look.
>>>
>> I ran stress-ng to get a feel for the performance gain, although
>> stress-ng is not designed to be a performance workload. BTW, I had to
>> revert commit 0ee342256af9, which is unrelated to this series; otherwise
>> qemu exited during the test.
>> ./stress-ng --timeout 5 --metrics-brief --class os --sequential 1
>>
>> Here are the results. In general, most of the tests benefit from this
>> series, but please note that the results are not necessarily consistent
>> across multiple runs, and some of the regressions are not real; I
>> haven't checked them one by one.
>>
>> stressor    master(60ca584b)   master + this   speedup
>>                   bogo ops/s      bogo ops/s
>>               (usr+sys time)  (usr+sys time)
>> sigsuspend          19430.09      1492746.34   76.8265
>> utime                8779.64       271023.89   30.8696
>> opcode              11315.78        10538.58  0.931317
>> nice               154327.30       136797.63  0.886412
>> mremap                225.29          198.82  0.882507
>> exec                 4118.89         3282.85  0.797023
>> vm-addr               214.25          166.69  0.778016
>> landlock              950.00          722.74  0.760779
>
> Thanks for testing. Have you analyzed the cases with worse performance,
> given that this is meant to be an optimization?
>
During the first run, 'io' showed the worst regression, but it turned out
not to be a real regression, or at least not a consistent one, when I ran
it again.

stressor    master(60ca584b)       this run1  speedup1       this run2  speedup2
                  bogo ops/s      bogo ops/s                bogo ops/s
              (usr+sys time)  (usr+sys time)            (usr+sys time)
fallocate           32711.39        33794.28    1.0331        32067.69  0.980322
sigchld             46289.82        42975.50  0.928401        44914.65  0.970292
inotify              3013.11         3511.21   1.16531         2879.87   0.95578
opcode              11315.78        10084.42  0.891182        10538.58  0.931317
nice               154327.30       186649.43   1.20944       136797.63  0.886412
mremap                225.29          237.39   1.05371          198.82  0.882507
exec                 4118.89         4248.12   1.03137         3282.85  0.797023
vm-addr               214.25          268.60   1.25368          166.69  0.778016
landlock              950.00          791.12  0.832758          722.74  0.760779
io                 371206.67       205232.61   0.55288       409205.80   1.10237
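
For anyone who wants to redo the arithmetic: each speedup value above is
simply the bogo ops/s with this series divided by the master value. Below is
a minimal Python sketch with the numbers copied from the table. The 5%
tolerance and the classify() helper are assumptions made for illustration
only; they are not part of stress-ng or of this series.

```python
# bogo ops/s (usr+sys time), copied from the table above:
# stressor -> (master 60ca584b, this series run 1, this series run 2)
RESULTS = {
    "fallocate": (32711.39, 33794.28, 32067.69),
    "sigchld": (46289.82, 42975.50, 44914.65),
    "inotify": (3013.11, 3511.21, 2879.87),
    "opcode": (11315.78, 10084.42, 10538.58),
    "nice": (154327.30, 186649.43, 136797.63),
    "mremap": (225.29, 237.39, 198.82),
    "exec": (4118.89, 4248.12, 3282.85),
    "vm-addr": (214.25, 268.60, 166.69),
    "landlock": (950.00, 791.12, 722.74),
    "io": (371206.67, 205232.61, 409205.80),
}

# Assumed noise band (hypothetical, not a measured value).
TOLERANCE = 0.05


def speedup(baseline, patched):
    """Ratio of patched to baseline throughput; > 1 means faster."""
    return patched / baseline


def classify(baseline, run1, run2, tolerance=TOLERANCE):
    """Label a stressor only when both runs agree beyond the tolerance."""
    s1 = speedup(baseline, run1)
    s2 = speedup(baseline, run2)
    if s1 < 1 - tolerance and s2 < 1 - tolerance:
        return "regression"
    if s1 > 1 + tolerance and s2 > 1 + tolerance:
        return "improvement"
    return "inconclusive"


if __name__ == "__main__":
    for name, (master, run1, run2) in RESULTS.items():
        print(f"{name:10s} {speedup(master, run1):8.4f} "
              f"{speedup(master, run2):8.4f}  {classify(master, run1, run2)}")
```

With this tolerance, only opcode and landlock are more than 5% below master
in both runs; everything else, including io, lands in the inconclusive
bucket, so more runs would be needed before drawing conclusions.
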
Thanks,
Fei.
> Thanks,
> Zhiwei
>
>> Thanks,
>> Fei.
>>> r~
>>>
>>>
>>> Fei Wu (2):
>>> target/riscv: Separate priv from mmu_idx
>>> target/riscv: Reduce overhead of MSTATUS_SUM change
>>>
>>> LIU Zhiwei (4):
>>> target/riscv: Extract virt enabled state from tb flags
>>> target/riscv: Add a general status enum for extensions
>>> target/riscv: Encode the FS and VS on a normal way for tb flags
>>> target/riscv: Add a tb flags field for vstart
>>>
>>> Richard Henderson (19):
>>> target/riscv: Remove mstatus_hs_{fs,vs} from tb_flags
>>> accel/tcg: Add cpu_ld*_code_mmu
>>> target/riscv: Use cpu_ld*_code_mmu for HLVX
>>> target/riscv: Handle HLV, HSV via helpers
>>> target/riscv: Rename MMU_HYP_ACCESS_BIT to MMU_2STAGE_BIT
>>> target/riscv: Introduce mmuidx_sum
>>> target/riscv: Introduce mmuidx_priv
>>> target/riscv: Introduce mmuidx_2stage
>>> target/riscv: Move hstatus.spvp check to check_access_hlsv
>>> target/riscv: Set MMU_2STAGE_BIT in riscv_cpu_mmu_index
>>> target/riscv: Check SUM in the correct register
>>> target/riscv: Hoist second stage mode change to callers
>>> target/riscv: Hoist pbmte and hade out of the level loop
>>> target/riscv: Move leaf pte processing out of level loop
>>> target/riscv: Suppress pte update with is_debug
>>> target/riscv: Don't modify SUM with is_debug
>>> target/riscv: Merge checks for reserved pte flags
>>> target/riscv: Reorg access check in get_physical_address
>>> target/riscv: Reorg sum check in get_physical_address
>>>
>>> include/exec/cpu_ldst.h | 9 +
>>> target/riscv/cpu.h | 47 ++-
>>> target/riscv/cpu_bits.h | 12 +-
>>> target/riscv/helper.h | 12 +-
>>> target/riscv/internals.h | 35 ++
>>> accel/tcg/cputlb.c | 48 +++
>>> accel/tcg/user-exec.c | 58 +++
>>> target/riscv/cpu.c | 2 +-
>>> target/riscv/cpu_helper.c | 393 +++++++++---------
>>> target/riscv/csr.c | 21 +-
>>> target/riscv/op_helper.c | 113 ++++-
>>> target/riscv/translate.c | 72 ++--
>>> .../riscv/insn_trans/trans_privileged.c.inc | 2 +-
>>> target/riscv/insn_trans/trans_rvf.c.inc | 2 +-
>>> target/riscv/insn_trans/trans_rvh.c.inc | 135 +++---
>>> target/riscv/insn_trans/trans_rvv.c.inc | 22 +-
>>> target/riscv/insn_trans/trans_xthead.c.inc | 7 +-
>>> 17 files changed, 595 insertions(+), 395 deletions(-)
>>>