qemu-riscv
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v7 00/12] Improve PMU support


From: Atish Patra
Subject: Re: [PATCH v7 00/12] Improve PMU support
Date: Thu, 7 Apr 2022 15:39:58 -0700

On Wed, Mar 30, 2022 at 5:01 PM Atish Patra <atishp@rivosinc.com> wrote:
>
> The latest version of the SBI specification includes a Performance Monitoring
> Unit(PMU) extension[1] which allows the supervisor to start/stop/configure
> various PMU events. The Sscofpmf ('Ss' for Privileged arch and 
> Supervisor-level
> extensions, and 'cofpmf' for Count OverFlow and Privilege Mode Filtering)
> extension[2] allows the perf like tool to handle overflow interrupts and
> filtering support.
>
> This series implements full PMU infrastructure to support
> PMU in virt machine. This will allow us to add any PMU events in future.
>
> Currently, this series enables the following omu events.
> 1. cycle count
> 2. instruction count
> 3. DTLB load/store miss
> 4. ITLB prefetch miss
>
> The first two are computed using host ticks while last three are counted 
> during
> cpu_tlb_fill. We can do both sampling and count from guest userspace.
> This series has been tested on both RV64 and RV32. Both Linux[3] and 
> Opensbi[4]
> patches are required to get the perf working.
>
> Here is an output of perf stat/report while running hackbench with OpenSBI & 
> Linux
> kernel patches applied [3]. The kernel patches are available in upstream as 
> well.
>
> Perf stat:
> ==========
> [root@fedora-riscv ~]# perf stat -e cycles -e instructions -e 
> dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \
> > perf bench sched messaging -g 1 -l 10
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 1 groups == 40 processes run
>
>      Total time: 0.265 [sec]
>
>  Performance counter stats for 'perf bench sched messaging -g 1 -l 10':
>
>      4,167,825,362      cycles
>      4,166,609,256      instructions              #    1.00  insn per cycle
>          3,092,026      dTLB-load-misses
>            258,280      dTLB-store-misses
>          2,068,966      iTLB-load-misses
>
>        0.585791767 seconds time elapsed
>
>        0.373802000 seconds user
>        1.042359000 seconds sys
>
> Perf record:
> ============
> [root@fedora-riscv ~]# perf record -e cycles -e instructions \
> > -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses -c 10000 \
> > perf bench sched messaging -g 1 -l 10
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 1 groups == 40 processes run
>
>      Total time: 1.397 [sec]
> [ perf record: Woken up 10 times to write data ]
> Check IO/CPU overload!
> [ perf record: Captured and wrote 8.211 MB perf.data (214486 samples) ]
>
> [root@fedora-riscv riscv]# perf report
> Available samples
> 107K cycles                                                                   
>
> 107K instructions                                                             
>
> 250 dTLB-load-misses                                                          
>
> 13 dTLB-store-misses                                                          
>
> 172 iTLB-load-misses
> ..
>
> Changes from v6->v7:
> 1. Fixed all the compilation errors for the usermode.
>
> Changes from v5->v6:
> 1. Fixed compilation issue with PATCH 1.
> 2. Addressed other comments.
>
> Changes from v4->v5:
> 1. Rebased on top of the -next with following patches.
>    - isa extension
>    - priv 1.12 spec
> 2. Addressed all the comments on v4
> 3. Removed additional isa-ext DT node in favor of riscv,isa string update
>
> Changes from v3->v4:
> 1. Removed the dummy events from pmu DT node.
> 2. Fixed pmu_avail_counters mask generation.
> 3. Added a patch to simplify the predicate function for counters.
>
> Changes from v2->v3:
> 1. Addressed all the comments on PATCH1-4.
> 2. Split patch1 into two separate patches.
> 3. Added explicit comments to explain the event types in DT node.
> 4. Rebased on latest Qemu.
>
> Changes from v1->v2:
> 1. Dropped the ACks from v1 as signficant changes happened after v1.
> 2. sscofpmf support.
> 3. A generic counter management framework.
>
> [1] https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc
> [2] https://drive.google.com/file/d/171j4jFjIkKdj5LWcExphq4xG_2sihbfd/edit
> [3] https://github.com/atishp04/linux/tree/riscv_pmu_v6
> [4] https://github.com/atishp04/qemu/tree/riscv_pmu_v7
>
> Atish Patra (12):
> target/riscv: Fix PMU CSR predicate function
> target/riscv: Implement PMU CSR predicate function for S-mode
> target/riscv: pmu: Rename the counters extension to pmu
> target/riscv: pmu: Make number of counters configurable
> target/riscv: Implement mcountinhibit CSR
> target/riscv: Add support for hpmcounters/hpmevents
> target/riscv: Support mcycle/minstret write operation
> target/riscv: Add sscofpmf extension support
> target/riscv: Simplify counter predicate function
> target/riscv: Add few cache related PMU events
> hw/riscv: virt: Add PMU DT node to the device tree
> target/riscv: Update the privilege field for sscofpmf CSRs
>
> hw/riscv/virt.c           |  28 ++
> target/riscv/cpu.c        |  14 +-
> target/riscv/cpu.h        |  49 ++-
> target/riscv/cpu_bits.h   |  59 +++
> target/riscv/cpu_helper.c |  25 ++
> target/riscv/csr.c        | 879 ++++++++++++++++++++++++++++----------
> target/riscv/machine.c    |  25 ++
> target/riscv/meson.build  |   3 +-
> target/riscv/pmu.c        | 432 +++++++++++++++++++
> target/riscv/pmu.h        |  36 ++
> 10 files changed, 1315 insertions(+), 235 deletions(-)
> create mode 100644 target/riscv/pmu.c
> create mode 100644 target/riscv/pmu.h
>
> --
> 2.25.1
>
>

Any comments on this series ? Kernel patches have been merged. It
would be good to have qemu in upstream as well.


-- 
Regards,
Atish



reply via email to

[Prev in Thread] Current Thread [Next in Thread]