From: Alex Bennée
Subject: Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
Date: Mon, 29 Jun 2020 17:03:30 +0100
User-agent: mu4e 1.5.3; emacs 28.0.50

Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes:

> Hi,
>
> The second report of the TCG Continuous Benchmarking series builds
> upon the QEMU performance metrics calculated in the previous report.
> This report presents a method to dissect the number of instructions
> executed by a QEMU invocation into three main phases:
> - Code Generation
> - JIT Execution
> - Helpers Execution
> It also provides a Python script that automates this process.
>
> After that, the report presents an experiment for comparing the
> output of running the script on 17 different targets. Many conclusions
> can be drawn from the results and two of them are discussed in the
> analysis section.

A couple of comments. One thing I think is missing from your analysis is
the total number of guest instructions being emulated. As you point out,
each guest will have different code efficiency in terms of its
generated code.

Assuming your test case is constant execution (i.e. runs the same each
time) you could run it through a plugins build to extract the number of
guest instructions, e.g.:

  ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin \
      ./tests/tcg/aarch64-linux-user/sha1
  SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
  insns: 158603512
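
(Note the above assumes a tree configured with plugin support, roughly
something like:

  ./configure --target-list=aarch64-linux-user --enable-plugins
  make

though the exact flags may differ depending on your QEMU version.)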

I should also have pointed out in your last report that running
FP-heavy code will always be biased towards helper/softfloat code to
the detriment of everything else. I think you need more of a mix of
benchmarks to get a better view.
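
One quick way to sanity-check how dominant softfloat is for a given
workload is a host-side profile. A rough sketch, using the fcvt test
as a stand-in for an FP-heavy binary:

  perf record -o fp.data ./aarch64-linux-user/qemu-aarch64 \
      ./tests/tcg/aarch64-linux-user/fcvt
  perf report --stdio -i fp.data | grep -iE 'helper_|float'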

When Emilio did the last set of analyses he used a suite he built out of
nbench and a Perl benchmark:

  https://github.com/cota/dbt-bench

As he quoted in his README:

  NBench programs are small, with execution time dominated by small code
  loops. Thus, when run under a DBT engine, the resulting performance
  depends almost entirely on the quality of the output code.

  The Perl benchmarks compile Perl code. As is common for compilation
  workloads, they execute large amounts of code and show no particular
  code execution hotspots. Thus, the resulting DBT performance depends
  largely on code translation speed.
  
By only having one benchmark you are going to miss out on the envelope
of use cases.
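
Even without pulling in dbt-bench wholesale, looping the insn plugin
over a handful of the tests/tcg binaries would give you a first cut at
that envelope (the benchmark names here are only illustrative):

  for bench in sha1 fcvt; do
      ./aarch64-linux-user/qemu-aarch64 \
          -plugin tests/plugin/libinsn.so -d plugin \
          ./tests/tcg/aarch64-linux-user/$bench
  done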

>
> Report link:
> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
>
> Previous reports:
> Report 1 - Measuring Basic Performance Metrics of QEMU:
> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>
> Best regards,
> Ahmed Karaman


-- 
Alex Bennée


