From: Ahmed Karaman
Subject: Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
Date: Mon, 29 Jun 2020 23:16:27 +0200

On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes:
>
> > Hi,
> >
> > The second report of the TCG Continuous Benchmarking series builds
> > upon the QEMU performance metrics calculated in the previous report.
> > This report presents a method to dissect the number of instructions
> > executed by a QEMU invocation into three main phases:
> > - Code Generation
> > - JIT Execution
> > - Helpers Execution
> > It devises a Python script that automates this process.
> >
> > After that, the report presents an experiment for comparing the
> > output of running the script on 17 different targets. Many conclusions
> > can be drawn from the results and two of them are discussed in the
> > analysis section.
>
> A couple of comments. One thing I think is missing from your analysis is
> the total number of guest instructions being emulated. As you point out,
> each guest will have different code efficiency in terms of its
> generated code.
>
> Assuming your test case is constant execution (i.e. runs the same each
> time)
Yes indeed, the report utilizes Callgrind for the measurements, so the
results are very stable.
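As an illustration of that setup, the total instruction count can be
read back from the Callgrind output file along these lines (only a
sketch, not the actual dissection script from the report; the QEMU
binary and guest program paths are placeholders):

  import re
  import subprocess

  def total_instructions(cmd):
      """Run cmd under Callgrind and return the total Ir count."""
      out_file = "callgrind.out.example"
      subprocess.run(["valgrind", "--tool=callgrind",
                      "--callgrind-out-file=" + out_file] + cmd,
                     check=True)
      # The total cost is on a "summary:" or "totals:" line of the output
      # file; only Ir is collected by default, so it is a single number.
      with open(out_file) as f:
          for line in f:
              m = re.match(r"(?:summary|totals):\s+(\d+)", line)
              if m:
                  return int(m.group(1))
      return None

  # Placeholder invocation; any QEMU binary and guest program would do:
  print(total_instructions(["./x86_64-linux-user/qemu-x86_64", "./a.out"]))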
> you could run it through a plugins build to extract the number of
> guest instructions, e.g.:
>
>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1
>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
>   insns: 158603512
>
That's a very nice suggestion. Maybe this could be the topic of a whole
new report. I'll try the provided command and let you know if I have
any questions.
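If it behaves as in your example, something along these lines should be
enough to collect the count from the plugin log (again just a sketch;
the paths and the "insns:" line are taken from the sample output above,
and it is assumed here that the plugin log ends up on stderr):

  import re
  import subprocess

  def guest_instructions(qemu, prog):
      """Run prog under QEMU with the insn plugin and return the guest
      instruction count printed on the plugin log."""
      result = subprocess.run(
          [qemu, "-plugin", "tests/plugin/libinsn.so", "-d", "plugin", prog],
          capture_output=True, text=True, check=True)
      # The plugin reports a line such as "insns: 158603512".
      m = re.search(r"insns:\s+(\d+)", result.stderr + result.stdout)
      return int(m.group(1)) if m else None

  print(guest_instructions("./aarch64-linux-user/qemu-aarch64",
                           "./tests/tcg/aarch64-linux-user/sha1"))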
> I should have also pointed out in your last report that running FP heavy
> code will always be biased towards helper/softfloat code to the
> detriment of everything else. I think you need more of a mix of
> benchmarks to get a better view.
>
> When Emilio did the last set of analysis he used a suite he built out of
> nbench and a perl benchmark:
>
>   https://github.com/cota/dbt-bench
>
> As he quoted in his README:
>
>   NBench programs are small, with execution time dominated by small code
>   loops. Thus, when run under a DBT engine, the resulting performance
>   depends almost entirely on the quality of the output code.
>
>   The Perl benchmarks compile Perl code. As is common for compilation
>   workloads, they execute large amounts of code and show no particular
>   code execution hotspots. Thus, the resulting DBT performance depends
>   largely on code translation speed.
>
> by only having one benchmark you are going to miss out on the envelope
> of use cases.
>
Future reports will introduce a variety of benchmarks. This report,
like the previous one, is introductory: the benchmark was used only to
demonstrate the ideas presented in the report, not as a rigorous
benchmarking suite.
> >
> > Report link:
> > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
> >
> > Previous reports:
> > Report 1 - Measuring Basic Performance Metrics of QEMU:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
> >
> > Best regards,
> > Ahmed Karaman
>
>
> --
> Alex Bennée


