From: Alex Bennée
Subject: Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#3] QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison
Date: Fri, 10 Jul 2020 11:11:57 +0100
User-agent: mu4e 1.5.4; emacs 28.0.50

Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes:

> On Thu, Jul 9, 2020 at 4:41 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>>
>> Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes:
>>
>> > Hi,
>> >
>> > The third report of the TCG Continuous Benchmarking series utilizes
>> > the tools presented in the previous report for comparing the
>> > performance of 17 different targets across two versions of QEMU. The
>> > two versions addressed are 5.0 and 5.1-pre-soft-freeze (current state
>> > of QEMU).
>> >
>> > After summarizing the results, the report utilizes the KCachegrind
>> > tool and dives into the analysis of why all three PowerPC targets
>> > (ppc, ppc64, ppc64le) had a performance degradation between the two
>> > QEMU versions.
>>
>> It's an interesting degradation, especially as you would think that a
>> change in the softfloat implementation should hit everyone in the
>> same way.
>>
>
> That's what I thought too, but while working on next week's report,
> it appears that this specific change introduced a performance
> improvement in other targets!
>
>> We actually have a tool for benchmarking the softfloat implementation
>> itself called fp-bench. You can find it in tests/fp. I would be curious
>> to see if you saw a drop in performance in the following:
>>
>>   ./fp-bench -p double -o cmp
>>
>
> I ran the command before and after the commit introducing the
> degradation. Both runs gave results varying between 600 and 605 MFlops.
> Running with Callgrind and the Coulomb benchmark, the results were:
> Number of instructions before: 12,715,390,413
> Number of instructions after: 13,031,104,137

You may have to average over several runs to see if there is a
detectable change. It could be that although more instructions are
being executed, it makes no practical difference to the execution
because the processor is just as efficient at scheduling the work onto
the execution units.
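
For example, something along these lines would give a mean over ten
runs (assuming fp-bench prints its result as a single "<value> MFlops"
line on stdout; adjust the awk field if the output format differs):

  # ten runs of the double-precision compare benchmark, averaged
  for i in $(seq 1 10); do ./fp-bench -p double -o cmp; done \
      | awk '/MFlops/ { sum += $1; n++ }
             END { print sum / n, "MFlops (mean of", n, "runs)" }'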

You have to remember that on modern processors the relationship
between instructions and the utilisation of the eventual ALUs is
tenuous at best. After everything has been converted to uOps and
scheduled you might be doing broadly the same calculations. Pipeline
and cache stalls are probably a more important metric here, although I
doubt they figure much in the very tight loop of the benchmark.
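
If you want to dig into that, perf can give a rough picture of the
stall behaviour (the stalled-cycle events are not available on every
host CPU, and the binary names below are only illustrative placeholders
for the linux-user QEMU and the Coulomb benchmark):

  # -r 5 repeats the run and reports mean/stddev for each counter
  perf stat -r 5 \
      -e cycles,instructions,stalled-cycles-frontend,stalled-cycles-backend \
      ./qemu-ppc64 ./coulomb_double-ppc64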

>
>> >
>> > Report link:
>> > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/QEMU-5.0-and-5.1-pre-soft-freeze-Dissect-Comparison/
>>
>> If you identify a drop in performance due to a commit, linking to it
>> from the report wouldn't be a bad idea so those who want to quickly
>> replicate the test can do before/after runs.
>>
>
> Report number 5 will introduce a new tool for detecting commits
> causing performance improvements and degradations. The report will
> utilize this tool to find out the specific commit introducing these
> changes.

Excellent - keep up the good work ;-)
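
If the new tool ends up being bisection-based, git can drive most of
the legwork; a rough sketch (v5.0.0 is the release tag, master stands
in for the pre-soft-freeze state, and check-insns.sh is a hypothetical
helper that runs the benchmark under callgrind and exits non-zero when
the instruction count is above a chosen threshold):

  git bisect start master v5.0.0   # bad (current) then good (5.0)
  git bisect run ./check-insns.sh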

>
>> >
>> > Previous reports:
>> > Report 1 - Measuring Basic Performance Metrics of QEMU:
>> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>> > Report 2 - Dissecting QEMU Into Three Main Parts:
>> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
>> >
>> > Best regards,
>> > Ahmed Karaman
>>
>>
>> --
>> Alex Bennée
>
> Best regards,
> Ahmed Karaman


-- 
Alex Bennée


