
Re: Proposal for a regular upstream performance testing


From: Lukáš Doktor
Subject: Re: Proposal for a regular upstream performance testing
Date: Mon, 28 Mar 2022 08:21:54 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0

Well, I seem to have forgotten to attach the report, sorry about that...

Regards,
Lukáš

On 28. 03. 2022 at 8:18, Lukáš Doktor wrote:
> Hello Stefan, folks,
> 
> I seem to have another hit, an improvement actually, and it has been 
> bisected all the way to you, Stefan. Let me use this as another example of 
> how such a process could look, and we can use it to hammer out the 
> details, such as the means by which to submit the request, whom to notify, 
> and how to proceed further.
> 
> ---
> 
> Last week I noticed an improvement in the 
> TunedLibvirt/fio-rot-Aj-8i/0000:./write-4KiB/throughput/iops_sec.mean 
> (<driver name="qemu" type="raw" io="native" cache="none"/>, fio, rotational 
> disk, raw file on a host xfs partition, jobs=#cpus, iodepth=8, 4 KiB 
> writes) check and bisected it to:
> 
> commit fc8796465c6cd4091efe6a2f8b353f07324f49c7
> Author: Stefan Hajnoczi <stefanha@redhat.com>
> Date:   Wed Feb 23 15:57:03 2022 +0000
> 
>     aio-posix: fix spurious ->poll_ready() callbacks in main loop
> 
> Could you please confirm that this makes sense and is expected? (It looks 
> like it from the commit description.)
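> 
> For reference, a minimal fio sketch approximating the write-4KiB check 
> described above (my reconstruction only; the file size, runtime and the 
> exact write pattern are assumptions, not the actual run-perf job 
> definition):
> 
>   # 4 KiB writes at iodepth=8, one job per CPU, direct I/O against a
>   # raw file on the host xfs partition (exposed to the guest with
>   # io="native", cache="none" as quoted above)
>   fio --name=write-4KiB --rw=write --bs=4k --iodepth=8 \
>       --numjobs="$(nproc)" --ioengine=libaio --direct=1 \
>       --filename=/mnt/xfs/testfile --size=1G --runtime=60 \
>       --time_based --group_reporting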
> 
> Note that this commit was pinpointed using a 2-out-of-3 verdict per 
> commit; there were actually some small differences between commits fc8 and 
> cc5. The fc8 and 202 results scored similarly to both the good and the bad 
> commits, with 202 being closer to the bad one. Since cc5, the results seem 
> to stabilize further, scoring slightly lower than the median fc8 result. 
> Anyway, I don't have enough data to declare anything; I can bisect further 
> if needed.
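> 
> In case automation helps, such a noisy bisection could in principle be 
> driven by git bisect run with a noise-tolerant check script. This is only 
> a sketch: run_fio_check and THRESHOLD are hypothetical placeholders for 
> the actual run-perf invocation and for the iops boundary between the good 
> and bad populations:
> 
>   # check.sh: exit 0 when a commit scores like the known-good baseline,
>   # 1 when it shows the changed performance, 125 to skip the commit.
>   make -j"$(nproc)" || exit 125          # unbuildable commit: skip it
>   baseline_like=0
>   for i in 1 2 3; do
>       iops=$(run_fio_check) || exit 125  # hypothetical benchmark wrapper
>       [ "$iops" -lt "$THRESHOLD" ] && baseline_like=$((baseline_like + 1))
>   done
>   [ "$baseline_like" -ge 2 ]             # 2-out-of-3 verdict
> 
>   git bisect start
>   git bisect good ecf1bbe3227c
>   git bisect bad 9d36d5f7e0dc
>   git bisect run ./check.sh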
> 
> The git bisect log:
> 
> git bisect start
> # good: [ecf1bbe3227cc1c54d7374aa737e7e0e60ee0c29] Merge tag 
> 'pull-ppc-20220321' of https://github.com/legoater/qemu into staging
> git bisect good ecf1bbe3227cc1c54d7374aa737e7e0e60ee0c29
> # bad: [9d36d5f7e0dc905d8cb3dd437e479eb536417d3b] Merge tag 
> 'pull-block-2022-03-22' of https://gitlab.com/hreitz/qemu into staging
> git bisect bad 9d36d5f7e0dc905d8cb3dd437e479eb536417d3b
> # bad: [0f7d7d72aa99c8e48bbbf37262a9c66c83113f76] iotests: use 
> qemu_img_json() when applicable
> git bisect bad 0f7d7d72aa99c8e48bbbf37262a9c66c83113f76
> # bad: [cc5387a544325c26dcf124ac7d3999389c24e5c6] block/rbd: fix write zeroes 
> with growing images
> git bisect bad cc5387a544325c26dcf124ac7d3999389c24e5c6
> # good: [b21e2380376c470900fcadf47507f4d5ade75e85] Use g_new() & friends 
> where that makes obvious sense
> git bisect good b21e2380376c470900fcadf47507f4d5ade75e85
> # bad: [2028ab513bf0232841a909e1368309858919dbcc] Merge tag 
> 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging
> git bisect bad 2028ab513bf0232841a909e1368309858919dbcc
> # bad: [fc8796465c6cd4091efe6a2f8b353f07324f49c7] aio-posix: fix spurious 
> ->poll_ready() callbacks in main loop
> git bisect bad fc8796465c6cd4091efe6a2f8b353f07324f49c7
> # good: [8a947c7a586e16a048894e1a0a73d154435e90ef] aio-posix: fix build 
> failure io_uring 2.2
> git bisect good 8a947c7a586e16a048894e1a0a73d154435e90ef
> # first bad commit: [fc8796465c6cd4091efe6a2f8b353f07324f49c7] aio-posix: fix 
> spurious ->poll_ready() callbacks in main loop
> 
> Also, please find the bisection report attached. I can attach the VM XML 
> file or other logs if needed.
> 
> Regards,
> Lukáš
> 
> 
> On 22. 03. 2022 at 16:05, Stefan Hajnoczi wrote:
>> On Mon, Mar 21, 2022 at 11:29:42AM +0100, Lukáš Doktor wrote:
>>> Hello Stefan,
>>>
>>> On 21. 03. 2022 at 10:42, Stefan Hajnoczi wrote:
>>>> On Mon, Mar 21, 2022 at 09:46:12AM +0100, Lukáš Doktor wrote:
>>>>> Dear qemu developers,
>>>>>
>>>>> you might remember the "replied to" email from a bit over a year ago, 
>>>>> which raised a discussion about a qemu performance-regression CI. At 
>>>>> the KVM Forum I presented some details about my testing pipeline: 
>>>>> https://www.youtube.com/watch?v=Cbm3o4ACE3Y&list=PLbzoR-pLrL6q4ZzA4VRpy42Ua4-D2xHUR&index=9
>>>>> I think it's stable enough to become part of the official CI, so people 
>>>>> can consume it, rely on it, and hopefully even suggest configuration 
>>>>> changes.
>>>>>
>>>>> The CI consists of:
>>>>>
>>>>> 1. Jenkins pipeline(s) - internal, not available to developers, running 
>>>>> daily builds of the latest available commit
>>>>> 2. Publicly available anonymized results: 
>>>>> https://ldoktor.github.io/tmp/RedHat-Perf-worker1/
>>>>
>>>> This link is 404.
>>>>
>>>
>>> My mistake; it works well without the trailing slash: 
>>> https://ldoktor.github.io/tmp/RedHat-Perf-worker1
>>>
>>>>> 3. (optional) a manual gitlab pulling job, triggered by the Jenkins 
>>>>> pipeline when that particular commit is checked
>>>>>
>>>>> The (1) is described here: 
>>>>> https://run-perf.readthedocs.io/en/latest/jenkins.html and can be 
>>>>> replicated on other premises; the individual jobs can be executed 
>>>>> directly (https://run-perf.readthedocs.io) on any Linux box using 
>>>>> Fedora guests (via pip or container: 
>>>>> https://run-perf.readthedocs.io/en/latest/container.html ).
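>>>>>
>>>>> A purely hypothetical sketch of replicating one job locally (the 
>>>>> package name and every flag below are assumptions on my side; the 
>>>>> documented invocation is at https://run-perf.readthedocs.io ):
>>>>>
>>>>>   pip install runperf                  # assumed install method; a git
>>>>>                                        # checkout or container may be
>>>>>                                        # the documented route
>>>>>   run-perf --hosts <test-machine> \
>>>>>            --distro <Fedora-release> \
>>>>>            --tests fio                 # hypothetical flags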
>>>>>
>>>>> As for (3), I made a testing pipeline available here: 
>>>>> https://gitlab.com/ldoktor/qemu/-/pipelines with one always-passing 
>>>>> test and one allowed-to-fail actual testing job. If you think such 
>>>>> integration would be useful, I can add it as another job to the 
>>>>> official qemu repo. Note the integration is a bit hacky: due to limited 
>>>>> resources we cannot test all commits, but rather test on a daily basis, 
>>>>> which is not officially supported by gitlab.
>>>>>
>>>>> Note the aim of this project is to ensure that some very basic 
>>>>> system-level workflow performance stays the same, or that the 
>>>>> differences are described and ideally pinned to individual commits. It 
>>>>> should not replace thorough release testing or low-level performance 
>>>>> tests.
>>>>
>>>> If I understand correctly the GitLab CI integration you described
>>>> follows the "push" model where Jenkins (running on your own machine)
>>>> triggers a manual job in GitLab CI simply to indicate the status of the
>>>> nightly performance regression test?
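>>>>
>>>> For instance, one way to implement such a push is GitLab's pipeline 
>>>> trigger API (the project ID and token below are placeholders, and 
>>>> whether this matches your actual Jenkins-side integration is an 
>>>> assumption on my part):
>>>>
>>>>   curl --request POST \
>>>>        --form "token=$TRIGGER_TOKEN" \
>>>>        --form "ref=master" \
>>>>        "https://gitlab.com/api/v4/projects/<project-id>/trigger/pipeline"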
>>>>
>>>> What process should QEMU follow to handle performance regressions
>>>> identified by your job? In other words, which stakeholders need to
>>>> triage, notify, debug, etc when a regression is identified?
>>>>
>>>> My guess is:
>>>> - Someone (you or the qemu.git committer) need to watch the job status and 
>>>> triage failures.
>>>> - That person then notifies likely authors of suspected commits so they 
>>>> can investigate.
>>>> - The authors need a way to reproduce the issue - either locally or by 
>>>> pushing commits to GitLab and waiting for test results.
>>>> - Fixes will be merged as additional qemu.git commits since commit history 
>>>> cannot be rewritten.
>>>> - If necessary a git-revert(1) commit can be merged to temporarily undo a 
>>>> commit that caused issues.
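>>>>
>>>> For example, a sketch of that last point, using the commit discussed in 
>>>> this thread:
>>>>
>>>>   git revert fc8796465c6c   # adds a new commit undoing the change
>>>>                             # without rewriting published history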
>>>>
>>>> Who will watch the job status and triage failures?
>>>>
>>>> Stefan
>>>
>>> This is exactly the main question I'd like to resolve as part of 
>>> considering-this-to-be-an-official-part-of-the-upstream-qemu-testing. At 
>>> this point our team is offering its service to maintain this single 
>>> worker for daily jobs, monitoring the status, and pinging people in case 
>>> of bisectable results.
>>
>> That's great! The main hurdle is finding someone to triage regressions
>> and if you are volunteering to do that then these regression tests would
>> be helpful to QEMU.
>>
>>> From the upstream qemu community we are mainly looking for feedback:
>>>
>>> * whether they'd want to be notified of such issues (and via what means)
>>
>> I have CCed Kevin Wolf in case he has any questions regarding how fio
>> regressions will be handled.
>>
>> I'm happy to be contacted when a regression bisects to a commit I
>> authored.
>>
>>> * whether the current approach seems to be actually performing useful tasks
>>> * whether the reports are understandable
>>
>> Reports aren't something I would look at as a developer. Although the
>> history and current status may be useful to some maintainers, that
>> information isn't critical. Developers simply need to know which commit
>> introduced a regression and the details of how to run the regression.
>>
>>> * whether the reports should be regularly pushed to a publicly available 
>>> place (or only on regression/improvement)
>>> * whether there are any volunteers interested in non-clearly-bisectable 
>>> issues (probably by topic)
>>
>> One option is to notify maintainers, but when I'm in this position
>> myself I usually only investigate critical issues due to limited time.
>>
>> Regarding how to contact people, I suggest emailing them and CCing
>> qemu-devel so others are aware.
>>
>> Thanks,
>> Stefan

Attachment: report.html
Description: Bisection report (text/html)


