qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 09/12] jobs: ensure sleep in job_sleep_ns is fully performed


From: Stefan Hajnoczi
Subject: Re: [PATCH 09/12] jobs: ensure sleep in job_sleep_ns is fully performed
Date: Wed, 26 Jan 2022 11:21:31 +0000

On Tue, Jan 18, 2022 at 11:27:35AM -0500, Emanuele Giuseppe Esposito wrote:
> If a drain happens while a job is sleeping, the timeout
> gets cancelled and the job continues once the drain ends.
> This is especially bad for the sleep performed in commit and stream
> jobs, since that is dictated by ratelimit to maintain a certain speed.
> 
> Basically the execution path is the followig:

s/followig/following/

> 1. job calls job_sleep_ns, and yield with a timer in @ns ns.
> 2. meanwhile, a drain is executed, and
>    child_job_drained_{begin/end} could be executed as ->drained_begin()
>    and ->drained_end() callbacks.
>    Therefore child_job_drained_begin() enters the job, that continues
>    execution in job_sleep_ns() and calls job_pause_point_locked().
> 3. job_pause_point_locked() detects that we are in the middle of a
>    drain, and firstly deletes any existing timer and then yields again,
>    waiting for ->drained_end().
> 4. Once draining is finished, child_job_drained_end() runs and resumes
>    the job. At this point, the timer has been lost and we just resume
>    without checking if enough time has passed.
> 
> This fix implies that from now onwards, job_sleep_ns will force the job
> to sleep @ns, even if it is wake up (purposefully or not) in the middle
> of the sleep. Therefore qemu-iotests test might run a little bit slower,
> depending on the speed of the job. Setting a job speed to values like "1"
> is not allowed anymore (unless you want to wait forever).
> 
> Because of this fix, test_stream_parallel() in tests/qemu-iotests/030
> takes too long, since speed of stream job is just 1024 and before
> it was skipping all the wait thanks to the drains. Increase the
> speed to 256 * 1024. Exactly the same happens for test 151.
> 
> Instead we need to sleep less in test_cancel_ready() test-blockjob.c,
> so that the job will be able to exit the sleep and transition to ready
> before the main loop asserts.

I remember seeing Hanna and Kevin use carefully rate-limited jobs in
qemu-iotests. They might have thoughts on whether this patch is
acceptable or not.

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]