Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster


From: Alberto Garcia
Subject: Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster
Date: Wed, 19 Aug 2020 17:37:12 +0200
User-agent: Notmuch/0.18.2 (http://notmuchmail.org) Emacs/24.4.1 (i586-pc-linux-gnu)

On Wed 19 Aug 2020 05:07:11 PM CEST, Kevin Wolf wrote:
>> I checked with xfs on my computer. I'm not very familiar with that
>> filesystem so I was using the default options and I didn't tune
>> anything.
>> 
>> What I got with my tests (using fio):
>> 
>> - Using extent_size_hint didn't make any difference in my test case
>>   (I do, however, see a clear difference with the test case described
>>   in commit ffa244c84a).
>
> Hm, interesting. What is your exact fio configuration? Specifically,
> which iodepth are you using? I guess with a low iodepth (and O_DIRECT),
> the effect of draining the queue might not be as visible.

fio --filename=/dev/vdb --direct=1 --randrepeat=1 --eta=always
    --ioengine=libaio --iodepth=32 --numjobs=1 --name=test --size=25G
    --io_limit=25G --ramp_time=5 --rw=randwrite --bs=4k --runtime=60

>> - preallocation=off is still faster than preallocation=metadata.
>
> Brian, can you help us here with some input?
>
> Essentially what we have here is a sparse image file on XFS that
> is opened with O_DIRECT (presumably - Berto, is this right?), and
> Berto is seeing cases where a random write benchmark is faster if
> we're doing the 64k ZERO_RANGE + 4k pwrite when touching a 64k cluster
> for the first time compared to always just doing the 4k pwrite. This
> is with a 1 MB extent size hint.
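
To spell out the two syscall patterns being compared, here is a
minimal sketch (not the actual qcow2 code; all names are
illustrative):

  /* Sketch: first 4k random write into an untouched 64k qcow2
   * cluster.  Names are illustrative, not QEMU's. */
  #define _GNU_SOURCE
  #include <fcntl.h>           /* fallocate() */
  #include <linux/falloc.h>    /* FALLOC_FL_ZERO_RANGE */
  #include <unistd.h>          /* pwrite() */

  #define CLUSTER_SIZE  (64 * 1024)

  /* With handle_alloc_space(): zero the whole cluster with one
   * (ideally metadata-only) operation, then write the 4k of data. */
  static ssize_t write_with_zero_range(int fd, off_t cluster_start,
                                       const void *buf, off_t off)
  {
      if (fallocate(fd, FALLOC_FL_ZERO_RANGE, cluster_start,
                    CLUSTER_SIZE) < 0) {
          return -1;
      }
      return pwrite(fd, buf, 4096, off);
  }

  /* Without it: write only the 4k, leave the rest of the cluster
   * sparse and let the filesystem deal with it. */
  static ssize_t write_plain(int fd, const void *buf, off_t off)
  {
      return pwrite(fd, buf, 4096, off);
  }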

A couple of notes:

- Yes, it's O_DIRECT (the image is opened with cache=none and fio uses
  --direct=1).

- The extent size hint is the default one; I didn't change or set
  anything for this test (or should I have?). One way to check what
  the image file actually got is sketched right after this list.
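
A hypothetical standalone helper for that check (assuming Linux;
struct fsxattr and FS_IOC_FSGETXATTR come from linux/fs.h, and unless
I'm mistaken this is the same ioctl family that commit ffa244c84a uses
to set the hint):

  #include <stdio.h>
  #include <stdlib.h>
  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/fs.h>        /* struct fsxattr, FS_IOC_FSGETXATTR */

  int main(int argc, char **argv)
  {
      struct fsxattr fsx;
      int fd;

      if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
          fprintf(stderr, "usage: %s <image-file>\n", argv[0]);
          return EXIT_FAILURE;
      }
      if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0) {
          perror("FS_IOC_FSGETXATTR");
          return EXIT_FAILURE;
      }
      /* fsx_extsize is in bytes; 0 means no per-file hint is set. */
      printf("extsize hint: %u bytes (xflags 0x%x)\n",
             fsx.fsx_extsize, fsx.fsx_xflags);
      return EXIT_SUCCESS;
  }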

> From the discussions we had the other day [1][2] I took away that your
> suggestion is that we should not try to optimise things with
> fallocate(), but just write the areas we really want to write and let
> the filesystem deal with the sparse parts. Especially with the extent
> size hint that we're now setting, I'm surprised to hear that doing a
> ZERO_RANGE first still seems to improve the performance.
>
> Do you have any idea why this is happening and what we should be doing
> with this?
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1850660
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1666864
>
>>   If I disable handle_alloc_space() (so there is no ZERO_RANGE used)
>>   then it is much slower.
>
> This makes some sense because then we're falling back to writing
> explicit zero buffers (unless you disabled that, too).

Exactly, this happens on both ext4 and xfs.
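
At the syscall level that fallback looks roughly like this (a
hypothetical sketch in the spirit of the one above; the real COW code
may batch these differently, e.g. as a vectored request):

  /* Sketch of the fallback with handle_alloc_space() disabled: the
   * untouched parts of the cluster are written as explicit zero
   * buffers, so the full 64k hits the disk even for a 4k guest
   * write.  Names are illustrative, not QEMU's. */
  #include <unistd.h>

  #define CLUSTER_SIZE  (64 * 1024)

  static ssize_t write_with_zero_buffers(int fd, off_t cluster_start,
                                         const void *buf, off_t off)
  {
      static const char zeroes[CLUSTER_SIZE]; /* zero-initialized */

      /* Zero-fill the head of the cluster before the data... */
      if (pwrite(fd, zeroes, off - cluster_start, cluster_start) < 0) {
          return -1;
      }
      /* ...and the tail after it... */
      if (pwrite(fd, zeroes,
                 cluster_start + CLUSTER_SIZE - (off + 4096),
                 off + 4096) < 0) {
          return -1;
      }
      /* ...then write the 4k of data itself. */
      return pwrite(fd, buf, 4096, off);
  }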

>> - With preallocation=falloc I get the same results as with
>>   preallocation=metadata.
>
> Interesting, this means that the fallocate() call costs you basically
> no time. I would have expected preallocation=falloc to be a little
> faster.

I would expect preallocation=falloc to be at least as fast as
preallocation=off (and it is, on ext4). However, on xfs it seems to be
slower (?), which doesn't make sense to me.

>> - preallocation=full is the fastest by far.
>
> I guess this saves the conversion of unwritten extents to fully
> allocated ones?

Maybe, but it is *much*, *much* faster; I assume I must be missing
something about how the filesystem works.
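
To make the guess above concrete: if preallocation=falloc maps to a
plain fallocate() (extents allocated but left unwritten) while
preallocation=full actually writes the zeroes, then only full avoids
an unwritten-to-written conversion on every first write. A sketch of
that difference (illustrative, not QEMU's actual implementation;
error handling and short writes are glossed over):

  #define _GNU_SOURCE
  #include <string.h>
  #include <fcntl.h>
  #include <unistd.h>

  /* preallocation=falloc: one cheap call; the extents are allocated
   * but marked "unwritten", so each first write to them still needs
   * an unwritten->written conversion in the filesystem. */
  static int prealloc_falloc(int fd, off_t size)
  {
      return fallocate(fd, 0, 0, size);
  }

  /* preallocation=full: actually write zeroes; slow up front, but
   * later random writes are plain overwrites of written extents. */
  static int prealloc_full(int fd, off_t size)
  {
      char buf[64 * 1024];
      off_t off;

      memset(buf, 0, sizeof(buf));
      for (off = 0; off < size; off += sizeof(buf)) {
          if (pwrite(fd, buf, sizeof(buf), off) < 0) {
              return -1;
          }
      }
      return 0;
  }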

I ran the test again on a newly created filesystem just to make sure,
here are the full results (numbers are IOPS):

|----------------------+-------+-------|
| preallocation        |  ext4 |   xfs |
|----------------------+-------+-------|
| off                  | 11688 |  6981 |
| off (w/o ZERO_RANGE) |  2780 |  3196 |
| metadata             |  9132 |  5764 |
| falloc               | 13108 |  5727 |
| full                 | 16351 | 40759 |
|----------------------+-------+-------|

Berto


