Re: [Qemu-block] [PATCH] block: posix: Always allocate the first block

From: Nir Soffer
Subject: Re: [Qemu-block] [PATCH] block: posix: Always allocate the first block
Date: Sat, 17 Aug 2019 01:45:14 +0300

On Sat, Aug 17, 2019 at 12:57 AM John Snow <address@hidden> wrote:
On 8/16/19 5:21 PM, Nir Soffer wrote:
> When creating an image with preallocation "off" or "falloc", the first
> block of the image is typically not allocated. When using Gluster
> storage backed by XFS filesystem, reading this block using direct I/O
> succeeds regardless of request length, fooling alignment detection.
> In this case we fallback to a safe value (4096) instead of the optimal
> value (512), which may lead to unneeded data copying when aligning
> requests.  Allocating the first block avoids the fallback.

Where does this detection/fallback happen? (Can it be improved?)

In raw_probe_alignment().

This patch explains the issue:

Here Kevin and I discussed ways to improve it:

> When using preallocation=off, we always allocate at least one filesystem
> block:
>     $ ./qemu-img create -f raw test.raw 1g
>     Formatting 'test.raw', fmt=raw size=1073741824
>     $ ls -lhs test.raw
>     4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw
> I did quick performance tests for these flows:
> - Provisioning a VM with a new raw image.
> - Copying disks with qemu-img convert to new raw target image
> I installed Fedora 29 server on raw sparse image, measuring the time
> from clicking "Begin installation" until the "Reboot" button appears:
> Before(s)  After(s)     Diff(%)
> -------------------------------
>      356        389        +8.4
> I ran this only once, so we cannot tell much from these results.

That seems like a pretty big difference for just having pre-allocated a
single block. What was the actual command line / block graph for that test?

Having the first block allocated changes the alignment.

Before this patch, we detect request_alignment=1, so we fall back to 4096.
Then we detect buf_align=1, so we fall back to the value of request_alignment.
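
The two fallbacks described above can be sketched roughly as follows. This is a simplified model, not the code in raw_probe_alignment(): the candidate list, the helper names, and the two `*_ok` callbacks (which stand in for real direct I/O reads) are assumptions made for illustration.

```python
# Simplified model of alignment probing with fallbacks.
# Assumptions: the candidate list and all names here are illustrative only;
# the callbacks stand in for "did a direct I/O read with this value succeed?".
CANDIDATE_ALIGNMENTS = (1, 512, 1024, 2048, 4096)
SAFE_FALLBACK = 4096  # the "safe value" mentioned in this thread


def probe(io_succeeds, fallback):
    """Return the smallest candidate that works, or fallback if the probe
    is inconclusive (the read succeeds even with alignment 1)."""
    for align in CANDIDATE_ALIGNMENTS:
        if io_succeeds(align):
            return align if align != 1 else fallback
    # Assumption: if nothing works, also use the fallback.
    return fallback


def detect_alignment(length_ok, buffer_offset_ok):
    """Probe request_alignment first, then buf_align, as described above."""
    request_alignment = probe(length_ok, SAFE_FALLBACK)
    buf_align = probe(buffer_offset_ok, request_alignment)
    return request_alignment, buf_align


# Unallocated first block: every read "succeeds", so detection is fooled.
fooled = detect_alignment(lambda a: True, lambda a: True)

# Allocated first block on 512-byte sector storage: only multiples of 512 work.
correct = detect_alignment(lambda a: a % 512 == 0, lambda a: a % 512 == 0)
```

With these stand-in callbacks, the first case ends up with 4096 for both values and the second with 512 for both, matching the before and after numbers in this mail.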

The guest sees a disk with:
logical_block_size = 512
physical_block_size = 512

But qemu uses:
request_alignment = 4096
buf_align = 4096

The storage uses:
logical_block_size = 512
physical_block_size = 512

If the guest does direct I/O using 512-byte alignment, qemu has to copy
the buffers to align them to 4096 bytes.
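
To make the cost concrete, here is a small arithmetic sketch of how a guest request grows when it has to be rounded out to a larger request alignment. The function name and the example offsets are assumptions for illustration; it does not reproduce qemu's actual request handling.

```python
def align_request(offset, length, alignment):
    """Round a byte range out to the enclosing aligned range.

    Returns (aligned_offset, aligned_length).
    """
    start = offset - offset % alignment   # round start down
    end = offset + length
    end += -end % alignment               # round end up
    return start, end - start


# A 512-byte guest request at offset 512, with request_alignment = 4096:
# the I/O has to cover a whole 4096-byte range through an aligned copy.
padded = align_request(512, 512, 4096)

# The same request with request_alignment = 512 needs no padding.
exact = align_request(512, 512, 512)
```

In the first case the 512-byte request turns into a 4096-byte one starting at offset 0; in the second case the request can be passed through unchanged.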

After this patch, qemu detects the alignment correctly, so we have:

Guest:
logical_block_size = 512
physical_block_size = 512

qemu:
request_alignment = 512
buf_align = 512

Storage:
logical_block_size = 512
physical_block_size = 512

We expect this to be more efficient because qemu does not have to emulate

Was this over a network that could explain the variance?

Maybe. This is a complete install of Fedora 29 Server, and I'm not sure whether the
installation accesses the network.

> The second test was cloning the installation image with qemu-img
> convert, doing 10 runs:
>     for i in $(seq 10); do
>         rm -f dst.raw
>         sleep 10
>         time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
>     done
> Here is a table comparing the total time spent:
> Type    Before(s)   After(s)    Diff(%)
> ---------------------------------------
> real      530.028    469.123      -11.4
> user       17.204     10.768      -37.4
> sys        17.881      7.011      -60.7
> Here we see very clear improvement in CPU usage.

Hard to argue much with that. I feel a little strange trying to force
the allocation of the first block, but I suppose in practice "almost no
preallocation" is indistinguishable from "exactly no preallocation" if
you squint.


The real issue is that filesystems and block devices do not expose the alignment
requirement for direct I/O, so we need to use these hacks and assumptions.

With local XFS we use xfsctl(XFS_IOC_DIOINFO) to get request_alignment, but this does
not help when the XFS filesystem is used by Gluster on the server side.
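
For reference, a rough sketch of the same query from user space on a local XFS file. Assumptions: the ioctl number is built with the common Linux _IOR encoding (as on x86 and arm), struct dioattr is three 32-bit fields (d_mem, d_miniosz, d_maxiosz), and the function names are made up for this example. It only works on a file descriptor that lives on XFS.

```python
import fcntl
import struct

# struct dioattr { u32 d_mem; u32 d_miniosz; u32 d_maxiosz; }
DIOATTR = struct.Struct("=III")

# _IOR('X', 30, struct dioattr) using the common Linux encoding (assumption:
# architectures with a different direction-bit layout need another value).
XFS_IOC_DIOINFO = (2 << 30) | (DIOATTR.size << 16) | (ord("X") << 8) | 30


def parse_dioattr(raw):
    """Decode the raw struct dioattr bytes into named fields."""
    d_mem, d_miniosz, d_maxiosz = DIOATTR.unpack(raw)
    return {"d_mem": d_mem, "d_miniosz": d_miniosz, "d_maxiosz": d_maxiosz}


def query_dio_info(fd):
    """Ask XFS for the direct I/O limits of an open file descriptor.

    d_miniosz is the value that corresponds to request_alignment here.
    Raises OSError if the file is not on XFS.
    """
    buf = bytearray(DIOATTR.size)
    fcntl.ioctl(fd, XFS_IOC_DIOINFO, buf)  # the kernel fills buf in place
    return parse_dioattr(bytes(buf))
```

A Gluster client has no equivalent call to make, which is why probing with reads is still needed there.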

I hope that Niels is working on adding a similar ioctl for Gluster, so it can expose the properties
of the remote filesystem.

