[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 2/2] qemu-img: align next status sector on destination alignm
From: |
Maxim Levitsky |
Subject: |
Re: [PATCH 2/2] qemu-img: align next status sector on destination alignment. |
Date: |
Thu, 18 Mar 2021 11:57:07 +0200 |
User-agent: |
Evolution 3.36.5 (3.36.5-2.fc32) |
On Thu, 2020-11-12 at 17:04 +0200, Maxim Levitsky wrote:
> On Thu, 2020-11-12 at 07:45 -0600, Eric Blake wrote:
> > On 11/12/20 6:40 AM, Peter Lieven wrote:
> >
> > > > /*
> > > > - * Avoid that s->sector_next_status becomes unaligned to the
> > > > source
> > > > - * request alignment and/or cluster size to avoid unnecessary
> > > > read
> > > > - * cycles.
> > > > + * Avoid that s->sector_next_status becomes unaligned to the
> > > > + * source/destination request alignment and/or cluster size to
> > > > avoid
> > > > + * unnecessary read/write cycles.
> > > > */
> > > > - tail = (sector_num - src_cur_offset + n) %
> > > > s->src_alignment[src_cur];
> > > > + alignment = MAX(s->src_alignment[src_cur], s->alignment);
> > > > + assert(is_power_of_2(alignment));
> > > > +
> > > > + tail = (sector_num - src_cur_offset + n) % alignment;
> > > > if (n > tail) {
> > > > n -= tail;
> > > > }
> > >
> > > I was also considering including the s->alignment when adding this
> > > chance. However, you need the least common multiple of both alignments,
> > > not the maximum, otherwise
> > >
> > > you might get misaligned to either source or destination.
> > >
> > >
> > > Why exactly do you need the power of two requirement?
> >
> > The power of two requirement ensures that you h ave the least common
> > multiple of both alignments ;)
> -
> Yes, and in addition to that both alignments are already power of two because
> we align them
> to it. The assert I added is just in case.
>
> This is how we calculate the destination alignment:
>
> s.alignment = MAX(pow2floor(s.min_sparse),
> DIV_ROUND_UP(out_bs->bl.request_alignment,
> BDRV_SECTOR_SIZE));
>
>
> This is how we calculate the source alignments (it is possible to have
> several source files)
>
> s.src_alignment[bs_i] = DIV_ROUND_UP(src_bs->bl.request_alignment,
> BDRV_SECTOR_SIZE);
> if (!bdrv_get_info(src_bs, &bdi)) {
> s.src_alignment[bs_i] = MAX(s.src_alignment[bs_i], bdi.cluster_size /
> BDRV_SECTOR_SIZE);
> }
>
>
> The bl.request_alignment comment mentions that it must be power of 2,
> and cluster sizes are also very likely to be power of 2 in all drivers
> we support. An assert for s.src_alignment won't hurt though.
>
>
> Note though that we don't really read the discard alignment.
> We have max_pdiscard, and pdiscard_alignment, but max_pdiscard
> is only used to split 'too large' discard requests, and pdiscard_alignment
> is advisory and used only in a couple of places.
> Neither are set by file-posix.
>
> We do have request_alignment, which file-posix tries to align to the logical
> block
> size which is still often 512 for backward compatibility on many devices
> (even nvme)
>
> Now both source and destination alignments in qemu-img convert are based on
> request_aligment
> and on min_sparse (which is by default 4K, controlled by qemu-img -S parameter
> (which otherwise controls the minimum number of blocks to be zero, to consider
> discarding them in convert)
>
>
> This means that there is no guarantee to have 4K alignment, and besides,
> with sufficiently fragmented source file, the bdrv_block_status_above
> can return arbitrary short and unaligned extents which can't be aligned,
> thus as I said this patch alone can't guarantee that we won't have
> write and discard sharing the same page.
>
> Another thing that can be discussed is why is_allocated_sectors was patched
> to convert short discards to writes. Probably because underlying hardware
> ignores them or something? In theory we should detect this and fail
> back to regular zeroing in this case.
> Again though, while in case of converting an empty file, this is the only
> source of writes, and eliminating it, also 'fixes' this case, with
> sufficiently
> fragmented source file we can even without this code get a write and discard
> sharing a page.
>
>
> Best regards,
> Maxim Levitsky
>
> > However, there ARE devices in practice that have advertised a
> > non-power-of-2 discard granularity, such as 15MiB (see commits 430b26a8,
> > 63188c24). Which means you probably don't want to assert power-of-2,
> > and in turn need to worry about least common multiple.
> >
Any update on this patch?
Best regards,
Maxim Levitsky
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: [PATCH 2/2] qemu-img: align next status sector on destination alignment.,
Maxim Levitsky <=