qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] nbd/client: Request larger block status by default


From: Eric Blake
Subject: Re: [PATCH] nbd/client: Request larger block status by default
Date: Tue, 21 Sep 2021 15:00:17 -0500
User-agent: NeoMutt/20210205-772-2b4c52

On Tue, Sep 21, 2021 at 10:12:02PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 21.09.2021 21:08, Eric Blake wrote:
> > On Tue, Sep 21, 2021 at 08:25:11PM +0300, Vladimir Sementsov-Ogievskiy 
> > wrote:
> > > 21.09.2021 19:17, Eric Blake wrote:
> > > > Now that commit 5a1cfd21 has clarified that a driver's block_status
> > > > can report larger *pnum than in the original request, we can take
> > > > advantage of that in the NBD driver.  Rather that limiting our request
> > > > to the server based on the maximum @bytes our caller mentioned, we
> > > > instead ask for as much status as possible (the minimum of our 4G
> > > > limit or the rest of the export); the server will still only give us
> > > > one extent in its answer (because we are using NBD_CMD_FLAG_REQ_ONE),
> > > > but now the block layer's caching of data areas can take advantage of
> > > > cases where the server gives us a large answer to avoid the need for
> > > > future NBD_CMD_BLOCK_STATUS calls.
> > > > 
> > > > Signed-off-by: Eric Blake <eblake@redhat.com>
> > > > ---
> > 
> > > 
> > > I remember we already discussed that, but can't find.
> > > 
> > > The problem is that it's not for free:
> > > 
> > > In server code in blockstatus_to_extents, we loop though the disk, trying 
> > > to merge extents of the same type.
> > > 
> > > With full allocated qcow2, we'll have to load all L2 tables and handle 
> > > them, to merge all block status into one big "allocated" extent.
> > > 
> > 
> > We don't have to loop that far.  The NBD protocol allows the server to
> > stop looping at whatever point makes sense, as long as it makes
> > progress.
> > 
> > > Maybe, we need some additional negotiation flag, to allow BLOCK_STATUS 
> > > command with NBD_CMD_FLAG_REQ_ONE flag to return an extent larger than 
> > > required when that information is available for free?

That's already the case when FLAG_REQ_ONE is not present.  The reason
that REQ_ONE clamps things at the requested limit is because older
qemu had a bug that it rejected the server sending extra information,
even when that info was free.

> > 
> > That's one possibility.  Another does not add anything to the NBD
> > protocol, but instead limits the code that tries to loop over block
> > status to deteremine a larger "allocated" answer to return to instead
> > stop looping after a finite number of extents have been merged
> > together.
> > 
> 
> In this case we should answer a question: when to stop looping? I'm not sure 
> we can simply drop the loop:
> 
> For example, for compressed clusters, bdrv_co_block_status() will return them 
> one-by-one, and sending them one by one to the wire, when user requested 
> large range would be inefficient.
> Or should we change block-status behavior for compressed clusters? And may be 
> add flag to block_status() that we are not interested in valid_offset, so it 
> can return an extent corresponding to the whole L2 table chunk (if all 
> entries are allocated, but not consecutive)?

Currently, bdrv_co_block_status() takes 'bool want_zero' that says
what the client wants.  Maybe it's worth expanding that into an enum
or bitmask to allow finer-grained client requests (the notion of
whether valid_offset matters to the caller IS relevant for deciding
when to clamp vs. loop).

> 
> 
> Hmm. So, if not update spec, we'll have to "fix" implementation. That means 
> actually, that we should update spec anyway, at least to note that: "clients 
> tend to request large regions in hope that server will not spend too much 
> time to serve them but instead return shorter answer"..

I'm really hoping we don't have to tweak the NBD spec on this one, but
rather improve the quality of implementation in qemu.

> 
> And you'll never have guarantee, that some another (non-qemu) NBD server will 
> not try to satisfy the whole request in on go.

That's true, but the NBD spec has always tried to encourage servers to
provide more information when it was free, but to give up early if it
gets too expensive.  It's a judgment call on where that line lies, and
may indeed be different between different servers.

> 
> 
> In other words:
> 
> 1. We want block_status of some region
> 2. If there some free information available about larger region we are happy 
> to cache it
> 
> With your solution, we just request a lot larger region, so we lose 
> information of [1]. That means that sever can't imagine, how much of 
> requested region is really needed, i.e. if we do some additional work to 
> return more information (still within boundaries of the request) will it be:
>  - good work to minimize network traffic
> OR
>  - extra work, waste server time, client will cache this information but 
> probably never use (or even lose it soon, as our cache is very simple)
> 
> With additional negotiation flag we don't lose [1], i.e how much client wants 
> now.
> 
> 
> So, for me, modifying the protocol looks nicer..
> 
> Another approach is do request without NBD_CMD_FLAG_REQ_ONE and handle 
> several extents.

_This_ idea is nicer.  It allows the client to request an actual
length it is interested in now, but allows the server to give extra
information back if it is free.  And it works without changing the NBD
protocol or existing qemu server; it is a client-only change, just
like this patch tried to be, but may have nicer performance
implications.

> 
> 
> Are you optimizing some concrete scenario?

Not at this point, so much as observing the effects of commit 5a1cfd21
and seeing if we should update our behavior to match.

For v2, I'll try switching to just drop our REQ_ONE artificial
limitations from the client.  We are still throwing away a lot of
useful information because we don't cache anything beyond the first
extent returned by the server.  At worst, maybe it will require adding
a tuning knob in the QAPI when creating the NBD client to decide which
of the two approaches to favor for a given client's connection to a
particular server.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




reply via email to

[Prev in Thread] Current Thread [Next in Thread]