From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH v4 for-6.0? 0/3] qcow2: fix parallel rewrite and discard (rw-lock)
Date: Tue, 30 Mar 2021 21:33:17 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.9.0
30.03.2021 19:39, Max Reitz wrote:
On 30.03.21 15:25, Vladimir Sementsov-Ogievskiy wrote:
30.03.2021 15:51, Max Reitz wrote:
On 30.03.21 12:51, Vladimir Sementsov-Ogievskiy wrote:
30.03.2021 12:49, Max Reitz wrote:
On 25.03.21 20:12, Vladimir Sementsov-Ogievskiy wrote:
ping. Do we want it for 6.0?
I’d rather wait. I think the conclusion was that guests shouldn’t hit this
because they serialize discards?
I think we have never had bugs here, so of course we can wait.
There’s also something Kevin wrote on IRC a couple of weeks ago, for which I
had hoped he’d send an email, but I don’t think he did, so I’ll try to remember
and paraphrase as well as I can...
He basically asked whether it wouldn’t be conceptually simpler to take a
reference to some cluster in get_cluster_offset() and later release it with a
to-be-added put_cluster_offset().
He also noted that reading is problematic, too, because if you read a discarded
and reused cluster, this might result in an information leak (some guest
application might be able to read data it isn’t allowed to read); that’s why
making get_cluster_offset() the point of locking clusters against discarding
would be better.
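(Just to make the idea concrete, here is a minimal sketch of such a dynamic
per-cluster reference; all names and data structures here are hypothetical,
not the actual patch: an in-memory counter that the "get" side bumps, the
"put" side drops, and that a discard has to wait on.)

    /* Hypothetical in-memory state in BDRVQcow2State, keyed by host cluster
     * index (hash table created with g_int64_hash()/g_int64_equal()): */
    typedef struct InflightRef {
        uint64_t count;        /* in-flight reads/writes using this cluster */
        CoQueue discard_wait;  /* discards waiting for count to drop to 0 */
    } InflightRef;

    static void get_cluster_ref(BDRVQcow2State *s, uint64_t cluster)
    {
        InflightRef *r = g_hash_table_lookup(s->inflight_refs, &cluster);
        if (!r) {
            uint64_t *key = g_new(uint64_t, 1);
            *key = cluster;
            r = g_new0(InflightRef, 1);
            qemu_co_queue_init(&r->discard_wait);
            g_hash_table_insert(s->inflight_refs, key, r);
        }
        r->count++;
    }

    static void coroutine_fn put_cluster_ref(BDRVQcow2State *s, uint64_t cluster)
    {
        InflightRef *r = g_hash_table_lookup(s->inflight_refs, &cluster);
        if (--r->count == 0) {
            /* (entry removal/freeing omitted in this sketch) */
            qemu_co_queue_restart_all(&r->discard_wait);
        }
    }

    /* And in the discard path (under s->lock), before freeing the cluster: */
    InflightRef *r;
    while ((r = g_hash_table_lookup(s->inflight_refs, &cluster)) && r->count) {
        qemu_co_queue_wait(&r->discard_wait, &s->lock);
    }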
Yes, I thought about reads too (RFC'd in the cover letter of [PATCH v5 0/6] qcow2:
fix parallel rewrite and discard (lockless)).
This would probably work with both of your solutions. For the in-memory
solutions, you’d take a refcount to an actual cluster; in the CoRwLock
solution, you’d take that lock.
What do you think?
Hmm. What do you mean? Just rename my qcow2_inflight_writes_inc() and
qcow2_inflight_writes_dec() to get_cluster_offset()/put_cluster_offset(), to
make them more natural to use for read operations as well?
Hm. Our discussion wasn’t so detailed.
I interpreted it to mean all qcow2 functions that find an offset to a qcow2
cluster, namely qcow2_get_host_offset(), qcow2_alloc_host_offset(), and
qcow2_alloc_compressed_cluster_offset().
What about qcow2_alloc_clusters()?
It seems like all callers of it except do_alloc_cluster_offset() use it to
allocate metadata clusters, which cannot be discarded by the guest.
Is it really possible that some metadata cluster is used while qcow2 discards
it internally at the same time, or isn’t all of this only a problem for data
clusters?
That's a good question, and I don't think it has a quick answer.
I think:
1. On normal code paths we handle qcow2 metadata under the s->lock mutex, so we
can't drop a refcount to zero in parallel, as update_refcount() must be called
under s->lock too.
2. [1] is violated at least by code paths that are not in coroutine context (for
example qcow2 snapshots). But it seems that on these paths we are guarded by other
things (like a drained section). In the long run we should move the whole qcow2
driver to coroutine context and do every metadata update under s->lock.
3. Whether [1] is violated on some coroutine paths I don't know. I hope it isn't,
otherwise we would have bugs.
In the future we'll probably want to work with metadata without s->lock, at least
for I/O. Then we'll need the same mechanisms we are now inventing for data writes.
But I'm not sure it will be as simple for metadata: for now the mutex prevents not
only parallel write & discard of metadata, but also parallel updates. (For
parallel guest writes to the same cluster we don't care; that's the guest's problem.)
When those functions return an offset (in)to some cluster, that cluster (or the
image as a whole) should be locked against discards. Every offset received this
way would require an accompanying qcow2_put_host_offset().
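(From a caller's point of view the convention would be roughly the following;
purely a sketch, the exact signatures are approximate, and qcow2_put_host_offset()
does not exist yet.)

    static int coroutine_fn example_read(BlockDriverState *bs, uint64_t offset,
                                         unsigned int bytes, void *buf)
    {
        BDRVQcow2State *s = bs->opaque;
        uint64_t host_offset;
        QCow2SubclusterType type;
        int ret;

        ret = qcow2_get_host_offset(bs, offset, &bytes, &host_offset, &type);
        if (ret < 0) {
            return ret;
        }
        /* From here on, host_offset must not be discarded and reused... */

        ret = bdrv_co_pread(s->data_file, host_offset, bytes, buf, 0);

        /* ...until we give the reference back: */
        qcow2_put_host_offset(bs, host_offset, bytes);
        return ret;
    }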
Or do you mean updating every kind of "getting a cluster offset" in the whole qcow2
driver to take a kind of "dynamic reference count" in get_cluster_offset() and then
call a corresponding put() somewhere? In that case I'm afraid it's a lot more work.
Hm, really? I would have assumed we need to do some locking in all functions
that get a cluster offset this way, so it should be less work to take the lock
in the functions they invoke to get the offset.
There would also be the problem that a lot of paths in qcow2 are not in coroutine
context and don't even take s->lock when they actually should.
I’m not sure what you mean here, because all functions that invoke any of the
three functions I listed above are coroutine_fns (or, well, I didn’t look it
up, but they all have *_co_* in their name).
qcow2_alloc_clusters() has a lot more callers..
Let’s hope we don’t need to worry about it then. O:)
This would also mean that we do the same job that normal qcow2 refcounts already do:
there is no sense in keeping an additional "dynamic refcount" for an L2 table cluster
while reading it, as we already have a non-zero qcow2 refcount for it.
I’m afraid I don’t understand how normal refcounts relate to this. For example,
qcow2_get_host_offset() doesn’t touch refcounts at all.
I mean the following: remember our discussion about what a free cluster is. If we add
a "dynamic refcount" or "inflight-write counter" only to count inflight data-write
operations (or, as discussed, data reads as well), then the "full reference count" of
the cluster is inflight-write-count + qcow2-metadata-refcount.
But if we add a kind of "dynamic refcount" for any use of a host cluster, for example
reading an L2 table, then our "dynamic refcount" duplicates the reference to this L2
table that already exists in qcow2 metadata (represented as a refcount), and we no
longer have a concept of "full reference count" as the sum above. We should still
treat a cluster as free when both the "dynamic refcount" and the qcow2-metadata
refcount are zero, but their sum doesn't mean much. Maybe not a problem, but it looks
like a complication with no benefit.
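(That is, the only meaningful check would be something like the following sketch,
where inflight_count() stands for the hypothetical dynamic counter; the sum of the
two values is never needed anywhere.)

    /* Sketch: a host cluster may only be reused when both counts are zero. */
    uint64_t refcount;
    int ret = qcow2_get_refcount(bs, cluster_index, &refcount);
    bool really_free = (ret == 0 && refcount == 0 &&
                        inflight_count(s, cluster_index) == 0);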
You’re right, but I don’t think that’s a real problem. Perhaps the sum was
even a false equivalency. There is a difference between the dynamic refcount
and the on-disk refcount: We must wait with discarding until the dynamic
refcount is 0, and discarding will then drop the on-disk refcount to 0, too. I
think. So I’m not sure whether the sum really means anything.
But if metadata isn’t a problem and that means we don’t have to ask these
questions at all, well, that’ll be even better.
==
OK, now I think you didn't mean qcow2_alloc_clusters(). So we are talking only about
functions that return an offset to a cluster with "guest data", not to just any kind of
host cluster. Then what you propose looks like this to me:
- take my v5
- rename qcow2_inflight_writes_dec() to put_cluster_offset()
- call qcow2_inflight_writes_inc() from the three functions you mention
Yes, I think so. Or you take the CoRwLock in those three functions, depending
on which implementation we want.
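(For comparison, the CoRwLock variant, roughly, reusing the discard_rw_lock idea
from patch 3/3; again only a sketch of where the lock would be taken:)

    /* In BDRVQcow2State: */
    CoRwlock discard_rw_lock;   /* initialized with qemu_co_rwlock_init() */

    /* Taken shared in qcow2_get_host_offset(), qcow2_alloc_host_offset() and
     * qcow2_alloc_compressed_cluster_offset(): */
    qemu_co_rwlock_rdlock(&s->discard_rw_lock);

    /* Released by the corresponding put function once the data I/O is done: */
    qemu_co_rwlock_unlock(&s->discard_rw_lock);

    /* Taken exclusively around actually discarding a host cluster, so the
     * discard waits for all in-flight reads/writes and blocks new ones: */
    qemu_co_rwlock_wrlock(&s->discard_rw_lock);
    /* ... free the cluster ... */
    qemu_co_rwlock_unlock(&s->discard_rw_lock);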
Yes.
Sorry if we’ve discussed this before and I just forgot, but: What are the
performance implications of either solution? As far as I remember, the
inflight-write-counter solution had the problem of always doing stuff on every
I/O access. You said the impact was small and yes, it is, but it’s still there.
Yes. I think I can measure it on tmpfs, and if the degradation is less than 1%, I'd
just ignore it. Still, the inflight-counter solution is more complicated.
Hmm, note that the CoRwLock solution has one non-trivial thing too: moving the lock
to the task coroutine. With inflight counters we don't need anything like that.
I haven’t looked at the CoRwLock solution, but I would assume it’s basically
zero cost for common cases, right? I.e. the case where the guest already
serializes discards from other accesses, which I thought is what e.g. Linux
does. (And even if it doesn’t, I would assume that concurrent I/O and discards
are rather rare.)
With CoRwLock, any kind of host-cluster discard becomes blocking: we block guest
I/O (remember that we want to protect reads too), wait until all operations have
completed, do the discard, and then unfreeze all the I/O. What about removing a
persistent dirty bitmap? That becomes such an operation. Hmm, can I imagine
another example? Switching to a snapshot should be in a drained section anyway.
Rewriting a compressed cluster as a normal cluster is a rare case. Hmm, what
else?
We don't have it now, but what about a feature like this:
During block-commit, discard clusters that are already committed to the base from
all intermediate images, so that disk usage doesn't grow during the commit job.
But anyway, we don't have such a feature yet.
So I think that in general the lockless solution is better. But currently we
don't have a significant use case that would be a show-stopper for the CoRwLock
solution, or at least I can't imagine one.
Ha, actually I do have one in mind: mirror. Mirror does discards (it has
MIRROR_METHOD_DISCARD). Honestly, I've always had doubts about that: I think the
mirror job should not use discard at all, it should do either write or
write-zeroes, nothing else. But that's my opinion, and maybe I just don't know the
use case. Anyway, with the current implementation of the mirror job, a write-lock
on discard may reduce the performance of the mirror job (and of the guest as well)
in cases where mirror does discard operations. Mirror does a discard when
block_status returns neither ZERO nor DATA, and according to my old mail
https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg05136.html , only
iscsi may reasonably report such a block status... Hmm. Any thoughts?
That makes sense to me. Still, the name put_cluster_offset() doesn't make it obvious
that it's only for clusters with "guest data" and that we shouldn't call it when
working with metadata clusters.
Yeah, it was meant for symmetry with qcow2_get_host_offset() (i.e. it would be
named “qcow2_put_host_offset()”). Now that there are three functions that
would take a reference, it should get some other name. I don’t know.
qcow2_put_data_cluster_offset()?
Max
--
Best regards,
Vladimir