Re: [PATCH v5 1/4] block: support compressed write at generic layer
From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH v5 1/4] block: support compressed write at generic layer
Date: Tue, 22 Oct 2019 14:28:13 +0000
22.10.2019 15:56, Max Reitz wrote:
> On 22.10.19 14:23, Vladimir Sementsov-Ogievskiy wrote:
>> 22.10.2019 14:31, Max Reitz wrote:
>>> On 22.10.19 12:46, Vladimir Sementsov-Ogievskiy wrote:
>>>> 22.10.2019 13:21, Andrey Shinkevich wrote:
>>>>>
>>>>> On 22/10/2019 12:28, Max Reitz wrote:
>>>>>> On 20.10.19 22:37, Andrey Shinkevich wrote:
>>>>>>> To inform the block layer that all data is to be written compressed, we
>>>>>>> introduce the 'compress' command line option. Based on that option, the
>>>>>>> written data will be aligned to the cluster size at the generic layer.
>>>>>>>
>>>>>>> Suggested-by: Roman Kagan <address@hidden>
>>>>>>> Suggested-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>>>>>>> Signed-off-by: Andrey Shinkevich <address@hidden>
>>>>>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>>>>>>> ---
>>>>>>> block.c | 20 +++++++++++++++++++-
>>>>>>> block/io.c | 13 +++++++++----
>>>>>>> block/qcow2.c | 4 ++++
>>>>>>> blockdev.c | 9 ++++++++-
>>>>>>> include/block/block.h | 1 +
>>>>>>> include/block/block_int.h | 2 ++
>>>>>>> qapi/block-core.json | 5 ++++-
>>>>>>> qemu-options.hx | 6 ++++--
>>>>>>> 8 files changed, 51 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> The problem with compression is that there are such tight constraints on
>>>>>> it that it can really only work for very defined use cases. Those
>>>>>> constraints are:
>>>>>>
>>>>>> - Only write whole clusters,
>>>>>> - Clusters can be written to only once.
>>>>>>
>>>>>> The first point is addressed in this patch by setting request_alignment.
>>>>>> But I don’t see how the second one can be addressed. Well, maybe by
>>>>>> allowing it in all drivers that support compression. But if I just look
>>>>>> at qcow2, that isn’t going to be trivial: You need to allocate a
>>>>>> completely new cluster where you write the data (in case it grows), and
>>>>>> thus you leave behind a hole, which kind of defeats the purpose of
>>>>>> compression.
>>>>>>
>>>>>> (For demonstration:
>>>>>>
>>>>>> $ ./qemu-img create -f qcow2 test.qcow2 64M
>>>>>> Formatting 'test.qcow2', fmt=qcow2 size=67108864 cluster_size=65536
>>>>>> lazy_refcounts=off refcount_bits=16
>>>>>> $ x86_64-softmmu/qemu-system-x86_64 \
>>>>>> -blockdev "{'node-name': 'drv0', 'driver': 'qcow2',
>>>>>> 'compress': true,
>>>>>> 'file': {'driver': 'file', 'filename':
>>>>>> 'test.qcow2'}}" \
>>>>>> -monitor stdio
>>>>>> QEMU 4.1.50 monitor - type 'help' for more information
>>>>>> (qemu) qemu-io drv0 "write -P 42 0 64k"
>>>>>> wrote 65536/65536 bytes at offset 0
>>>>>> 64 KiB, 1 ops; 00.02 sec (4.055 MiB/sec and 64.8793 ops/sec)
>>>>>> (qemu) qemu-io drv0 "write -P 23 0 64k"
>>>>>> write failed: Input/output error
>>>>>>
>>>>>> )
>>>>>>
>>>>>> Compression really only works when you fully write all of an image
>>>>>> exactly once; i.e. as the qemu-img convert or as a backup target. For
>>>>>> both cases we already have a compression option. So I’m wondering where
>>>>>> this new option is really useful.
>>>>>>
>>>>>> (You do add a test for stream, but I don’t know whether that’s really a
>>>>>> good example, see my response there.)
>>>>>>
>>>>>> Max
>>>>>>
>>>>>
>>>>> Thank you very much Max for your detailed response.
>>>>>
>>>>> 1) You are right that compression is mostly used with backup. The
>>>>> backup-specific compression option could then be replaced by the one at
>>>>> the block layer, avoiding duplication. It can also be useful for NBD,
>>>>> for instance:
>>>>>
>>>>> $ ./qemu-img create -f qcow2 -o size=10G ./image.qcow2
>>>>> $ sudo ./qemu-nbd -c /dev/nbd0 ./image.qcow2
>>>>> $ sudo dd if=/dev/sda1 of=/dev/nbd0 bs=10M count=10
>>>>> 10+0 records in
>>>>> 10+0 records out
>>>>> 104857600 bytes (105 MB) copied, 0,0890581 s, 1,2 GB/s
>>>>> $ sudo ./qemu-nbd -d /dev/nbd0
>>>>> $ du -sh ./image.qcow2
>>>>> 101M ./image.qcow2
>>>>>
>>>>> and with the compression
>>>>>
>>>>> $ ./qemu-img create -f qcow2 -o size=10G ./image.qcow2
>>>>> $ sudo ./qemu-nbd -C -c /dev/nbd0 ./image.qcow2
>>>>> $ sudo dd if=/dev/sda1 of=/dev/nbd0 bs=10M count=10
>>>>> 10+0 records in
>>>>> 10+0 records out
>>>>> 104857600 bytes (105 MB) copied, 0,076046 s, 1,4 GB/s
>>>>> $ sudo ./qemu-nbd -d /dev/nbd0
>>>>> $ du -sh ./image.qcow2
>>>>> 5,3M ./image.qcow2
>>>
>>> That seems wrong to me. Why not use qemu-img convert for this case?
>>>
>>> Attaching an NBD server to a compressed disk has exactly the same
>>> problem as attaching a compressed disk to a VM. It won’t work unless
>>> the client/guest is aware of the limitations.
>>>
>>>>> The idea behind introducing the new 'compress' option is to have that
>>>>> single option instead of many other job-specific ones.
>>>>>
>>>>> 2) You are right also that "Compression can't overwrite anything..."
>>>>> It can be seen in the commit message
>>>>> b0b6862e5e1a1394e0ab3d5da94ba8b0da8664e2
>>>>>
>>>>> I am not sure if data should be written compressed to the active layer.
>>>>> I made the tests with the idea of providing examples of how to use the
>>>>> 'compress' option, because passing an option is a tricky thing in QEMU.
>>>>> But the issue arises anyway whenever we want to rewrite allocated
>>>>> clusters.
>>>>> I would like to investigate the matter and make a patch that resolves
>>>>> that issue.
>>>>> Do you agree with that?
>>>
>>> What seems wrong to me is that this series just adds a generic compress
>>> option without ensuring that it works generically.
>>>
>>> Either (1) it only works in well-defined cases, then either (1A) we have
>>> to ensure that we only allow it then (as we do now, because only
>>> qemu-img convert and the backup job have such an option; and those two
>>> are safe), or (1B) we have to clearly give a big warning for the new
>>> option that it doesn’t work correctly. I don’t know whether such a
>>> warning is even possible with just a generic node-level option.
>>>
>>> Or (2) we make it work in generic cases. Well, that might be possible
>>> for qcow2, but who’s going to make it work for VMDK’s streamOptimized
>>> subformat?
>>>
>>> More on all of that below.
>>>
>>>> Yes, we want this option not to allow compressed writes for guests, but to
>>>> allow
>>>> - stream with compression (used to remove a compressed incremental backup;
>>>> we need to merge it into the next incremental)
>>>
>>> Based on the assumption that one shouldn’t attach a compressed disk to a
>>> VM, I don’t see how streaming makes sense then. Well, I suppose
>>> intermediate streaming would work.
>>>
>>>> - backup with compression (we have an option for that already, so it works)
>>>> - backup to nbd server with compression: enable compression on nbd server
>>>
>>> The problem is clearly that if a generic client were to connect to the
>>> NBD server, it wouldn’t work. In this case, compression will work only
>>> if the clients understands the limitation.
>>>
>>> (The safe way would be to make the NBD server provide an option that
>>> clients can see so they know this limitation and agree they’ll adhere to
>>> it. It’s also a stupid way.)
>>>
>>>> So instead of adding two options (for stream and for NBD), it seems better
>>>> to add only one at the generic layer.
>>>
>>> I don’t know. It doesn’t work generically, so I don’t know whether it
>>> should be a generic option.
>>>
>>>> Then it becomes possible to run a guest on an image with compress=on. It's
>>>> a side effect, but it should still work correctly.
>>>
>>> How so? compress=on only works if every party involved only writes to
>>> any cluster of the image exactly once. That is just not the case for
>>> guests unless they know this limitation, and even I don’t see a use case.
>>>
>>>> I think the simplest thing is to just do a normal write if the compressed
>>>> write failed because of reallocation. We should check that ENOTSUP is
>>>> returned on that failure path and, for the compress=on option, handle it
>>>> by falling back to a normal write in the generic block/io.c.
>>>
>>> It seems wrong to not compress data with compress=on.
>>
>> We already fall back to a normal write in qcow2 if we can't compress.
>
> In a very specific case, namely where the compressed size is larger than
> the uncompressed size. It would be a whole different story to only
> compress the first write to a given cluster but not any following one.
>
>>>> (Note, that in case with stream we rewrite only unallocated clusters)
>>>
>>> Yes, but if the stream job writes the cluster before the guest, the
>>> guest gets an I/O error.
>>
>> I don't think that compressed writes from a guest are really a usable thing.
>> In our case with stream there is no guest (we just use the qemu binary to
>> operate on the block layer).
>>
>>>
>>>
>>> By the way, the other thing I wondered was whether this should be a
>>> filter driver instead (if it makes sense at all). Such a filter driver
>>> would at least be sufficiently cumbersome to use that probably only
>>> users who understand the implications would make use of it (1B above).
>>>
>>>
>>> I’m not against adding a generic compress=on option if we ensure that it
>>> actually works generically (2 above). It doesn’t right now, so I don’t
>>> think this is right.
>>>
>>> I’m already not sure whether it’s really possible to support generic
>>> compressed writes in qcow2. I suppose it’d be at least awkward. In
>>> theory it should work, it’s just that we can’t keep track of subcluster
>>> allocations, which in the worst case means that some compressed clusters
>>> take up a whole cluster of space.
>>>
>>> For VMDK... I suppose we could consider a new flag for block drivers
>>> that flags whether a driver supports arbitrary compressed writes or has
>>> the old limitations. compress=on could then refuse to work on any block
>>> driver but the ones that support arbitrary compressed writes.
>>>
>>>
>>> Our current model (1A) is simply to ensure that all compressed writes
>>> can adhere to the limitations. As I’ve said above, extending this to
>>> NBD would mean adding some NBD negotiation so both client and server
>>> agree on this limitation.
>>
>> In this case, I'd prefer a simple command-line option for qemu-nbd.
>>
>>> That seems kind of reasonable from the qemu
>>> side, but probably very unreasonable from an NBD side. Why would NBD
>>> bother to reproduce qemu’s limitations?
>>>
>>>
>>> So these are the three ways (1A, 1B, 2) I can imagine. But just adding
>>> a generic option that unsuspecting users are not unlikely to use but
>>> that simply doesn’t work generically doesn’t seem right to me.
>>>
>>
>> Firstly, 1A already doesn't work correctly: we have a compress option for
>> backup, so it will not work if the backup target is not empty.
>
> I’m not sure whether that qualifies because the user is simply
> responsible to ensure that the target is empty.
>
> Otherwise, you could also argue that backup doesn’t necessarily create a
> real backup because with sync=top it won’t copy unallocated clusters, so
> if the target already has data in such a place, it won’t be overwritten.
>
> Furthermore I’d argue this would hardly be an accident. Both
> drive-backup with mode=existing and qemu-img convert -n are most likely
> not used blindly but only when the situation requires it.
>
> My point is that using compress=on can be an accident because people see
> that option and think “That sounds useful!” without reading the warning.
>
>> So, 1A is impossible, as it's already broken; and adding separate options
>> for stream and qemu-nbd is not better than just adding one generic option.
>
> I disagree that 1A is broken, and I disagree a generic option would not
> be worse. I still believe a simple generic option is prone to
> accidental misuse.
>
>> I don't like (2) as it means a lot of effort to support a case that isn't
>> actually needed.
>
> Well, you say it isn’t needed. I suppose if it were supported, people
> would indeed use it.
>
>> I don't really like the filter solution (as it seems too much to add a
>> filter for a simple boolean option), but I think we can live with it.
>
> First, it is not a simple boolean option. This patch brings with it an
> implementation that converts writes to compressed writes and sets a
> request alignment. I actually think it’s better to do things like that
> in a dedicated block driver than in the generic block layer code.
>
> Second, we basically already have this for copy-on-read (which is also a
> boolean option for -drive).
>
> (FWIW, I’d prefer if detect-zeroes were a block driver instead of
> generic block layer code.)
So, if we want all three options, it should be three filter drivers? Hmm..
>
>> 1B is OK for me, that is, just document what the option does in fact.
>>
>> Maybe name the option "compress-new-writes" instead of just "compress"?
>> And document that writes to clusters which were not written before will
>> be compressed?
>>
>> Or make the option compress=<x>, where x may be 'no' or 'new-writes',
>> reserving 'all-writes' for the future?
>
> I don’t know whether the limitations can be captured in three words, but
> it would at least make users pay closer attention to what the option is
> really doing.
>
> OTOH, the only downsides to a dedicated block driver appear to me to be
> that it’s more cumbersome to use (which I actually consider a benefit) and
> that it requires the usual filter driver boilerplate – but that can be
> copied from other drivers[1].
>
> (The benefits are that we don’t need to modify generic code and that
> people need to read into the filter and its caveats before they can use it.)
>
> Max
>
>
> [1] Actually, we could provide default implementations somewhere and
> make block filters use them. That should reduce the boilerplate size.
>
OK, let's continue with a filter driver.
--
Best regards,
Vladimir