From: Nir Soffer
Subject: Re: [Qemu-block] Change in qemu 2.12 causes qemu-img convert to NBD to write more data
Date: Sun, 11 Nov 2018 18:11:49 +0200

On Wed, Nov 7, 2018 at 7:55 PM Nir Soffer <address@hidden> wrote:
On Wed, Nov 7, 2018 at 7:27 PM Kevin Wolf <address@hidden> wrote:
Am 07.11.2018 um 15:56 hat Nir Soffer geschrieben:
> On Wed, Nov 7, 2018 at 4:36 PM Richard W.M. Jones <address@hidden> wrote:
>
> > Another thing I tried was to change the NBD server (nbdkit) so that it
> > doesn't advertise zero support to the client:
> >
> >   $ nbdkit --filter=log --filter=nozero memory size=6G logfile=/tmp/log \
> >       --run './qemu-img convert ./fedora-28.img -n $nbd'
> >   $ grep '\.\.\.$' /tmp/log | sed 's/.*\([A-Z][a-z]*\).*/\1/' | uniq -c
> >    2154 Write
> >
> > Not surprisingly, no zero commands are issued.  The size of the write
> > commands is very uneven -- it appears to send one command per block
> > of zeroes or data.
> >
> > Nir: If we could get information from imageio about whether zeroing is
> > implemented efficiently or not by the backend, we could change
> > virt-v2v / nbdkit to advertise this back to qemu.
>
> There is no way to detect the capability; ioctl(BLKZEROOUT) always
> succeeds, silently falling back to manual zeroing in the kernel.
>
> Even if we could, sending zeroes on the wire from qemu may be even
> slower, and it looks like qemu sends even more requests in this case
> (2154 vs ~1300).
>
> It looks like this optimization on the qemu side leads to worse performance,
> so it should not be enabled by default.

Well, that's overgeneralising your case a bit. If the backend does
support efficient zero writes (which file systems, the most common case,
generally do), doing one big write_zeroes request at the start can
improve performance quite a bit.
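
For example, on a file system the difference between one efficient zero
request and actually writing the zeroes is easy to see with standard tools
(the file name and size below are just for illustration):

  $ truncate -s 6G test.raw

  # Zeroing via fallocate(FALLOC_FL_ZERO_RANGE): the file system only
  # updates extent metadata, so this typically finishes almost instantly
  $ time fallocate --zero-range --offset 0 --length 6G test.raw

  # Writing the zeroes explicitly pushes all 6G of data to the disk
  $ time dd if=/dev/zero of=test.raw bs=1M count=6144 conv=notrunc,fsync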

It seems the problem is that we can't really know whether the operation
will be efficient because the backends generally don't tell us. Maybe
NBD could introduce a flag for this, but in the general case it appears
to me that we'll have to have a command line option.

However, I'm curious what your exact use case is and which backend it
uses. Can something be improved there to actually get efficient zero
writes, and even better performance than just disabling the big zero
write?

The backend is some NetApp storage connected via FC. I don't have
more info on this. We get a zero rate of about 1G/s on this storage, which
is quite slow compared with other storage we tested.

One option we are checking now is whether this is the kernel's silent fallback
to manual zeroing when the server advertises a wrong value of write_same_max_bytes.
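
For reference, this is roughly what we are checking (the sdX name, offset and
length are placeholders; for a multipath/LVM setup this has to be the
underlying SCSI disk):

  # What the device advertises for WRITE SAME; if this is 0 (or wrong),
  # ioctl(BLKZEROOUT) still succeeds, but the kernel silently writes zero
  # pages itself instead of offloading the zeroing
  $ cat /sys/block/sdX/queue/write_same_max_bytes

  # blkdiscard -z uses the same ioctl(BLKZEROOUT) path, so only the elapsed
  # time, not the exit status, shows whether the zeroing was offloaded
  $ time blkdiscard -z -o 0 -l 1G /dev/sdX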

We eliminated this using blkdiscard. This is what we get with this storage when
zeroing a 100G LV:

for i in 1 2 4 8 16 32; do
    time blkdiscard -z -p ${i}m \
        /dev/6e1d84f9-f939-46e9-b108-0427a08c280c/2d5c06ce-6536-4b3c-a7b6-13c6d8e55ade
done

-p step   real        user       sys
1m        4m50.851s   0m0.065s   0m1.482s
2m        4m30.504s   0m0.047s   0m0.870s
4m        4m19.443s   0m0.029s   0m0.508s
8m        4m13.016s   0m0.020s   0m0.284s
16m       2m45.888s   0m0.011s   0m0.162s
32m       2m10.153s   0m0.003s   0m0.100s

We are investigating why we get low throughput on this server, and will also
check several other servers.

Having a command line option to control this behavior sounds good. I don't
have enough data to tell what the default should be, but I think the safe
way would be to keep the old behavior.

We filed this bug:
https://bugzilla.redhat.com/1648622

Nir
