Re: Which qemu change corresponds to RedHat bug 1655408

qemu-devel

From:	Jakob Bohm
Subject:	Re: Which qemu change corresponds to RedHat bug 1655408
Date:	Sat, 10 Oct 2020 00:54:19 +0200
User-agent:	Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.12.1

On 2020-10-09 15:56, Max Reitz wrote:

On 09.10.20 14:55, Jakob Bohm wrote:

On 2020-10-09 10:48, Max Reitz wrote:


[...]

The error I got was specifically "Failed to lock byte 100" and VM not
starting.  The ISO file was on a R/W NFS3 share, but was itself R/O for
the user that root was mapped to by linux-nfs-server via /etc/exports
options; specifically, the ISO file was mode 0444 in a 0755
directory, and the exports line was (simplified):

/share1 xxxx:xxxx:xxxx:xxxx/64(ro,sync,mp,subtree_check,anonuid=1000,anongid=1000)

where xxxx:xxxx:xxxx:xxxx/64 is the numeric IPv6 prefix of the LAN

The NFS kernel server ran Debian Stretch kernel 4.19.0-0.bpo.8-amd64 #1 SMP
Debian 4.19.98-1~bpo9+1 (2020-03-09) x86_64 GNU/Linux

NFS client mount options were:

rw,nosuid,nodev,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,
soft,proto=tcp6,timeo=600,retrans=6,sec=sys,mountaddr=xxxx:xxxx:xxxx:xxxx:xxxx:xxff:fexx:xxxx,
mountvers=3,mountport=45327,mountproto=udp6,local_lock=none,addr=xxxx:xxxx:xxxx:xxxx:xxxx:xxff:fexx:xxxx


The NFS client ran Debian Buster kernel 4.19.0-0.bpo.6-amd64 #1 SMP Debian
4.19.67-2+deb10u2~bpo9+1 (2019-11-12) x86_64 with Debian
qemu-system-x86 version 1:5.0-14~bpo10+1.  Booting used SysV init, and
libvirt was not used.

Copying the ISO to a local drive (where qemu-as-root had full
capabilities to bypass file security) worked around the failure.

I hope these details help reproduce the bug.


I’ll try again, thanks.

Can you perchance reproduce the bug with a more recent upstream kernel
(e.g. 5.8)?  I seem to recall there have been some locking bugs in the
NFS code, perhaps there was something that was fixed by now.

(Or at least 4.19.150, which seems to be the most recent 4.19.x
according to kernel.org)

And I still have no idea why qemu tried to lock bytes in a read-only raw
image file.  There is no block metadata to synchronize access to (as in
qcow2), and the option explicitly said ",format=raw" to avoid attempts
to access the ISO file as any of the advanced virtual disk formats.


I reasoned about that in my previous reply already, see below.  The
reason is that an image file being read-only when it is opened doesn’t
mean that it’s going to stay that way.

You’re correct that in the case of raw, this isn’t about metadata (as
there isn’t any), but about guest data, which needs to be protected from
concurrent access all the same, though.

(As for “why does qemu try to lock, when the option explicitly said
raw”; there is a dedicated option to turn off locking, and that is
file.locking=off.  I’m not suggesting that as a realistic work-around,
I’m just adding that FYI in case you didn’t know and need something ASAP.)
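
For anyone who wants to try that option, here is a minimal sketch of how the -drive value could be assembled. Only ",format=raw" and file.locking=off come from this thread; the helper name cdrom_drive_arg, the example path, and the media=cdrom and readonly=on settings are assumptions about how the ISO was attached, so adjust them to the actual command line.

```python
def cdrom_drive_arg(iso_path, locking="off"):
    """Build the value of a qemu -drive option for a read-only raw ISO.

    Hypothetical helper: media=cdrom and readonly=on are assumptions,
    not details taken from the original command line.
    """
    if locking not in ("on", "off", "auto"):
        raise ValueError("locking must be 'on', 'off' or 'auto'")
    # qemu option strings represent a literal comma as a doubled comma.
    escaped = iso_path.replace(",", ",,")
    return ",".join([
        f"file={escaped}",
        "format=raw",
        "media=cdrom",
        "readonly=on",
        f"file.locking={locking}",
    ])


if __name__ == "__main__":
    # Example path is made up; pass the result after "-drive".
    print(cdrom_drive_arg("/mnt/share1/boot.iso"))
```

The resulting string is passed as the argument that follows -drive on the qemu command line.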

[...]

The error message itself seems meaningless, as there is no particular
reason to request file locks on a read-only raw disk image.


Yes, there is.  We must prevent a concurrent instance from writing to
the image[1], and so we have to signal that somehow, which we do through
file locks.

I suppose it can be argued that if the image file itself is read-only
(outside of qemu), there is no need for locks, because nothing could
ever modify the image anyway.  But wouldn’t it be possible to change the
permissions after qemu has opened the image, or to remount some RO
filesystem R/W?

Perhaps we could automatically switch off file locks for a given image
file when taking the first one fails, and the image is read-only.  But
first I’d rather know what exactly is causing the error you see to
appear.

[1] Technically, byte 100 is about being able to read valid data from
the image, which is a constraint that’s only very rarely broken.  But
still, it’s a constraint that must be signaled.  (You only see the
failure on this byte, because the later bytes (like the one not
preventing concurrent R/W access, 201) are not even attempted to be
locked after the first lock fails.)
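
To help narrow down whether the NFS mount itself refuses such locks, a small probe outside qemu may be useful. The sketch below opens a file read-only and tries a non-blocking shared lock on single bytes at offsets 100 and 201, the two offsets named above. It is an assumption-laden stand-in, not qemu's code: it uses Python's fcntl.lockf (classic POSIX record locks), and qemu may issue a different fcntl lock command, so a success here does not prove that qemu would succeed. The function names are made up for illustration.

```python
import fcntl
import os

# Offsets mentioned in the thread; their meaning inside qemu is as
# described in the message, not re-derived here.
PROBE_OFFSETS = (100, 201)


def try_shared_lock(fd, offset):
    """Try a non-blocking shared lock on one byte at `offset`.

    Returns None on success, or the OSError that the kernel reported.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)
    except OSError as exc:
        return exc
    return None


def probe_byte_locks(path, offsets=PROBE_OFFSETS):
    """Open `path` read-only and report the lock result per offset."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return {off: try_shared_lock(fd, off) for off in offsets}
    finally:
        # Closing the descriptor also drops any locks taken above.
        os.close(fd)


if __name__ == "__main__":
    import sys
    for off, err in probe_byte_locks(sys.argv[1]).items():
        print(f"byte {off}: {'ok' if err is None else err}")
```

Running it as the same user qemu runs as, against the ISO on the NFS share and then against a local copy, would show whether the failure is specific to the share.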

(As for other instances writing to the image, you can allow that by
setting the share-rw=on option on the guest device.  This tells qemu
that the guest will accept modifications from the outside.  But that
still won’t prevent qemu from having to take a shared lock on byte 100.)

Max

Theoretically, locking on a raw file needs to be protocol-compatible
with loop-mounting the same raw file, so if the loop driver doesn't
probe those magic byte offsets to prevent out-of-order block writes,
then there is little point for the qemu raw driver to do so.

This applies to both the loop driver in the host kernel and the loop
driver on any other machine with file share access to the same image
file.

As for upgrading, I will try newer kernels packaged for the Debian
version used, once the current large batch job has completed, but I
doubt it will make much difference given the principles I just stated.




Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
