From: Christian Böhme
Subject: [Qemu-discuss] block device write caching, notifications, and QCOW2 issues
Date: Mon, 24 Oct 2016 14:06:13 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.8.0

Hello all,

We are using Qemu as the VMM in a KVM/Linux setup with

file=/var/⟨some regular file name⟩,if=none,id=drive-virtio-disk0,format=qcow2,cache=none

as arguments to the only  -drive  option in the invocation,
i.e., the VM is constructed with a single block device for
persistent storage.  The guest OS in question is a run-of-the-mill
Ubuntu GNU/Linux.
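
For context, a full invocation along these lines would look roughly as
follows; only the -drive part is taken from our setup, while the memory
size and the -device pairing are illustrative placeholders:

$ qemu-system-x86_64 -enable-kvm -m 2048 \
    -drive file=/var/⟨some regular file name⟩,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
    -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0

With cache=none, Qemu opens the image with O_DIRECT, so the host page
cache is bypassed, but the guest-visible write cache stays enabled,
which matches the "write back" cache_type shown further below.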

On the Ubuntu GNU/Linux host, we have

$ cat /proc/mounts | grep /var
/dev/sda5 /var ext4 rw,nodev,relatime,data=ordered 0 0

$ cat /sys/block/sda/queue/scheduler
noop [deadline] cfq

# hdparm -I /dev/sda | grep -i cache
        cache/buffer size  = unknown
           *    Write cache
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT

$ uname -r
3.13.0-86-generic

$ dpkg-query -W -f '${Package}: ${Version}\n' qemu-system-x86
qemu-system-x86: 1.5.0+dfsg-3ubuntu5.4~cloud0

while in the guest, we have

$ cat /proc/mounts | grep 'data=ordered'
/dev/vda1 / ext4 rw,relatime,data=ordered 0 0

$ cat /sys/block/vda/queue/scheduler
none

$ lspci | grep -i -e sata -e scsi
00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device

$ cat /sys/block/vda/device/features
0010101101110000000000000000110000000000000000000000000000000000

$ cat /sys/block/vda/cache_type
write back

$ uname -r
3.13.0-95-generic

So far, so good.
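
(For reference, the guest-visible cache mode can in principle be switched
from within the guest via sysfs; whether the attribute actually accepts
writes depends on the guest kernel and on the device advertising a
configurable write cache, so treat this as a sketch only:

# echo "write through" > /sys/block/vda/cache_type

)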

With the above setup, we have seen inconsistencies in the guest's
filesystem more than once when the VM was restarted after the host
abruptly lost power.  This, of course, is to be expected with write
caches enabled, when the guest fails to f(data)sync(2) freshly written
data before the host goes down.

However, we have also seen regular files whose data, according to the
filesystem, had last been modified weeks before the outage, but which
nevertheless had garbled contents after the restart.  Such a situation is
rather unexpected, since it is unlikely that a well-exercised journaling
filesystem takes /that/ long to commit its changes to persistent storage.
The "new" contents, it seems, are not completely random, but look more
like the result of a block address permutation behind the filesystem's
back, as they contain fragments that one may find in other regular files
of the same filesystem.  That is, the filesystem keeps thinking it
addresses the same blocks it did all along for weeks, while the addresses
themselves now point to different blocks.

Has anyone else experienced such behaviour?  Could the block driver
stacking employed in Qemu be the culprit, or just the Qemu QCOW2 layer?
It looks like there is a tad too much going on when it tries to
map block addresses to regular file offsets, and this widens the window
within which "nothing may happen" on the host.
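
As a crude cross-check of the QCOW2 metadata (refcounts and cluster
mappings), one could run qemu-img check against the image while the guest
is shut down, or against a copy of it; this only validates the format
layer, of course, not the guest's filesystem:

$ qemu-img check -f qcow2 /var/⟨some regular file name⟩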

While reading qemu(1), I also came across the notion of "write notification"
in relation to block device write caching, where setting either  cache=writethrough
or  cache=directsync  will have Qemu generate them.  Lacking further
documentation on them, I dug through the code (

$ git status
HEAD detached at v1.5.0
nothing to commit, working directory clean

), but the only thing I could discern from this was that  cache=writethrough
or  cache=directsync  forces the Qemu block layer to issue an explicit flush
on the block driver(s) in question, via  bdrv_co_flush(), immediately after
every write request.  Since every request that comes in from the guest's
virtio_blk device is already acknowledged via  virtio_notify(), itself called
from  virtio_blk_req_complete(), the question remains what "write
notifications" actually are.  Anyone?
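
For what it is worth, the write-plus-flush sequence described above can be
reproduced by hand with qemu-io against a throwaway image (the scratch
path and pattern are just for illustration):

$ qemu-img create -f qcow2 /tmp/scratch.qcow2 1G
$ qemu-io -c 'write -P 0xab 0 4k' -c 'flush' /tmp/scratch.qcow2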


Cheers,
Christian


