[Qemu-devel] Overlapping buffers in I/O requests
From: Stefan Hajnoczi
Subject: [Qemu-devel] Overlapping buffers in I/O requests
Date: Thu, 30 Sep 2010 13:00:47 +0100
There is a block I/O corner case that I don't fully understand. I'd
appreciate thoughts on the expected behavior.
At one point during a Windows Server 2008 install to an IDE disk the
guest sends a read request with overlapping sglist buffers. It looks
like this:
[0] addr=A len=4k
[1] addr=B len=4k
[2] addr=C len=4k
[3] addr=B len=4k
Buffers 1 and 3 are the same guest memory, their addresses match.
If I understand correctly, IDE will perform each operation in turn and
DMA the result back to the buffers in order. Therefore the disk
contents at +12k (buffer 3) should be the final data at address B.
Unfortunately QEMU does not guarantee this today. Sometimes the disk
contents at +4k (buffer 1) are read and other times the disk contents
at +12k (buffer 3) are read.
QEMU can be taken out of the picture and replaced by a simple test
program that calls preadv(2) directly with the same overlapping buffer
pattern. There doesn't appear to be a guarantee that the disk
contents at +12k (buffer 3) will be read instead of +4k (buffer 1).
When the page cache is active preadv(2) produces consistent results.
When the page cache is bypassed (O_DIRECT) preadv(2) produces
consistent results against a physical disk:
a-22904 [001] 3042.186790: block_bio_queue: 8,0 R 2048 + 32 [a]
a-22904 [001] 3042.186807: block_getrq: 8,0 R 2048 + 32 [a]
a-22904 [001] 3042.186812: block_plug: [a]
a-22904 [001] 3042.186816: block_rq_insert: 8,0 R 0 ()
2048 + 32 [a]
a-22904 [001] 3042.186822: block_unplug_io: [a] 1
a-22904 [001] 3042.186829: block_rq_issue: 8,0 R 0 ()
2048 + 32 [a]
pam-foreground--22912 [001] 3042.187066: block_rq_complete: 8,0 R ()
2048 + 32 [0]
Notice that a single 32 sector read is issued on /dev/sda (8,0). This
makes sense under the assumption that the disk honors DMA buffer
ordering within a request.
However, when the page cache is bypassed preadv(2) produces
inconsistent results against a file on ext3 -> LVM -> dm-crypt ->
/dev/sda.
a-22834 [001] 3038.425802: block_bio_queue: 254,3 R
32616672 + 8 [a]
a-22834 [001] 3038.425812: block_remap: 254,0 R
58544736 + 8 <- (254,3) 32616672
a-22834 [001] 3038.425813: block_bio_queue: 254,0 R
58544736 + 8 [a]
kcryptd_io-379 [001] 3038.425832: block_remap: 8,0 R 59044807
+ 8 <- (8,2) 58546792
kcryptd_io-379 [001] 3038.425833: block_bio_queue: 8,0 R
59044807 + 8 [kcryptd_io]
kcryptd_io-379 [001] 3038.425841: block_getrq: 8,0 R 59044807
+ 8 [kcryptd_io]
kcryptd_io-379 [001] 3038.425845: block_plug: [kcryptd_io]
kcryptd_io-379 [001] 3038.425848: block_rq_insert: 8,0 R 0 ()
59044807 + 8 [kcryptd_io]
kcryptd_io-379 [001] 3038.425859: block_rq_issue: 8,0 R 0 ()
59044807 + 8 [kcryptd_io]
a-22834 [001] 3038.425894: block_bio_queue: 254,3 R
32616792 + 16 [a]
a-22834 [001] 3038.425898: block_remap: 254,0 R
58544856 + 16 <- (254,3) 32616792
a-22834 [001] 3038.425899: block_bio_queue: 254,0 R
58544856 + 16 [a]
kcryptd_io-379 [001] 3038.425908: block_remap: 8,0 R 59044927
+ 16 <- (8,2) 58546912
kcryptd_io-379 [001] 3038.425909: block_bio_queue: 8,0 R
59044927 + 16 [kcryptd_io]
kcryptd_io-379 [001] 3038.425911: block_getrq: 8,0 R 59044927
+ 16 [kcryptd_io]
kcryptd_io-379 [001] 3038.425913: block_plug: [kcryptd_io]
kcryptd_io-379 [001] 3038.425914: block_rq_insert: 8,0 R 0 ()
59044927 + 16 [kcryptd_io]
a-22834 [001] 3038.425920: block_bio_queue: 254,3 R
32616992 + 8 [a]
a-22834 [001] 3038.425922: block_remap: 254,0 R
58545056 + 8 <- (254,3) 32616992
a-22834 [001] 3038.425923: block_bio_queue: 254,0 R
58545056 + 8 [a]
a-22834 [001] 3038.425929: block_unplug_io: [a] 0
a-22834 [001] 3038.425930: block_unplug_io: [a] 0
a-22834 [001] 3038.425931: block_unplug_io: [a] 2
a-22834 [001] 3038.425934: block_rq_issue: 8,0 R 0 ()
59044927 + 16 [a]
kcryptd_io-379 [001] 3038.425948: block_remap: 8,0 R 59045127
+ 8 <- (8,2) 58547112
kcryptd_io-379 [001] 3038.425949: block_bio_queue: 8,0 R
59045127 + 8 [kcryptd_io]
kcryptd_io-379 [001] 3038.425951: block_getrq: 8,0 R 59045127
+ 8 [kcryptd_io]
kcryptd_io-379 [001] 3038.425953: block_plug: [kcryptd_io]
kcryptd_io-379 [001] 3038.425954: block_rq_insert: 8,0 R 0 ()
59045127 + 8 [kcryptd_io]
<idle>-0 [001] 3038.427414: block_unplug_timer: [swapper] 3
kblockd/1-21 [001] 3038.427437: block_unplug_io: [kblockd/1] 3
kblockd/1-21 [001] 3038.427440: block_rq_issue: 8,0 R 0 ()
59045127 + 8 [kblockd/1]
<idle>-0 [000] 3038.436786: block_rq_complete: 8,0 R ()
59044807 + 8 [0]
kcryptd-380 [001] 3038.436960: block_bio_complete: 254,0 R
58544736 + 8 [0]
kcryptd-380 [001] 3038.436963: block_bio_complete: 254,3 R
32616672 + 8 [0]
<idle>-0 [001] 3038.437070: block_rq_complete: 8,0 R ()
59044927 + 16 [0]
kcryptd-380 [000] 3038.437343: block_bio_complete: 254,0 R
58544856 + 16 [611733513]
kcryptd-380 [000] 3038.437346: block_bio_complete: 254,3 R
32616792 + 16 [-815025730]
<idle>-0 [000] 3038.437428: block_rq_complete: 8,0 R ()
59045127 + 8 [0]
kcryptd-380 [000] 3038.437569: block_bio_complete: 254,0 R
58545056 + 8 [-2107963545]
kcryptd-380 [000] 3038.437571: block_bio_complete: 254,3 R
32616992 + 8 [176593183]
The 32 sectors are broken up into 8, 8, and 16 sector requests. I
believe the filesystem is doing this before LVM is reached. This
makes sense since a file may not be contiguous on disk and several
extents need to be read.
These 3 independent requests can complete in any order. The order
will affect what contents are visible at address B when the read
completes.
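For illustration only (this is not QEMU code), a hypothetical helper that detects aliased segments might look like the following; a caller could fall back to one sequential pread(2) per segment when it returns true, restoring the in-order DMA semantics described above:

```c
/* Hypothetical overlap check, not taken from QEMU. */
#include <assert.h>
#include <stdbool.h>
#include <sys/uio.h>

/* True if the two byte ranges share any memory. */
static bool iov_ranges_overlap(const struct iovec *a, const struct iovec *b)
{
    const char *as = a->iov_base, *bs = b->iov_base;
    return as < bs + b->iov_len && bs < as + a->iov_len;
}

/* True if any pair of segments in the vector overlaps. */
bool iov_has_overlap(const struct iovec *iov, int cnt)
{
    for (int i = 0; i < cnt; i++)
        for (int j = i + 1; j < cnt; j++)
            if (iov_ranges_overlap(&iov[i], &iov[j]))
                return true;
    return false;
}
```

The O(n^2) scan is fine for typical sglist lengths; the cost only matters if requests carry very many segments.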
So now my question:
Is QEMU risking data corruption when buffers overlap? If IDE
guarantees that buffers are read in order then we are doing it wrong
(at least when O_DIRECT is used).
Perhaps there is no ordering guarantee in IDE, Windows is doing
something crazy, and QEMU is within its rights to use preadv(2) like
this.
Stefan