
Re: Lost partition tables on ide-hd + ahci drive


From: Mike Maslenkin
Subject: Re: Lost partition tables on ide-hd + ahci drive
Date: Thu, 16 Feb 2023 17:17:17 +0300

Would an additional comparison make sense here: check for LBA == 0 and
then check the MBR signature bytes?
Additionally, it's easy to check the buffer_is_zero() result, or even print
the FIS contents under these conditions.
The data looks like part of the guest memory of a 64-bit Windows.
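
Something along these lines could be hooked into the sector-write path
for debugging (untested sketch; the helper name and its call site in
hw/ide are hypothetical, only buffer_is_zero() from "qemu/cutils.h" is
an existing helper):

#include "qemu/osdep.h"
#include "qemu/cutils.h"    /* buffer_is_zero() */

/* Hypothetical debug helper: call with the target LBA and the data
 * buffer just before it is written out to the disk image. */
static void debug_check_lba0_write(int64_t lba, const uint8_t *buf,
                                   size_t len)
{
    if (lba != 0 || len < 512) {
        return;
    }
    /* A valid MBR ends with the 0x55 0xAA signature bytes. */
    bool has_mbr_sig = buf[510] == 0x55 && buf[511] == 0xaa;
    if (!has_mbr_sig) {
        fprintf(stderr,
                "suspicious write to LBA 0: mbr_sig=0 all_zero=%d\n",
                buffer_is_zero(buf, 512));
        /* Dumping the FIS / buffer contents here would also help. */
    }
}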

On Wed, Feb 15, 2023 at 1:53 PM Fiona Ebner <f.ebner@proxmox.com> wrote:
>
> Am 14.02.23 um 19:21 schrieb John Snow:
> > On Thu, Feb 2, 2023 at 7:08 AM Fiona Ebner <f.ebner@proxmox.com> wrote:
> >>
> >> Hi,
> >> over the years we've received 1-2 dozen reports[0] about suddenly
> >> missing/corrupted MBR/partition tables. The issue seems to be very rare
> >> and there has been no success in reproducing it yet. I'm asking here
> >> in the hope that somebody has seen something similar.
> >>
> >> The only commonality seems to be the use of an ide-hd drive on an ahci bus.
> >>
> >> It does seem to happen with both Linux and Windows guests (one of the
> >> reports even mentions FreeBSD), and the backing storage for the VMs
> >> includes ZFS, RBD, LVM-Thin as well as file-based storage.
> >>
> >> Relevant part of an example configuration:
> >>
> >>>   -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' \
> >>>   -drive 'file=/dev/zvol/myzpool/vm-168-disk-0,if=none,id=drive-sata0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' \
> >>>   -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0' \
> >>
> >> The first reports are from before io_uring was used and there are also
> >> reports with writeback cache mode and discard=on,detect-zeroes=unmap.
> >>
> >> Some reports say that the issue occurred under high IO load.
> >>
> >> Many reports suspect backups causing the issue. Our backup mechanism
> >> uses backup_job_create() for each drive and runs the jobs sequentially.
> >> It uses a custom block driver as the backup target which just forwards
> >> the writes to the actual target which can be a file or our backup server.
> >> (If you really want to see the details, apply the patches in [1] and see
> >> pve-backup.c and block/backup-dump.c).
> >>
> >> Of course, the backup job will read sector 0 of the source disk, but I
> >> really can't see where a stray write would happen, why the issue would
> >> trigger so rarely or why seemingly only ide-hd+ahci would be affected.
> >>
> >> So again, just asking if somebody has seen something similar or has a
> >> hunch of what the cause might be.
> >>
> >
> > Hi Fiona,
> >
> > I'm sorry to say that I haven't worked on the block devices (or
> > backup) for a little while now, so I am not immediately sure what
> > might be causing this problem. In general, I advise against using AHCI
> > in production as better performance (and dev support) can be achieved
> > through virtio.
>
> Yes, we also recommend using virtio-{scsi,blk}-pci to our users and most
> do. Still, some use AHCI, I'd guess mostly for Windows, but not only.
>
> > Still, I am not sure why the combination of AHCI with
> > backup_job_create() would be corrupting the early sectors of the disk.
>
> It's not clear that backup itself is causing the issue. Some of the
> reports do correlate it with backup, but there are no precise timestamps
> when the corruption happened. It might be that the additional IO during
> backup is somehow triggering the issue.
>
> > Do you have any analysis on how much data gets corrupted? Is it the
> > first sector only, the first few? Has anyone taken a peek at the
> > backing storage to see if there are any interesting patterns that can
> > be observed? (Zeroes, garbage, old data?)
>
> It does seem to be the first sector only, but it's not entirely clear.
> Many of the affected users said that after fixing the partition table
> with TestDisk, the VMs booted/worked normally again. We only have dumps
> for the first MiB of three images, in all cases Windows guests with Ceph
> RBD images.
>
> See below[0] for the dumps. One was a valid MBR and matched the latest
> good backup, so that VM didn't boot for some other reason; I'm not sure
> it's even related to this bug. I did not include that one. One was
> completely empty, and one contained other data in the first 512 bytes,
> followed by zeroes again, but those zeroes are nothing special AFAIK.
>
> > Have any errors or warnings been observed in either the guest or the
> > host that might offer some clues?
>
> There is a single user who seemed to have hardware issues, and I'd be
> inclined to blame those in that case. But none of the other users
> reported any errors or warnings, though I can't say if any checked
> inside the guests.
>
> > Is there any commonality in the storage format being used? Is it
> > qcow2? Is it network-backed?
>
> There are reports with local ZFS volumes, local LVM-Thin volumes, RBD
> images, qcow2 on NFS. So no pattern to be seen.
>
> > Apologies for the "tier 1" questions.
>
> Thank you for your time!
>
> Best Regards,
> Fiona
>
> @Aaron (who had access to the broken images): please correct me or add anything
> relevant I missed. Are the broken VMs/backups still present? If yes, can
> we ask the user to check the logs inside?
>
> [0]:
> > febner@enia ~/Downloads % hexdump -C dump-vm-120.raw
> > 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> > *
> > 00100000
> > febner@enia ~/Downloads % hexdump -C dump-vm-130.raw
> > 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> > *
> > 000000c0  00 00 19 03 46 4d 66 6e  00 00 00 00 00 00 00 00  |....FMfn........|
> > 000000d0  04 f2 7a 01 00 00 00 00  00 00 00 00 00 00 00 00  |..z.............|
> > 000000e0  f0 a4 01 00 00 00 00 00  c8 4d 5b 99 0c 81 ff ff  |.........M[.....|
> > 000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> > 00000100  00 42 e1 38 0d da ff ff  00 bc b4 3b 0d da ff ff  |.B.8.......;....|
> > 00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> > 00000120  78 00 00 00 01 00 00 00  a8 00 aa 00 00 00 00 00  |x...............|
> > 00000130  a0 71 ba b0 0c 81 ff ff  2e 00 2e 00 00 00 00 00  |.q..............|
> > 00000140  a0 71 ba b0 0c 81 ff ff  00 00 00 00 00 00 00 00  |.q..............|
> > 00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> > *
> > 000001a0  5c 00 44 00 65 00 76 00  69 00 63 00 65 00 5c 00  |\.D.e.v.i.c.e.\.|
> > 000001b0  48 00 61 00 72 00 64 00  64 00 69 00 73 00 6b 00  |H.a.r.d.d.i.s.k.|
> > 000001c0  56 00 6f 00 6c 00 75 00  6d 00 65 00 32 00 5c 00  |V.o.l.u.m.e.2.\.|
> > 000001d0  57 00 69 00 6e 00 64 00  6f 00 77 00 73 00 5c 00  |W.i.n.d.o.w.s.\.|
> > 000001e0  4d 00 69 00 63 00 72 00  6f 00 73 00 6f 00 66 00  |M.i.c.r.o.s.o.f.|
> > 000001f0  74 00 2e 00 4e 00 45 00  54 00 5c 00 46 00 72 00  |t...N.E.T.\.F.r.|
> > 00000200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> > *
> > 00100000
>
>


