
Re: [Qemu-discuss] Disk Corruption


From: Jakob Bohm
Subject: Re: [Qemu-discuss] Disk Corruption
Date: Wed, 1 Jun 2016 20:29:30 +0200
User-agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0

On 01/06/2016 20:13, Jacob Godin wrote:
Thanks for the tips, Jakob. Please see below for details.

    Please clarify a few things for the other people on this list (I
    don't have a solution for your issue, but would like it to be solved
    just to improve the reliability of my own qcow2 disks):

    On 01/06/2016 17:47, Jacob Godin wrote:

        Hi all,

        Been running into an issue with qcow2 disk corruption, hoping we
        can get pointed in the right direction. We're currently using
        latest qemu from Trusty.

    Is this Ubuntu?

    What is the numeric Ubuntu version?

    What are the actual qemu package versions you use ("latest" isn't
    exactly precise)?


Yes, Ubuntu 14.04. qemu-img version 2.0.0 (2.0.0+dfsg-2ubuntu1.22)

Note that qemu-img is just one of the command line tools; qemu itself
(the emulator) is in a different package.
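
For example, on Ubuntu the installed versions can be checked with dpkg
(a sketch; qemu-system-x86 is my guess at the relevant Trusty package
name for the emulator):

    $ dpkg -l 'qemu*' | grep '^ii'            # all installed qemu packages
    $ dpkg -s qemu-system-x86 | grep Version  # version of the emulator itself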





        The issue started after powering a VM off and on again. On
        first boot, the guest (CentOS 6) started reporting I/O issues
        almost immediately and then crashed. Following that, the VM was
        unable to read the disk (it kept looping through the BIOS boot
        process).


    How did you "power off" the VM?


Using virsh shutdown


    Did you use some qemu management tool (which one and which version)?


libvirt version 1.2.2 (1.2.2-0ubuntu13.1.17)


    Did you kill the qemu process?


We made sure it was dead before taking the snap.

Not clear: Did you *kill* the qemu process or did it exit all by itself
when you shut down the guest?

And the same question back when you made the snapshot.
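
For reference, one quick way to check (GUESTNAME is a placeholder for
the domain name; pgrep -a needs a reasonably recent procps):

    $ pgrep -af qemu-system     # any qemu-system-* processes still running?
    $ virsh domstate GUESTNAME  # reports "shut off" once qemu has exited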



    Did you do a "clean" shutdown of the Guest OS and wait for the Guest
    OS to tell the qemu process to exit on its own?


Yes, virsh shutdown issues a safe shutdown via ACPI

Other people on this list may know more about what that libvirt version
does in this situation (besides the initial "polite" request via a qemu
command to generate the ACPI event).
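
If in doubt, the mechanism can be made explicit on the command line,
assuming this libvirt is new enough to have the --mode option
(GUESTNAME is a placeholder again):

    $ virsh shutdown GUESTNAME --mode acpi  # only the ACPI request, no fallbacks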



    (Note: The latter should not be a requirement for the qcow2
    meta-data to survive, only for the disk image inside to be an image
    of a clean or unclean disk; however, it may matter for how the bug
    was triggered.)


        The disk has a single snapshot, which we were able to get
        working by following this process:

          * Attempt to apply snap. Supposedly fails.

    When you "attempted to apply the snapshot", which tool (and version)
    did you use?


Same as above, qemu-img 2.0.0


Ok, so not libvirt's snapshot management commands then.
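
For anyone reproducing this, applying an internal snapshot with
qemu-img alone would look roughly like this (disk.qcow2 is a
placeholder):

    $ qemu-img snapshot -l disk.qcow2     # list internal snapshots
    $ qemu-img snapshot -a 67 disk.qcow2  # apply (revert to) snapshot ID 67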




          * Run qemu-img check + repair
          * Use qemu-img convert to convert qcow2 to qcow2
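
For concreteness, my reading of those two steps as qemu-img 2.0
commands (disk.qcow2 and disk-copy.qcow2 are placeholders):

    $ qemu-img check -r all disk.qcow2   # repair refcount errors and leaks
    $ qemu-img convert -f qcow2 -O qcow2 disk.qcow2 disk-copy.qcow2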

        Once complete, we were able to boot from the disk, however it
        was at the point that the snapshot was taken. We have attempted
        to do a check+repair and then convert without applying the
        snapshot, but are running into the following errors:

          * qemu-img check + repair:

                Warning: cluster offset=0x2d3120706a0000 is after the end of the image file, can't properly check refcounts.
                ERROR offset=2d312070696e00: Cluster is not properly aligned; L2 entry corrupted.
                Warning: cluster offset=0x2d310a43500000 is after the end of the image file, can't properly check refcounts.
                Warning: cluster offset=0x2d310a43510000 is after the end of the image file, can't properly check refcounts.
                ERROR offset=2d310a43505500: Cluster is not properly aligned; L2 entry corrupted.
                Warning: cluster offset=0x20496e74650000 is after the end of the image file, can't properly check refcounts.
                Warning: cluster offset=0x20496e74660000 is after the end of the image file, can't properly check refcounts.
                ERROR offset=20496e74656c00: Cluster is not properly aligned; L2 entry corrupted.
                Warning: cluster offset=0x2f6d6d6f6e0000 is after the end of the image file, can't properly check refcounts.
                Warning: cluster offset=0xd2070726f0000 is after the end of the image file, can't properly check refcounts.
                Warning: cluster offset=0xd207072700000 is after the end of the image file, can't properly check refcounts.
                Warning: cluster offset=0x336f7220730000 is after the end of the image file, can't properly check refcounts.

          * qemu-img convert:

                qemu-img: error while reading block status of sector 147456: Input/output error

        Here's the qemu-img info output from that disk:

        image: disk.pre-convert
        file format: qcow2
        virtual size: 180G (193273528320 bytes)
        disk size: 153G
        cluster_size: 65536
        backing file: /var/lib/nova/instances/_base/xxx
        Snapshot list:
        ID   TAG   VM SIZE   DATE                  VM CLOCK
        67   xxx   0         2016-04-14 05:22:34   00:00:00.000

        Note that the virtual size has been increased from 80G. It
        previously looked like this:

        image: disk.pre-convert
        file format: qcow2
        virtual size: 80G (85899345920 bytes)
        disk size: 153G
        cluster_size: 65536
        backing file: /var/lib/nova/instances/_base/c45e2e81d34824861271a098bccd5585128e2c05
        Snapshot list:
        ID   TAG                                VM SIZE   DATE                  VM CLOCK
        67   e50825fbd43e455283ef847b12eaea4c   0         2016-04-14 05:22:34   00:00:00.000


        We've tried using qcow2.py from the qemu source tree to clear
        the snapshot headers; however, it didn't help.
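
For reference, that script lives in the qemu source tree under
tests/qemu-iotests/, and clearing the snapshot header fields would
look roughly like this (field names should be checked against the
dump-header output first):

    $ ./tests/qemu-iotests/qcow2.py disk.pre-convert dump-header
    $ ./tests/qemu-iotests/qcow2.py disk.pre-convert set-header nb_snapshots 0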




Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded


