qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] HEAD is failing virt-test on migration tests


From: Lucas Meneghel Rodrigues
Subject: Re: [Qemu-devel] HEAD is failing virt-test on migration tests
Date: Thu, 12 Feb 2015 22:29:08 -0200

Copying Alex.

OK, after bisecting, this is what I've got:

8118f0950fc77cce7873002a5021172dd6e040b5 is the first bad commit
commit 8118f0950fc77cce7873002a5021172dd6e040b5
Author: Alexander Graf <address@hidden>
Date:   Thu Jan 22 15:01:39 2015 +0100

    migration: Append JSON description of migration stream
    
    One of the annoyances of the current migration format is the fact that
    it's not self-describing. In fact, it's not properly describing at all.
    Some code randomly scattered throughout QEMU elaborates roughly how to
    read and write a stream of bytes.
    
    We discussed an idea during KVM Forum 2013 to add a JSON description of
    the migration protocol itself to the migration stream. This patch
    adds a section after the VM_END migration end marker that contains
    description data on what the device sections of the stream are composed of.
    
    This approach is backwards compatible with any QEMU version reading the
    stream, because QEMU just stops reading after the VM_END marker and ignores
    any data following it.
    
    With an additional external program this allows us to decipher the
    contents of any migration stream and hopefully make migration bugs easier
    to track down.
    
    Signed-off-by: Alexander Graf <address@hidden>
    Signed-off-by: Amit Shah <address@hidden>
    Signed-off-by: Juan Quintela <address@hidden>

:040000 040000 e9a8888ac242a61fbd05bbb0daa3e8877970e738 61df81f831bc86b29f65883523ea95abb36f1ec5 M hw
:040000 040000 fe0659bed17d86c43657c26622d64fd44a1af037 7092a6b6515a3d0077f68ff2d80dbd74597a244f M include
:040000 040000 d90d6f1fe839abf21a45eaba5829d5a6a22abeb1 c2b1dcda197d96657458d699c185e39ae45f3c6c M migration
:100644 100644 98895fee81edfbc659fc42d467e930d06b1afa7d 80407662ad3ed860d33a9d35f5c44b1d19c4612b M savevm.c
:040000 040000 cf218bc2b841cd51ebe3972635be2cfbb1de9dfa 7aaf3d10ef7f73413b228e854fe6f04317151e46 M tests

So there you go. I'm going to sleep, if you need any extra help let me know.

Cheers,

Lucas

On Thu, Feb 12, 2015 at 8:56 PM, Lucas Meneghel Rodrigues <address@hidden> wrote:
OK, indeed I can reproduce the problem. It's specific to the filedescriptor migration. An easy way to reproduce it is by doing:

git clone https://github.com/autotest/virt-test.git

cd virt-test
./run -t qemu --bootstrap
./run -t qemu --tests type_specific.io-github-autotest-qemu.migrate.default.fd

That's it. I will see if I can bisect this quickly to pinpoint the QEMU commit that brought the regression.

The qemu master commit I just tested is:

commit 449008f86418583a1f0fb946cf91ee7b4797317d
Merge: 5c697ae bc5baff
Author: Peter Maydell <address@hidden>
Date:   Wed Feb 11 05:14:41 2015 +0000

    Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20150210.0' into staging
    
    RCU fixes and cleanup (Paolo Bonzini)
    Switch to v2 IOMMU interface (Alex Williamson)
    DEBUG build fix (Alexey Kardashevskiy)
    
    # gpg: Signature made Tue 10 Feb 2015 17:37:06 GMT using RSA key ID 3BB08B22
    # gpg: Good signature from "Alex Williamson <address@hidden>"
    # gpg:                 aka "Alex Williamson <address@hidden>"
    # gpg:                 aka "Alex Williamson <address@hidden>"
    # gpg:                 aka "Alex Williamson <address@hidden>"
    
    * remotes/awilliam/tags/vfio-update-20150210.0:
      vfio: Fix debug message compile error
      vfio: Use vfio type1 v2 IOMMU interface
      vfio: unmap and free BAR data in instance_finalize
      vfio: free dynamically-allocated data in instance_finalize
      vfio: cleanup vfio_get_device error path, remove vfio_populate_device callback
      memory: unregister AddressSpace MemoryListener within BQL
    
    Signed-off-by: Peter Maydell <address@hidden>


On Thu, Feb 12, 2015 at 8:19 PM, Lucas Meneghel Rodrigues <address@hidden> wrote:
>From what the log says, after a round of migrations 'info migrate' does not respond after 4 minutes, timing out. Virt Test then shuts down the VM. When it tries to check the qcow2 image, it is corrupted. I'm checking out the latest master to see how reproducible this problem is.

On Thu, Feb 12, 2015 at 8:12 PM, Juan Quintela <address@hidden> wrote:

Hi

while testing my changes I noticed that virt-test was failing.  I
check-out master, and failures are there.

This is one extract of the log after the 1st failure.  Notice that it
fails randomly, not every time.

I have to go to bed right now, so if anybody beats me with a fix, I
would be happy when I wakeup.

Thanks, Juan.


22:54:07 DEBUG| (monitor hmp1) Response to 'info migrate'
22:54:07 DEBUG| (monitor hmp1)    capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off
22:54:07 DEBUG| (monitor hmp1)    Migration status: active
22:54:07 DEBUG| (monitor hmp1)    total time: 2003 milliseconds
22:54:07 DEBUG| (monitor hmp1)    expected downtime: 300 milliseconds
22:54:07 DEBUG| (monitor hmp1)    setup: 3 milliseconds
22:54:07 DEBUG| (monitor hmp1)    transferred ram: 67619 kbytes
22:54:07 DEBUG| (monitor hmp1)    throughput: 268.61 mbps
22:54:07 DEBUG| (monitor hmp1)    remaining ram: 103056 kbytes
22:54:07 DEBUG| (monitor hmp1)    total ram: 1065796 kbytes
22:54:07 DEBUG| (monitor hmp1)    duplicate: 224304 pages
22:54:07 DEBUG| (monitor hmp1)    skipped: 0 pages
22:54:07 DEBUG| (monitor hmp1)    normal: 16380 pages
22:54:07 DEBUG| (monitor hmp1)    normal bytes: 65520 kbytes
22:54:07 DEBUG| (monitor hmp1)    dirty sync count: 0
22:54:09 DEBUG| Waiting for migration to complete (4.006475 secs)
22:54:09 DEBUG| (monitor hmp1) Sending command 'info migrate'
22:54:09 DEBUG| Send command: info migrate
22:54:09 DEBUG| (monitor hmp1) Response to 'info migrate'
22:54:09 DEBUG| (monitor hmp1)    capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off
22:54:09 DEBUG| (monitor hmp1)    Migration status: active
22:54:09 DEBUG| (monitor hmp1)    total time: 4008 milliseconds
22:54:09 DEBUG| (monitor hmp1)    expected downtime: 300 milliseconds
22:54:09 DEBUG| (monitor hmp1)    setup: 3 milliseconds
22:54:09 DEBUG| (monitor hmp1)    transferred ram: 131397 kbytes
22:54:09 DEBUG| (monitor hmp1)    throughput: 268.57 mbps
22:54:09 DEBUG| (monitor hmp1)    remaining ram: 31392 kbytes
22:54:09 DEBUG| (monitor hmp1)    total ram: 1065796 kbytes
22:54:09 DEBUG| (monitor hmp1)    duplicate: 226311 pages
22:54:09 DEBUG| (monitor hmp1)    skipped: 0 pages
22:54:09 DEBUG| (monitor hmp1)    normal: 32289 pages
22:54:09 DEBUG| (monitor hmp1)    normal bytes: 129156 kbytes
22:54:09 DEBUG| (monitor hmp1)    dirty sync count: 0
22:54:11 DEBUG| Waiting for migration to complete (6.011556 secs)
22:54:11 DEBUG| (monitor hmp1) Sending command 'info migrate'
22:54:11 DEBUG| Send command: info migrate
22:54:32 WARNI| virt-tests-vm1 is not alive. Can not query the register status
22:58:11 DEBUG| Destroying VM virt-tests-vm1 (PID 10880)
22:58:11 DEBUG| Ending VM virt-tests-vm1 process (monitor)
22:58:11 INFO | [qemu output] (Process terminated with status 0)
22:58:11 DEBUG| VM virt-tests-vm1 down (monitor)
22:58:11 DEBUG| Host does not support OpenVSwitch: Missing command: ovs-vswitchd
22:58:11 DEBUG| Destroying VM virt-tests-vm1 (PID 10763)
22:58:11 DEBUG| Shutting down VM virt-tests-vm1 (shell)
22:58:11 DEBUG| Login command: 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o PreferredAuthentications=password -p 5000 address@hidden'
22:58:11 DEBUG| virt-tests-vm1 alive now. Used to failed to get register info from guest 9 times
22:58:13 INFO | [qemu output] (Process terminated with status 0)
22:58:13 DEBUG| VM virt-tests-vm1 down (shell)
22:58:14 DEBUG| Host does not support OpenVSwitch: Missing command: ovs-vswitchd
22:58:14 DEBUG| Checking image file /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2
22:58:14 DEBUG| Running '/bin/qemu-img info /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2'
22:58:14 DEBUG| Running '/bin/qemu-img check /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2'
22:58:14 ERROR| [stdout]
22:58:14 ERROR| [stdout] 1 errors were found on the image.
22:58:14 ERROR| [stdout] Data may be corrupted, or further writes to the image may corrupt it.
22:58:14 ERROR| [stdout] 13495/163840 = 8.24% allocated, 0.03% fragmented, 0.00% compressed clusters
22:58:14 ERROR| [stdout] Image end offset: 885129216
22:58:14 ERROR| [stderr] ERROR cluster 13505 refcount=1 reference=2
22:58:14 ERROR| Errors found on image: '/mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2'
22:58:14 WARNI| virt-tests-vm1 is not alive. Can not query the register status
22:58:14 DEBUG| Thread quit. Used to failed to get register info from guest 20150212-225320-Mb1E4VV7 for 1 times.




--
Lucas



--
Lucas



--
Lucas

reply via email to

[Prev in Thread] Current Thread [Next in Thread]