qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: migration: broken snapshot saves appear on s390 when small fields in


From: Claudio Fontana
Subject: Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed
Date: Mon, 13 Jul 2020 13:39:58 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1

Hello,

On 7/13/20 1:03 PM, Dr. David Alan Gilbert wrote:
> * Claudio Fontana (cfontana@suse.de) wrote:
>> Hello all,
>>
>> during unrelated work for splitting QTest from the TCG instruction counting 
>> module,
>>
>> I encountered what seems to be a migration stream issue, which is apparent 
>> only on s390, and only shows in block test 267.
>>
>> ./check -qcow2 267
>>
>> when it comes to snapshot save and load using backing file.
>>
>> Here is a minimal reproducer patch that causes the issue on s390 only.
>>
>> --------------------------------------------cut-------------------------------------------
>> diff --git a/cpus.c b/cpus.c
>> index 41d1c5099f..443b88697a 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -643,7 +643,7 @@ static void qemu_account_warp_timer(void)
>>  
>>  static bool icount_state_needed(void *opaque)
>>  {
>> -    return use_icount;
>> +    return 0;
>>  }
> 
> That's weird; I mean that's just turning a subsection on and off;
> so you'd hope if this is a test that generates it's own snapshot and
> then uses it then it should be consistent.

Indeed.

> 
>>  static bool warp_timer_state_needed(void *opaque)
>> --------------------------------------------cut-------------------------------------------
>>
>> config.status configure line:
>> exec '/home/cfontana/qemu-build/../qemu/configure' '--enable-tcg' 
>> '--disable-kvm' '--disable-hax' '--target-list=s390x-softmmu' 
>> '--enable-debug'
>>
>> $ make check-block
>>
>> TEST    iotest-qcow2: 267 [fail]
>> QEMU          -- 
>> "/home/cfontana/qemu-build/tests/qemu-iotests/../../s390x-softmmu/qemu-system-s390x"
>>  -nodefaults -display none -accel qtest
>> QEMU_IMG      -- 
>> "/home/cfontana/qemu-build/tests/qemu-iotests/../../qemu-img" 
>> QEMU_IO       -- 
>> "/home/cfontana/qemu-build/tests/qemu-iotests/../../qemu-io"  --cache 
>> writeback --aio threads -f qcow2
>> QEMU_NBD      -- 
>> "/home/cfontana/qemu-build/tests/qemu-iotests/../../qemu-nbd" 
>> IMGFMT        -- qcow2 (compat=1.1)
>> IMGPROTO      -- file
>> PLATFORM      -- Linux/s390x s390zp13 5.3.18-21-default
>> TEST_DIR      -- /home/cfontana/qemu-build/tests/qemu-iotests/scratch
>> SOCK_DIR      -- /tmp/tmp.bLJcJVtzk5
>> SOCKET_SCM_HELPER -- 
>> /home/cfontana/qemu-build/tests/qemu-iotests/socket_scm_helper
>>
>> --- /home/cfontana/qemu/tests/qemu-iotests/267.out   2020-07-12 
>> 05:10:07.948262675 -0400
>> +++ /home/cfontana/qemu-build/tests/qemu-iotests/267.out.bad 2020-07-12 
>> 05:27:03.358362781 -0400
>> @@ -137,6 +137,9 @@
>>  ID        TAG                 VM SIZE                DATE       VM CLOCK
>>  --        snap0                  SIZE yyyy-mm-dd hh:mm:ss   00:00:00.000
>>  (qemu) loadvm snap0
>> +Unexpected storage key flag data: 0
>> +error while loading state for instance 0x0 of device 's390-skeys'
>> +Error: Error -22 while loading VM state
>>  (qemu) quit
>>  
>>  
>> -----------
>>
>>
>> Not run: 172 186 192 259 287
>> Failures: 267
>> Failed 1 of 115 iotests
>> make: *** [/home/cfontana/qemu/tests/Makefile.include:880: 
>> check-tests/check-block.sh] Error 1
>>
>> -----------
>>
>> Note: only the === -blockdev with a backing file === part of test 267 fails. 
>> -blockdev with NBD is ok, like all the rest.
>>
>>
>> Interesting facts about s390 in particular: its save/load code includes the 
>> transfer of "storage keys",
>> which include a buffer of 32768 bytes of keydata in the stream.
>>
>> The code (hw/s390x/s390-skeys.c),
>> is modeled similarly to RAM transfer (like in migration/ram.c), with an EOS 
>> (end of stream) marker.
>>
>> Countrary to RAM transfer code though, after qemu_put_be64(f, EOS), the s390 
>> code does not qemu_fflush(f).
>>
>> ----------
>> Observation: the migration/qemu-file.c shows an IO_BUF_SIZE of 32768.
>>
>> --
>>
>> The following workarounds hide the problem (make the test pass):
>>
>> 1) always including the icount field in the (unrelated) timers field that 
>> are sent before in the migration stream (ie not applying the reproducer 
>> patch).
>>
>> 2) increasing the IO_BUF_SIZE also hides the problem:
>>
>> ----------------------cut--------------------------
>> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> index be21518c57..f81d1272eb 100644
>> --- a/migration/qemu-file.c
>> +++ b/migration/qemu-file.c
>> @@ -30,7 +30,7 @@
>>  #include "trace.h"
>>  #include "qapi/error.h"
>>  
>> -#define IO_BUF_SIZE 32768
>> +#define IO_BUF_SIZE 65536
>>  #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
>>  
>>  struct QEMUFile {
>> ----------------------cut--------------------------
>>
>> 3) adding a qemu_fflush in hw/s390x/s390-skeys.c after EOS also "fixes" the 
>> problem:
>>
>> ----------------------cut--------------------------
>> diff --git a/hw/s390x/s390-skeys.c b/hw/s390x/s390-skeys.c
>> index 1e036cc602..47c9a015af 100644
>> --- a/hw/s390x/s390-skeys.c
>> +++ b/hw/s390x/s390-skeys.c
>> @@ -252,6 +252,8 @@ static const TypeInfo qemu_s390_skeys_info = {
>>      .class_size    = sizeof(S390SKeysClass),
>>  };
>>  
>> +extern void qemu_fflush(QEMUFile *f);
>> +
>>  static void s390_storage_keys_save(QEMUFile *f, void *opaque)
>>  {
>>      S390SKeysState *ss = S390_SKEYS(opaque);
>> @@ -302,6 +304,7 @@ static void s390_storage_keys_save(QEMUFile *f, void 
>> *opaque)
>>      g_free(buf);
>>  end_stream:
>>      qemu_put_be64(f, eos);
>> +    qemu_fflush(f);
>>  }
>>  
>>  static int s390_storage_keys_load(QEMUFile *f, void *opaque, int version_id)
>> ----------------------cut--------------------------
>>
>> Do any of you with better understanding of migration/, block and s390 have a 
>> suggestion on what could be the issue here,
>> and what could be the next step in the investigation?
>>
>> Is the fact that migration/ram.c always does fflush after writing the EOS 
>> have any relevance here? why does it do it,
>> and should s390 code also follow the same pattern?
> 
> I didn't think it was required.
> And qemu_put_buffer loops if needed and calls qemu_fflush internally.

This detail probably does not help in this case, because the buffer itself is 
fine (32678 zero bytes is actually correct).

The End Of Stream marker written just after with qemu_put_be64 is not ok,
and anything past that point is just a whole bunch of zeroes in the stream when 
we hookup the calls to

qcow2_co_pwritev_task and qcow2_co_preadv_task

(see my response to Paolo in this thread).


> It's possible here that the storage key code is just the canary - the
> first thing that detects that the stream is invalid after it all goes
> wrong.

Seems the exact opposite to me, ie, the stream seems just fine up to the point 
where the EOS marker after the keydata in "s390-skeys" should be.

Thanks!

Claudio

> 
> Dave
> 
> 
>> Thanks,
>>
>> Claudio
>>
>>
>> -- 
>> Claudio Fontana
>> Engineering Manager Virtualization, SUSE Labs Core
>>
>> SUSE Software Solutions Italy Srl
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]