
Re: [Qemu-devel] NVDIMM live migration broken?


From: Juan Quintela
Subject: Re: [Qemu-devel] NVDIMM live migration broken?
Date: Tue, 27 Jun 2017 20:12:04 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)

Juan Quintela <address@hidden> wrote:
> Haozhong Zhang <address@hidden> wrote:
>
> ....
>
> Hi
>
> I am trying to see what is going on.
>
>>> 
>>
>> I managed to reproduce this bug. After bisecting between good v2.8.0 and
>> bad edf8bc984, it looks like a regression introduced by
>>     6b6712efccd "ram: Split dirty bitmap by RAMBlock"
>> This commit may result in a guest crash after migration if any host
>> memory backend is used.
>>
>> Could you test whether the attached draft patch fixes this bug? If yes,
>> I will make a formal patch later.
>>
>> Thanks,
>> Haozhong
>>
>> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
>> index 73d1bea8b6..2ae4ff3965 100644
>> --- a/include/exec/ram_addr.h
>> +++ b/include/exec/ram_addr.h
>> @@ -377,7 +377,9 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
>>                                                 uint64_t *real_dirty_pages)
>>  {
>>      ram_addr_t addr;
>> +    ram_addr_t offset = rb->offset;
>>      unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
>> +    unsigned long dirty_page = BIT_WORD((start + offset) >> TARGET_PAGE_BITS);
>>      uint64_t num_dirty = 0;
>>      unsigned long *dest = rb->bmap;
>>  
>
>
> If this is the case, I can't understand how it ever worked :-(
>
> Investigating.

Further investigation shows:
- pc.ram, by default, is at slot 0
- so offset == 0
- the rest of the devices do not live in RAM

So it works well.

Only RAM ends up using that function, so we don't care.

When we use an nvdimm device (I don't know if any other triggers this), RAM
no longer all lives in the RAMBlock at offset 0, and then the offset becomes
important.

# No NVDIMM

(qemu) info ramblock
              Block Name    PSize              Offset               Used              Total
                  pc.ram    4 KiB  0x0000000000000000 0x0000000040000000 0x0000000040000000
                vga.vram    4 KiB  0x0000000040060000 0x0000000000400000 0x0000000000400000

# with NVDIMM

(qemu) info ramblock
              Block Name    PSize              Offset               Used              Total
           /objects/mem1    4 KiB  0x0000000000000000 0x0000000040000000 0x0000000040000000
                  pc.ram    4 KiB  0x0000000040000000 0x0000000040000000 0x0000000040000000
                vga.vram    4 KiB  0x0000000080060000 0x0000000000400000 0x0000000000400000
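
To make the index mismatch concrete, here is a small standalone sketch (not
QEMU code; the page size and the offsets are simply taken from the
"info ramblock" listings above).  It computes, for the first page of pc.ram,
the index into the per-RAMBlock bitmap (rb->bmap) and the index into the
global dirty bitmap in both layouts:

/*
 * Minimal standalone sketch (not QEMU code): why rb->offset matters.
 * The global dirty bitmap is indexed by absolute ram_addr_t, while
 * rb->bmap is indexed relative to the start of the RAMBlock, so the two
 * indices only coincide when the block sits at offset 0.
 */
#include <stdio.h>

#define TARGET_PAGE_BITS 12                        /* 4 KiB pages */

int main(void)
{
    unsigned long start = 0x0;                     /* first page of pc.ram */
    unsigned long offset_no_nvdimm = 0x0;          /* pc.ram at offset 0   */
    unsigned long offset_with_nvdimm = 0x40000000; /* pc.ram pushed up     */

    /* Index into the per-RAMBlock bitmap (rb->bmap): block-relative. */
    unsigned long block_page = start >> TARGET_PAGE_BITS;

    /* Index into the global dirty bitmap: needs the block's absolute offset. */
    unsigned long global_no_nvdimm = (offset_no_nvdimm + start) >> TARGET_PAGE_BITS;
    unsigned long global_with_nvdimm = (offset_with_nvdimm + start) >> TARGET_PAGE_BITS;

    printf("block-relative page:        %#lx\n", block_page);
    printf("global page without nvdimm: %#lx  (same, so the bug stays hidden)\n",
           global_no_nvdimm);
    printf("global page with nvdimm:    %#lx  (differs, wrong dirty bits read)\n",
           global_with_nvdimm);
    return 0;
}

Without nvdimm the two indices are both 0, so the missing rb->offset never
shows; with nvdimm the global index is 0x40000, which is what the draft
patch's dirty_page/offset additions account for.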


I am still amused/confused about how we haven't discovered the problem
before.

The patch fixes the problem described in this thread.


Later, Juan.

>
> Later, Juan.
>
>> @@ -386,8 +388,9 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
>>          int k;
>>          int nr = BITS_TO_LONGS(length >> TARGET_PAGE_BITS);
>>          unsigned long * const *src;
>> -        unsigned long idx = (page * BITS_PER_LONG) / DIRTY_MEMORY_BLOCK_SIZE;
>> -        unsigned long offset = BIT_WORD((page * BITS_PER_LONG) %
>> +        unsigned long idx = (dirty_page * BITS_PER_LONG) /
>> +                            DIRTY_MEMORY_BLOCK_SIZE;
>> +        unsigned long offset = BIT_WORD((dirty_page * BITS_PER_LONG) %
>>                                          DIRTY_MEMORY_BLOCK_SIZE);
>>  
>>          rcu_read_lock();
>> @@ -416,7 +419,7 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
>>      } else {
>>          for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
>>              if (cpu_physical_memory_test_and_clear_dirty(
>> -                        start + addr,
>> +                        start + addr + offset,
>>                          TARGET_PAGE_SIZE,
>>                          DIRTY_MEMORY_MIGRATION)) {
>>                  *real_dirty_pages += 1;


