From: Jitendra Kolhe
Subject: Re: [Qemu-devel] [PATCH v1] migration: skip sending ram pages released by virtio-balloon driver.
Date: Fri, 11 Mar 2016 15:50:09 +0530
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0

On 3/11/2016 12:55 PM, Li, Liang Z wrote:
On 3/10/2016 3:19 PM, Roman Kagan wrote:
On Fri, Mar 04, 2016 at 02:32:47PM +0530, Jitendra Kolhe wrote:
Even though the pages which are returned to the host by the
virtio-balloon driver are zero pages, the migration algorithm will
still end up scanning the entire page: ram_find_and_save_block() ->
ram_save_page()/ram_save_compressed_page() -> save_zero_page() ->
is_zero_range(). We also end up sending some control information
over the network for these pages during migration. This adds to the total
migration time.

I wonder if it is the scanning for zeros or sending the whiteout which
affects the total migration time more. If it is the former (as I would
expect), then a rather local change to is_zero_range() to make use of
the mapping information before scanning would get you all the speedups
without protocol changes, interference with postcopy, etc.

Roman.
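
[For illustration only, a minimal standalone sketch of the kind of check Roman describes: consult a bitmap of pages known to be released by the balloon before falling back to an actual scan of the page contents. The helper names and the bitmap layout are invented for this example; QEMU's real is_zero_range()/buffer_is_zero() take just a buffer pointer and a length.]

/* Sketch only, not QEMU code: names and bitmap layout are invented. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12                      /* 4 KB target pages */
#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Hypothetical bitmap: bit N set => page N was released by the balloon. */
static unsigned long *ballooned_bitmap;

static bool page_is_ballooned(uint64_t offset)
{
    uint64_t nr = offset >> EX_PAGE_SHIFT;
    return ballooned_bitmap &&
           ((ballooned_bitmap[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1);
}

/* Stand-in for the real scan (QEMU uses buffer_is_zero()). */
static bool scan_is_zero(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* The "local change": if the mapping information already says the page
 * reads as zero, skip touching its contents entirely. */
static bool is_zero_range_hinted(const uint8_t *p, uint64_t offset, size_t len)
{
    if (page_is_ballooned(offset)) {
        return true;
    }
    return scan_is_zero(p, len);
}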


Localizing the solution to the zero-page scan check is a good idea. I too agree that
most of the time is spent scanning for zero pages, in which case we should be
able to localize the solution to is_zero_range().
However, in the case of ballooned-out pages (which can be seen as a subset of
guest zero pages) we also spend a very small portion of the total migration time
in sending the control information, which can also be avoided.
From my tests on a 16GB idle guest of which 12GB was ballooned out, the
zero page scan time for the 12GB of ballooned-out pages was ~1789 ms, and
save_page_header() + qemu_put_byte(f, 0); for the same 12GB of ballooned-out
pages was ~556 ms. Total migration time was ~8000 ms.
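[For scale: assuming 4 KB target pages, 12 GB is roughly 3.1 million pages, so the figures above work out to very roughly 0.57 us of zero-page scanning and 0.18 us of header/byte bookkeeping per ballooned-out page.]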

How did you do the tests? ~556 ms seems too long for putting several bytes into
the buffer.
It's likely the time you measured includes the time spent processing the other
4GB of guest memory pages.

Liang


I modified save_zero_page() as below and updated the timers only for ballooned-out pages, so is_zero_range() should return true (qemu_balloon_bitmap_test() from my patchset also returned 1). With the instrumentation below, I got t1 = ~1789 ms and t2 = ~556 ms. The total migration time noted (~8000 ms) is for the unmodified qemu source.
It seems to add up to the final migration time with the proposed patchset.

Here is the last entry from “another round” of the test; this time it's ~547 ms:
JK: block=7f5417a345e0, offset=3ffe42020, zero_page_scan_time=1218 us, save_page_header_time=184 us, total_save_zero_page_time=1453 us
cumulated vals: zero_page_scan_time=1723920378 us, save_page_header_time=547514618 us, total_save_zero_page_time=2371059239 us

static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
                          uint8_t *p, uint64_t *bytes_transferred)
{
    int pages = -1;
    /* qemu_clock_get_ns() returns nanoseconds. */
    int64_t time1, time2 = 0, time3 = 0, time4;
    static int64_t t1 = 0, t2 = 0, t3 = 0;

    time1 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
    if (is_zero_range(p, TARGET_PAGE_SIZE)) {
        time2 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
        acct_info.dup_pages++;
        *bytes_transferred += save_page_header(f, block,
                                               offset | RAM_SAVE_FLAG_COMPRESS);
        qemu_put_byte(f, 0);
        time3 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
        *bytes_transferred += 1;
        pages = 1;
    }
    time4 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);

    /* Only account for ballooned-out pages; these are expected to take
     * the zero branch above, so time2/time3 are set before being read. */
    if (qemu_balloon_bitmap_test(block, offset) == 1) {
        t1 += (time2 - time1);  /* zero-page scan */
        t2 += (time3 - time2);  /* save_page_header + qemu_put_byte */
        t3 += (time4 - time1);  /* whole save_zero_page */
        fprintf(stderr, "block=%lx, offset=%lx, zero_page_scan_time=%ld us, "
                "save_page_header_time=%ld us, total_save_zero_page_time=%ld us\n"
                "cumulated vals: zero_page_scan_time=%ld us, "
                "save_page_header_time=%ld us, total_save_zero_page_time=%ld us\n",
                (unsigned long)block, (unsigned long)offset,
                (time2 - time1), (time3 - time2), (time4 - time1), t1, t2, t3);
    }
    return pages;
}

Thanks,
- Jitendra

      if (is_zero_range(p, TARGET_PAGE_SIZE)) {
          acct_info.dup_pages++;
          *bytes_transferred += save_page_header(f, block,
                                                 offset | RAM_SAVE_FLAG_COMPRESS);
          qemu_put_byte(f, 0);
          *bytes_transferred += 1;
          pages = 1;
      }
Would moving the solution to save_zero_page() be good enough?

Thanks,
- Jitendra
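
[For what it's worth, a minimal sketch of the save_zero_page()-level variant asked about above, reusing only helpers already shown in this thread (qemu_balloon_bitmap_test() from the patchset, save_page_header(), qemu_put_byte()). It still emits the usual zero-page header, so the stream format and the destination side are unchanged and only the memory scan is skipped. This is an illustration, not the actual patch.]

static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
                          uint8_t *p, uint64_t *bytes_transferred)
{
    int pages = -1;

    /* Ballooned-out pages read back as zero on the source, so the
     * is_zero_range() scan can be skipped for them; the zero-page
     * header and byte are still sent as usual. */
    if (qemu_balloon_bitmap_test(block, offset) == 1 ||
        is_zero_range(p, TARGET_PAGE_SIZE)) {
        acct_info.dup_pages++;
        *bytes_transferred += save_page_header(f, block,
                                               offset | RAM_SAVE_FLAG_COMPRESS);
        qemu_put_byte(f, 0);
        *bytes_transferred += 1;
        pages = 1;
    }
    return pages;
}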



