qemu-devel




From: Andrey Gruzdev
Subject: Re: [PATCH v3 3/7] support UFFD write fault processing in ram_save_iterate()
Date: Fri, 20 Nov 2020 19:15:07 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 20.11.2020 18:07, Peter Xu wrote:
On Fri, Nov 20, 2020 at 01:44:53PM +0300, Andrey Gruzdev wrote:
On 19.11.2020 21:25, Peter Xu wrote:
On Thu, Nov 19, 2020 at 03:59:36PM +0300, Andrey Gruzdev via wrote:

[...]

+/**
+ * ram_find_block_by_host_address: find RAM block containing host page
+ *
+ * Returns true if RAM block is found and pss->block/page are
+ * pointing to the given host page, false in case of an error
+ *
+ * @rs: current RAM state
+ * @pss: page-search-status structure
+ */
+static bool ram_find_block_by_host_address(RAMState *rs, PageSearchStatus *pss,
+        hwaddr page_address)
+{
+    bool found = false;
+
+    pss->block = rs->last_seen_block;
+    do {
+        if (page_address >= (hwaddr) pss->block->host &&
+            (page_address + TARGET_PAGE_SIZE) <=
+                    ((hwaddr) pss->block->host + pss->block->used_length)) {
+            pss->page = (unsigned long)
+                    ((page_address - (hwaddr) pss->block->host) >> TARGET_PAGE_BITS);
+            found = true;
+            break;
+        }
+
+        pss->block = QLIST_NEXT_RCU(pss->block, next);
+        if (!pss->block) {
+            /* Hit the end of the list */
+            pss->block = QLIST_FIRST_RCU(&ram_list.blocks);
+        }
+    } while (pss->block != rs->last_seen_block);
+
+    rs->last_seen_block = pss->block;
+    /*
+     * Since we are in the same loop with ram_find_and_save_block(),
+     * need to reset pss->complete_round after switching to
+     * other block/page in pss.
+     */
+    pss->complete_round = false;
+
+    return found;
+}
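
The wrap-around lookup in the function above can be tried in isolation. The following is only a minimal sketch, not the patch itself: it assumes a plain array instead of the RCU-protected ram_list.blocks, and the Block type, the PAGE_BITS value and the find_block_by_host_address() name are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12                      /* assumption: 4 KiB target pages */
#define PAGE_SIZE ((uintptr_t)1 << PAGE_BITS)

/* Hypothetical stand-in for a RAM block: host range [host, host + used_length). */
typedef struct {
    uintptr_t host;
    size_t used_length;
} Block;

/*
 * Search 'blocks' circularly, starting at index *last_seen, for the block
 * that fully contains the page at 'addr'.  On success, store the block index
 * in *last_seen and the page index within that block in *page.
 */
static bool find_block_by_host_address(const Block *blocks, size_t nblocks,
                                       size_t *last_seen, uintptr_t addr,
                                       unsigned long *page)
{
    assert(nblocks > 0 && *last_seen < nblocks);

    size_t i = *last_seen;
    do {
        const Block *b = &blocks[i];
        if (addr >= b->host && addr + PAGE_SIZE <= b->host + b->used_length) {
            *page = (unsigned long)((addr - b->host) >> PAGE_BITS);
            *last_seen = i;
            return true;
        }
        i = (i + 1) % nblocks;            /* wrap at the end of the list */
    } while (i != *last_seen);

    return false;                         /* cursor is unchanged on a miss */
}
```

Starting from the last seen block keeps the common case cheap when consecutive faults land in the same block; a miss costs one full pass over the list.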

I forgot whether Denis and I have discussed this, but I'll try anyways... do
you think we can avoid touching PageSearchStatus at all?

PageSearchStatus is used to track a single migration iteration for precopy, so
that we scan from the 1st ramblock until the last one.  Then we finish one
iteration.


Yes, my first idea was also to separate the normal iteration from the
write-fault page source completely and leave pss for the normal scan. But the
other idea is to keep some locality with respect to the last write fault. I
mean it seems more optimal to restart the normal scan on the page next to the
faulting one. In this case we can save and un-protect the neighborhood faster
and prevent many write faults.

Yeah, locality sounds reasonable, and you just reminded me of the fact that
postcopy has that already, I think. :) Just see get_queued_page():

     if (block) {
         /*
          * As soon as we start servicing pages out of order, then we have
          * to kill the bulk stage, since the bulk stage assumes
          * in (migration_bitmap_find_and_reset_dirty) that every page is
          * dirty, that's no longer true.
          */
         rs->ram_bulk_stage = false;

         /*
          * We want the background search to continue from the queued page
          * since the guest is likely to want other pages near to the page
          * it just requested.
          */
         pss->block = block;
         pss->page = offset >> TARGET_PAGE_BITS;

         /*
          * This unqueued page would break the "one round" check, even is
          * really rare.
          */
         pss->complete_round = false;
     }

So as long as we queue the pages onto the src_page_requests queue, it'll take
care of write locality already, iiuc.
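
To make the locality argument concrete, here is a toy model of "queue first, then continue the background scan from the queued page". It is only a sketch under stated assumptions: SnapState, queue_fault() and next_page() are hypothetical names, a flat bool array stands in for the dirty bitmap, and none of this is the real src_page_requests code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16        /* assumption: toy RAM size, in pages */
#define QUEUE_CAP 8      /* assumption: capacity of the fault queue */

typedef struct {
    bool dirty[NPAGES];          /* pages that still have to be saved */
    size_t cursor;               /* where the background scan resumes */
    size_t queue[QUEUE_CAP];     /* FIFO of write-faulted pages */
    size_t head, count;
} SnapState;

static void snap_init(SnapState *s)
{
    for (size_t i = 0; i < NPAGES; i++) {
        s->dirty[i] = true;      /* a snapshot starts with every page unsaved */
    }
    s->cursor = 0;
    s->head = 0;
    s->count = 0;
}

/* Queue a write-faulted page; returns false if the queue is full. */
static bool queue_fault(SnapState *s, size_t page)
{
    assert(page < NPAGES);
    if (s->count == QUEUE_CAP) {
        return false;
    }
    s->queue[(s->head + s->count) % QUEUE_CAP] = page;
    s->count++;
    return true;
}

/*
 * Pick the next page to save.  Queued pages have priority; serving one also
 * moves the scan cursor to it, so the background scan continues with its
 * neighbours.  Returns false once every page has been saved.
 */
static bool next_page(SnapState *s, size_t *page)
{
    while (s->count > 0) {
        size_t p = s->queue[s->head];
        s->head = (s->head + 1) % QUEUE_CAP;
        s->count--;
        if (s->dirty[p]) {       /* skip requests for already saved pages */
            s->dirty[p] = false;
            s->cursor = p;
            *page = p;
            return true;
        }
    }
    for (size_t n = 0; n < NPAGES; n++) {
        size_t p = (s->cursor + n) % NPAGES;
        if (s->dirty[p]) {
            s->dirty[p] = false;
            s->cursor = p;
            *page = p;
            return true;
        }
    }
    return false;
}
```

In this model, a fault on page 9 while the scan sits at page 0 makes the next picks 9, 10, 11, ..., which is the neighbourhood effect discussed above. Because the cursor can jump, completion has to be decided from the bitmap rather than from a single linear pass.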


Snapshot is really something, imho, that can easily leverage this structure
without touching it - basically we want to do two things:

    - Do the 1st iteration of precopy (when ram_bulk_stage==true), and do that
      only.  We never need the 2nd, 3rd, ... iterations because we're
      snapshotting.

    - Leverage the postcopy queue mechanism so that when some page got written,
      queue that page.  We should have this queue higher priority than the
      precopy scanning mentioned above.

As long as we follow above rules, then after the above 1st round precopy, we're
simply done...  If that works, the whole logic of precopy and PageSearchStatus
does not need to be touched, iiuc.

[...]


It's quite a good alternative, and I thought about using the postcopy page
queue, but this implementation won't consider the locality of writes.

What do you think?

So now I think the "Do the 1st iteration of precopy only" idea won't work, but
still please consider whether it's natural to just reuse postcopy's queue
mechanism.  IOW, to see whether we can avoid most of the pss logic changes in
this patch.

Thanks,


Yeah, I think we can re-use the postcopy queue code for faulting pages. I'm a
little worried about the additional overhead of dealing with the urgent
request semaphore. Also, the code won't change a lot, something like:

[...]
        /* In case of 'write-tracking' migration we first try
         * to poll UFFD and see if we have a write page fault event */
        poll_fault_page(rs);

        again = true;
        found = get_queued_page(rs, &pss);

        if (!found) {
            /* priority queue empty, so just search for something dirty */
            found = find_dirty_block(rs, &pss, &again);
        }
[...]

--
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH  +7-903-247-6397
                virtuzzo.com


