
Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping


From: Hailiang Zhang
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Wed, 2 Mar 2016 21:01:58 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1

On 2016/3/1 20:25, Dr. David Alan Gilbert wrote:
* Hailiang Zhang (address@hidden) wrote:
On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
* Hailiang Zhang (address@hidden) wrote:
On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
* Dr. David Alan Gilbert (address@hidden) wrote:
* zhanghailiang (address@hidden) wrote:
From: root <address@hidden>

This is the 15th version of COLO (still only supports periodic checkpoints).

Here is only the COLO frame part; you can get the complete code from GitHub:
https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode

There are few changes in this series except for the network-related part.

I was looking at the time the guest is paused during COLO and
was surprised to find one of the larger chunks was the time to reset
the guest before loading each checkpoint.  I've traced it part way; the
biggest contributors for my test VM seem to be:

   3.8ms  pcibus_reset: VGA
   1.8ms  pcibus_reset: virtio-net-pci
   1.5ms  pcibus_reset: virtio-blk-pci
   1.5ms  qemu_devices_reset: piix4_reset
   1.1ms  pcibus_reset: piix3-ide
   1.1ms  pcibus_reset: virtio-rng-pci

I've not looked deeper yet, but some of these are very silly;
I'm running with -nographic, so why it takes 3.8ms to reset VGA is
going to be interesting.
Also, my only block device is the virtio-blk, so while I understand the
standard PC machine has the IDE controller, it's unclear why it takes
over a millisecond to reset an unused device.
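
A minimal sketch of how such per-device reset timing could be gathered;
device_reset(), qemu_clock_get_us() and object_get_typename() are real
QEMU helpers of this era, while timed_device_reset() is a hypothetical
instrumentation wrapper, not code from this thread:

#include "qemu/osdep.h"
#include "qemu/timer.h"
#include "hw/qdev-core.h"

/* Hypothetical wrapper: time one device's reset handler and report it. */
static void timed_device_reset(DeviceState *dev)
{
    int64_t t0 = qemu_clock_get_us(QEMU_CLOCK_HOST);

    device_reset(dev);
    fprintf(stderr, "%s reset took %" PRId64 " us\n",
            object_get_typename(OBJECT(dev)),
            qemu_clock_get_us(QEMU_CLOCK_HOST) - t0);
}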

OK, so I've dug a bit deeper, and it appears that it's the changes in
PCI BARs that actually take the time; every time we do a reset we
reset all the BARs, which causes a pci_update_mappings() and
ends up doing a memory_region_del_subregion().
Then we load the config space of the PCI device as we do the vmstate_load,
and this recreates all the mappings again.

I'm not sure what the fix is, but it sounds like avoiding the map/remap
when the mappings are unchanged would usefully speed up the checkpoints.
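
A hedged sketch of the pattern described above, simplified from what
pci_update_mappings() actually does; update_bar_mapping() is illustrative
only, while the memory_region_*() calls and the PCIIORegion fields are
the real QEMU API:

#include "hw/pci/pci.h"

/* Illustrative simplification of the del/add dance performed when a BAR
 * moves.  Outside a memory region transaction, each call below forces an
 * immediate rebuild of the flattened memory view - that rebuild is the
 * cost measured above. */
static void update_bar_mapping(PCIIORegion *r, pcibus_t new_addr)
{
    memory_region_del_subregion(r->address_space, r->memory);
    r->addr = new_addr;
    memory_region_add_subregion_overlap(r->address_space, r->addr,
                                        r->memory, 1);
}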


Interesting, and thanks for your report.

We already know qemu_system_reset() is a time-consuming function. We shouldn't
call it here, but if we don't, there will be a bug, which we have reported
before in the previous COLO series; below is a copy of the related
patch comment:

Paolo suggested one fix, see the patch below; I'm not sure if it's safe
(in particular if the guest changed a BAR and the device code tried to
access the memory while loading the state???) - but it does seem to work
and shaves ~10ms off the reset/load times:


Nice work, I also tested it, and it is a good improvement. I'm wondering
whether it is safe here, but it should be safe to apply to
qemu_system_reset() independently (I tested that too; it shaves about
5ms off).
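
A minimal sketch of that independent application, assuming the transaction
simply brackets the reset call:

memory_region_transaction_begin();
qemu_system_reset(VMRESET_SILENT);   /* BAR del/add requests accumulate */
memory_region_transaction_commit();  /* one flat-view rebuild here      */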

Hailiang

Dave

commit 7570b2984143860005ad9fe79f5394c75f294328
Author: Dr. David Alan Gilbert <address@hidden>
Date:   Tue Mar 1 12:08:14 2016 +0000

     COLO: Lock memory map around reset/load

     Changing the memory map appears to be expensive; we see this
     particularly when, on loading a checkpoint, we:
        a) reset the devices
           This causes PCI BARs to be reset
        b) load the device states
           This causes the PCI BARs to be reloaded.

     Turning this all into a single memory_region_transaction saves
     ~10ms/checkpoint.

     TBD: What happens if the device code accesses the RAM during loading
     the checkpoint?

     Signed-off-by: Dr. David Alan Gilbert <address@hidden>
     Suggested-by: Paolo Bonzini <address@hidden>

diff --git a/migration/colo.c b/migration/colo.c
index 45c3432..c44fb2a 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -22,6 +22,7 @@
  #include "net/colo-proxy.h"
  #include "net/net.h"
  #include "block/block_int.h"
+#include "exec/memory.h"

  static bool vmstate_loading;

@@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void *opaque)

          stage_time_start = qemu_clock_get_us(QEMU_CLOCK_HOST);
          qemu_mutex_lock_iothread();
+        memory_region_transaction_begin();
          qemu_system_reset(VMRESET_SILENT);
          stage_time_end = qemu_clock_get_us(QEMU_CLOCK_HOST);
          timed_average_account(&mis->colo_state.time_reset,
@@ -947,6 +949,7 @@ void *colo_process_incoming_thread(void *opaque)
                            stage_time_end - stage_time_start);
          stage_time_start = stage_time_end;
          ret = qemu_load_device_state(fb);
+        memory_region_transaction_commit();
          if (ret < 0) {
              error_report("COLO: load device state failed\n");
              vmstate_loading = false;
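
One point worth noting about the patch above: memory region transactions
nest by depth in QEMU's memory.c, so if anything inside
qemu_system_reset() or the device load opens its own begin/commit pair,
only the outermost commit triggers the expensive flat-view rebuild. A
hedged sketch of those semantics:

memory_region_transaction_begin();      /* depth 1: updates deferred        */
    memory_region_transaction_begin();  /* depth 2 (e.g. inside a reset)    */
    /* ... BAR del/add requests accumulate here ... */
    memory_region_transaction_commit(); /* back to depth 1: still deferred  */
memory_region_transaction_commit();     /* depth 0: single rebuild + notify */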

--
Dr. David Alan Gilbert / address@hidden / Manchester, UK
