qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PULL 00/20] Migration 20230420 patches


From: Juan Quintela
Subject: Re: [PULL 00/20] Migration 20230420 patches
Date: Sat, 22 Apr 2023 11:21:20 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Richard Henderson <richard.henderson@linaro.org> wrote:
> On 4/20/23 14:17, Juan Quintela wrote:
>> The following changes since commit 2d82c32b2ceaca3dc3da5e36e10976f34bfcb598:
>>    Open 8.1 development tree (2023-04-20 10:05:25 +0100)
>> are available in the Git repository at:
>>    https://gitlab.com/juan.quintela/qemu.git
>> tags/migration-20230420-pull-request
>> for you to fetch changes up to
>> cdf07846e6fe07a2e20c93eed5902114dc1d3dcf:
>>    migration: Pass migrate_caps_check() the old and new caps
>> (2023-04-20 15:10:58 +0200)
>> ----------------------------------------------------------------
>> Migration Pull request
>> This series include everything reviewed for migration:
>> - fix for disk stop/start (eric)
>> - detect filesystem of hostmem (peter)
>> - rename qatomic_mb_read (paolo)
>> - whitespace cleanup (李皆俊)
>>    I hope copy and paste work for the name O:-)
>> - atomic_counters series (juan)
>> - two first patches of capabilities (juan)
>> Please apply,
>
> Fails CI:
> https://gitlab.com/qemu-project/qemu/-/jobs/4159279870#L2896
>
> /usr/lib/gcc-cross/mipsel-linux-gnu/10/../../../../mipsel-linux-gnu/bin/ld:
> libcommon.fa.p/migration_migration.c.o: undefined reference to symbol
> '__atomic_load_8@@LIBATOMIC_1.0'

Hi Richard

First of all, I have no doubt that you know better that me in this
regard (*).

Once told that, it looks like one case of "my toolchain is better than
yours":

$ ls qemu-system-mips
qemu-system-mips        qemu-system-mips64el.p/ qemu-system-mipsel.p/
qemu-system-mips64      qemu-system-mips64.p/   qemu-system-mips.p/
qemu-system-mips64el    qemu-system-mipsel

This is Fedora37 with updates.

There are two posibilities here that came to mind, in order of
probability;
- myself with:

-    if (ram_counters.dirty_pages_rate && transferred > 10000) {
+    if (qatomic_read__nocheck(&ram_counters.dirty_pages_rate) &&
+        transferred > 10000) {

- paolo:

 PostcopyState  postcopy_state_get(void)
 {
-    return qatomic_mb_read(&incoming_postcopy_state);
+    return qatomic_load_acquire(&incoming_postcopy_state);
 }

> You're using an atomic 8-byte operation on a host that doesn't support
> it.  Did you use qatomic_read__nocheck instead of qatomic_read to try
> and get around a build failure on i686?  The check is there for a
> reason...

No, I am changing all ram_counters values to atomic.  Almost all of them
move from [u]int64_t to Stat64.  Notice that I don't care about 63 to
64bits, and anyways I think it was an error that they were int64_t on
the frist place (blame the old days qapi whet it didn't have unsigned
types).

But it don't exist a stat64_set() function.  The most similar thing that
appears here is stat64_init(), but it is cheating about not being atomic
at all.

Almost all ram_counters values are ok with stat64_add() and stat64_get()
operations.  But some of them, we need to reset them to zero (or
someother value, but that would not be complicated).

(*) And here is where it comes the call sentence from the 1st paragraph,
see how the stat64_get() gets implemented for the !CONFIG_ATOMIC64, I
didn't even try to write a stat64_set() on my own.

This is one example of the use that I had:

     if (qatomic_read__nocheck(&ram_counters.dirty_pages_rate) &&
         transferred > 10000) {
-        s->expected_downtime = ram_counters.remaining / bandwidth;
+        s->expected_downtime =
+            qatomic_read__nocheck(&ram_counters.dirty_bytes_last_sync) /
+                                  bandwidth;
     }
 
     qemu_file_reset_rate_limit(s->to_dst_file);
diff --git a/migration/ram.c b/migration/ram.c
index 7400abf5e1..7bbaf8cd86 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1224,7 +1224,8 @@ static void migration_bitmap_sync(RAMState *rs)
         RAMBLOCK_FOREACH_NOT_IGNORED(block) {
             ramblock_sync_dirty_bitmap(rs, block);
         }
-        ram_counters.remaining = ram_bytes_remaining();
+        qatomic_set__nocheck(&ram_counters.dirty_bytes_last_sync,
+                             ram_bytes_remaining());
:


and why I used qatomic_*__nocheck() instead of the proper operations?
Because reading this:

#define qatomic_read__nocheck(ptr) \
    __atomic_load_n(ptr, __ATOMIC_RELAXED)

#define qatomic_read(ptr)                              \
    ({                                                 \
    qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
    qatomic_read__nocheck(ptr);                        \
    })

#define qatomic_set__nocheck(ptr, i) \
    __atomic_store_n(ptr, i, __ATOMIC_RELAXED)

#define qatomic_set(ptr, i)  do {                      \
    qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
    qatomic_set__nocheck(ptr, i);                      \
} while(0)

I was complely sure that we will never get the qemu_build_assert().

I know, I know.

And now that I have explained myself, what is the correct way of doing
this?

I declared the value as:

+    aligned_uint64_t dirty_bytes_last_sync;
-    int64_t remaining;

I just want to make sure that *all* ram_counters are atomic and then I
can use them from any thread.  All the counters that use stat64 already
are.  But for this two to work, I would need to have a way to set and
old value.

And once that we are here, I would like ta have:

stat64_inc(): just add 1, I know, I can create a macro.

and

stat64_reset(): as its name says, it returns the value to zero.

I still miss a couple of stats in migration, where I need to reset them
to zero from time to time:

./ram.c:380:    uint64_t bytes_xfer_prev;
./ram.c:747:    rs->bytes_xfer_prev = stat64_get(&ram_counters.transferred);
./ram.c:1183:        stat64_get(&ram_counters.transferred) - 
rs->bytes_xfer_prev;
./ram.c:1247:        rs->bytes_xfer_prev = 
stat64_get(&ram_counters.transferred);

You can clame that this operation happens always on the migration
thread, but I have found that it is more difficult to document which
ones are atomic and which not, that make all of them atomic.  This
variable are get/set once a second, so performance is not one of the
issues.

And:

./ram.c:382:    uint64_t num_dirty_pages_period;
./ram.c:746:    rs->num_dirty_pages_period = 0;
./ram.c:1095:    rs->num_dirty_pages_period += new_dirty_pages;
./ram.c:1133:                         rs->num_dirty_pages_period * 1000 /
./ram.c:1184:    uint64_t bytes_dirty_period = rs->num_dirty_pages_period * 
TARGET_PAGE_SIZE;
./ram.c:1232:    trace_migration_bitmap_sync_end(rs->num_dirty_pages_period);
./ram.c:1246:        rs->num_dirty_pages_period = 0;

The problem here is that we reset the value every second, but for
everything else it is an stat64.

Thanks, Juan.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]