Re: [PULL 00/20] Migration 20230420 patches
From: Juan Quintela
Subject: Re: [PULL 00/20] Migration 20230420 patches
Date: Sat, 22 Apr 2023 11:21:20 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Richard Henderson <richard.henderson@linaro.org> wrote:
> On 4/20/23 14:17, Juan Quintela wrote:
>> The following changes since commit 2d82c32b2ceaca3dc3da5e36e10976f34bfcb598:
>> Open 8.1 development tree (2023-04-20 10:05:25 +0100)
>> are available in the Git repository at:
>> https://gitlab.com/juan.quintela/qemu.git
>> tags/migration-20230420-pull-request
>> for you to fetch changes up to
>> cdf07846e6fe07a2e20c93eed5902114dc1d3dcf:
>> migration: Pass migrate_caps_check() the old and new caps
>> (2023-04-20 15:10:58 +0200)
>> ----------------------------------------------------------------
>> Migration Pull request
>> This series include everything reviewed for migration:
>> - fix for disk stop/start (eric)
>> - detect filesystem of hostmem (peter)
>> - rename qatomic_mb_read (paolo)
>> - whitespace cleanup (李皆俊)
>> I hope copy and paste work for the name O:-)
>> - atomic_counters series (juan)
>> - two first patches of capabilities (juan)
>> Please apply,
>
> Fails CI:
> https://gitlab.com/qemu-project/qemu/-/jobs/4159279870#L2896
>
> /usr/lib/gcc-cross/mipsel-linux-gnu/10/../../../../mipsel-linux-gnu/bin/ld:
> libcommon.fa.p/migration_migration.c.o: undefined reference to symbol
> '__atomic_load_8@@LIBATOMIC_1.0'
Hi Richard

First of all, I have no doubt that you know better than me in this
regard (*).

That said, it looks like one case of "my toolchain is better than
yours":
$ ls qemu-system-mips*
qemu-system-mips qemu-system-mips64el.p/ qemu-system-mipsel.p/
qemu-system-mips64 qemu-system-mips64.p/ qemu-system-mips.p/
qemu-system-mips64el qemu-system-mipsel
This is Fedora37 with updates.
There are two possibilities here that come to mind, in order of
probability:
- myself with:
- if (ram_counters.dirty_pages_rate && transferred > 10000) {
+ if (qatomic_read__nocheck(&ram_counters.dirty_pages_rate) &&
+ transferred > 10000) {
- paolo:
PostcopyState postcopy_state_get(void)
{
- return qatomic_mb_read(&incoming_postcopy_state);
+ return qatomic_load_acquire(&incoming_postcopy_state);
}
> You're using an atomic 8-byte operation on a host that doesn't support
> it. Did you use qatomic_read__nocheck instead of qatomic_read to try
> and get around a build failure on i686? The check is there for a
> reason...
No, I am changing all ram_counters values to atomic. Almost all of
them move from [u]int64_t to Stat64. Notice that I don't care about
63 vs. 64 bits, and anyway I think it was an error that they were
int64_t in the first place (blame the old-days QAPI, when it didn't
have unsigned types).

But there is no stat64_set() function. The closest thing that appears
there is stat64_init(), but it cheats by not being atomic at all.
Almost all ram_counters values are fine with the stat64_add() and
stat64_get() operations. But some of them we need to reset to zero
(or some other value, but that would not be complicated).
(*) And here is where the caveat from the 1st paragraph comes in: see
how stat64_get() is implemented for !CONFIG_ATOMIC64. I didn't even
try to write a stat64_set() on my own.
This is one example of the use that I had:
if (qatomic_read__nocheck(&ram_counters.dirty_pages_rate) &&
transferred > 10000) {
- s->expected_downtime = ram_counters.remaining / bandwidth;
+ s->expected_downtime =
+ qatomic_read__nocheck(&ram_counters.dirty_bytes_last_sync) /
+ bandwidth;
}
qemu_file_reset_rate_limit(s->to_dst_file);
diff --git a/migration/ram.c b/migration/ram.c
index 7400abf5e1..7bbaf8cd86 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1224,7 +1224,8 @@ static void migration_bitmap_sync(RAMState *rs)
RAMBLOCK_FOREACH_NOT_IGNORED(block) {
ramblock_sync_dirty_bitmap(rs, block);
}
- ram_counters.remaining = ram_bytes_remaining();
+ qatomic_set__nocheck(&ram_counters.dirty_bytes_last_sync,
+ ram_bytes_remaining());
And why did I use qatomic_*__nocheck() instead of the proper
operations? Because, reading this:
#define qatomic_read__nocheck(ptr) \
    __atomic_load_n(ptr, __ATOMIC_RELAXED)

#define qatomic_read(ptr)                                   \
    ({                                                      \
    qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE);     \
    qatomic_read__nocheck(ptr);                             \
    })

#define qatomic_set__nocheck(ptr, i) \
    __atomic_store_n(ptr, i, __ATOMIC_RELAXED)

#define qatomic_set(ptr, i) do {                            \
    qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE);     \
    qatomic_set__nocheck(ptr, i);                           \
} while(0)
I was completely sure that we would never hit the
qemu_build_assert(). I know, I know.
And now that I have explained myself, what is the correct way of doing
this?
I declared the value as:
+ aligned_uint64_t dirty_bytes_last_sync;
- int64_t remaining;
I just want to make sure that *all* ram_counters are atomic, so that I
can use them from any thread. All the counters that use Stat64 already
are. But for these two to work, I would need a way to set a value.
And while we are at it, I would like to have:

stat64_inc(): just adds 1. I know, I can create a macro.

and

stat64_reset(): as its name says, returns the value to zero.
I am still missing a couple of stats in migration that I need to reset
to zero from time to time:
./ram.c:380: uint64_t bytes_xfer_prev;
./ram.c:747: rs->bytes_xfer_prev = stat64_get(&ram_counters.transferred);
./ram.c:1183: stat64_get(&ram_counters.transferred) -
rs->bytes_xfer_prev;
./ram.c:1247: rs->bytes_xfer_prev =
stat64_get(&ram_counters.transferred);
You can claim that this operation always happens on the migration
thread, but I have found that it is more difficult to document which
ones are atomic and which are not than to just make all of them
atomic. These variables are read/written once a second, so performance
is not an issue.
And:
./ram.c:382: uint64_t num_dirty_pages_period;
./ram.c:746: rs->num_dirty_pages_period = 0;
./ram.c:1095: rs->num_dirty_pages_period += new_dirty_pages;
./ram.c:1133: rs->num_dirty_pages_period * 1000 /
./ram.c:1184: uint64_t bytes_dirty_period = rs->num_dirty_pages_period *
TARGET_PAGE_SIZE;
./ram.c:1232: trace_migration_bitmap_sync_end(rs->num_dirty_pages_period);
./ram.c:1246: rs->num_dirty_pages_period = 0;
The problem here is that we reset the value every second, but for
everything else it is a Stat64.
Thanks, Juan.