Hyman Huang <huangy81@chinatelecom.cn> writes:
On 2023/3/24 22:32, Markus Armbruster wrote:
Hyman Huang <huangy81@chinatelecom.cn> writes:
On 2023/3/24 20:11, Markus Armbruster wrote:
huangy81@chinatelecom.cn writes:
From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Introduce migration dirty-limit capability, which can
be turned on before live migration and limit dirty
page rate during live migration.
Introduce migrate_dirty_limit function to help check
if dirty-limit capability enabled during live migration.
Meanwhile, refactor vcpu_dirty_rate_stat_collect
so that period can be configured instead of hardcoded.
dirty-limit capability is kind of like auto-converge
but using dirty limit instead of traditional cpu-throttle
to throttle guest down. To enable this feature, turn on
the dirty-limit capability before live migration using
migrate-set-capabilities, and set the parameters
"x-vcpu-dirty-limit-period", "vcpu-dirty-limit" suitably
to speed up convergence.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Acked-by: Peter Xu <peterx@redhat.com>
[...]
diff --git a/qapi/migration.json b/qapi/migration.json
index d33cc2d582..b7a92be055 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -477,6 +477,8 @@
# will be handled faster. This is a performance feature and
# should not affect the correctness of postcopy migration.
# (since 7.1)
+# @dirty-limit: Use dirty-limit to throttle down guest if enabled.
+# (since 8.0)
Feels too terse. What exactly is used and how? It's not the capability
itself (although the text sure sounds like it). I guess it's the thing
you set with command set-vcpu-dirty-limit.
Is that used only when the capability is set?
Yes, live migration sets the "dirty-limit" value only when that capability is set.
I'll change the comment to "Apply the dirty page rate limit algorithm to throttle
down the guest if the capability is set, rather than auto-converge".
Please continue to polish the doc if needed. Thanks.
Let's see whether I understand.
Throttling happens only during migration.
There are two throttling algorithms: "auto-converge" (default) and
"dirty page rate limit".
The latter can be tuned with set-vcpu-dirty-limit.
Correct?
Yes
What happens when migration capability dirty-limit is enabled, but the
user hasn't set a limit with set-vcpu-dirty-limit, or canceled it with
cancel-vcpu-dirty-limit?
The dirty-limit capability uses the default value if the user hasn't set one.
What is the default value? I can't find it in the doc comments.
See the following code in this commit:
[PATCH v4 08/10] migration: Implement dirty-limit convergence algo
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -438,6 +438,8 @@ void qmp_cancel_vcpu_dirty_limit(bool has_cpu_index,
int64_t cpu_index,
Error **errp)
{
+ MigrationState *ms = migrate_get_current();
+
if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
return;
}
@@ -451,6 +453,15 @@ void qmp_cancel_vcpu_dirty_limit(bool has_cpu_index,
return;
}
+ if (migration_is_running(ms->state) &&
+ (!qemu_thread_is_self(&ms->thread)) &&
+ migrate_dirty_limit() &&
+ dirtylimit_in_service()) {
+ error_setg(errp, "can't cancel dirty page limit while"
+ " migration is running");
+ return;
+ }
We can get here even when migration_is_running() is true. Seems to
contradict your claim "no cancel while migration is in progress". Am I
confused?
Please drop the superfluous parentheses around !qemu_thread_is_self().
+
dirtylimit_state_lock();
if (has_cpu_index) {
@@ -486,6 +497,8 @@ void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
uint64_t dirty_rate,
Error **errp)
{
+ MigrationState *ms = migrate_get_current();
+
if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
error_setg(errp, "dirty page limit feature requires KVM with"
" accelerator property 'dirty-ring-size' set'");
@@ -502,6 +515,15 @@ void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
return;
}
+ if (migration_is_running(ms->state) &&
+ (!qemu_thread_is_self(&ms->thread)) &&
+ migrate_dirty_limit() &&
+ dirtylimit_in_service()) {
+ error_setg(errp, "can't cancel dirty page limit while"
+ " migration is running");
Same condition, i.e. a dirty limit change is possible exactly when
cancel is. Correct?
+ return;
+ }
+
dirtylimit_state_lock();
if (!dirtylimit_in_service()) {
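To make the shared condition in the two hunks above easier to reason about, here is a minimal standalone sketch of it as a single predicate. The struct and function names are hypothetical and not part of the patch; each field merely stands in for the QEMU call named in its comment.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the four checks the patch performs.
 * None of these names exist in QEMU; they only mirror the calls
 * named in the comments.
 */
struct guard_inputs {
    bool migration_running;    /* migration_is_running(ms->state) */
    bool on_migration_thread;  /* qemu_thread_is_self(&ms->thread) */
    bool dirty_limit_cap;      /* migrate_dirty_limit() */
    bool limit_in_service;     /* dirtylimit_in_service() */
};

/*
 * Returns true when a set/cancel request would be rejected, i.e. when
 * all four conditions of the patch's "if" hold at once.
 */
static bool dirty_limit_change_rejected(const struct guard_inputs *in)
{
    return in->migration_running &&
           !in->on_migration_thread &&
           in->dirty_limit_cap &&
           in->limit_in_service;
}
```

Read this way, a request made while migration is running is still let through if the dirty-limit capability is off, if no limit is in service, or if the caller is the migration thread itself.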
Maybe it's just me still not understanding things, but the entire
interface feels overly complicated.
Here's my current mental model of what you're trying to provide.
There are two throttling algorithms: "auto-converge" (default) and
"dirty page rate limit". The user can select one.
The latter works with a user-configurable dirty limit.
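A rough sketch of that mental model, purely for illustration: every type, field, and function name below is hypothetical, and the default limit is left as a parameter because its actual value has not been established in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; this models the description above, not QEMU code. */
enum throttle_algo {
    THROTTLE_AUTO_CONVERGE,  /* assumed default, per the model above */
    THROTTLE_DIRTY_LIMIT
};

struct throttle_config {
    bool dirty_limit_cap;    /* migration capability dirty-limit enabled */
    bool has_user_limit;     /* user configured a limit explicitly */
    uint64_t user_limit;     /* the user-configured dirty limit */
    uint64_t default_limit;  /* fallback; real value not stated here */
};

/* The capability alone selects which algorithm throttles the guest. */
static enum throttle_algo select_throttle_algo(const struct throttle_config *c)
{
    return c->dirty_limit_cap ? THROTTLE_DIRTY_LIMIT : THROTTLE_AUTO_CONVERGE;
}

/* The limit in effect: the user's value if set, otherwise the default. */
static uint64_t effective_dirty_limit(const struct throttle_config *c)
{
    return c->has_user_limit ? c->user_limit : c->default_limit;
}
```

If the interface really reduces to these two decisions, it would help for the doc comments to say so, including what the fallback value is.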