qemu-discuss

Re: [Qemu-discuss] Latest Qemu-COLO Problems


From: Zhang, Chen
Subject: Re: [Qemu-discuss] Latest Qemu-COLO Problems
Date: Tue, 5 Mar 2019 15:31:39 +0000

From: wenzt <address@hidden>
Sent: Thursday, February 28, 2019 10:00 AM
To: Zhang, Chen <address@hidden>
Cc: 'qemu-discuss' <address@hidden>
Subject: Re: Latest Qemu-COLO Problems

This version: https://github.com/coloft/qemu/tree/colo-v4.1-periodic-mode

That is an old version from three years ago. Please drop it and use the upstream QEMU code.

Another question:
What is the relationship between the proxy and checkpoints?

When the PVM and SVM send different network packets, the proxy signals the 
COLO framework to do a checkpoint.

Do they work together? I guess we should set a longer checkpoint interval, 
like 20s.

Yes, they work together. In addition, we have a periodic checkpoint 
mechanism, like a timer. You can set its interval too.
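
As a rough mental model of the answer above (not QEMU's actual code), a checkpoint is requested either when the proxy sees the PVM and SVM output diverge or when the periodic timer expires. The names `CheckpointTrigger` and `period_s` are hypothetical, and plain byte strings stand in for real packets:

```python
import time


class CheckpointTrigger:
    """Toy model: checkpoint on packet mismatch or when the period elapses."""

    def __init__(self, period_s, clock=time.monotonic):
        self.period_s = period_s      # hypothetical periodic interval, in seconds
        self.clock = clock            # injectable clock, handy for testing
        self.last_checkpoint = clock()

    def should_checkpoint(self, pvm_packet, svm_packet):
        """Return the reason for a checkpoint, or None if none is needed."""
        if pvm_packet != svm_packet:
            return "mismatch"         # the two VMs produced different output
        if self.clock() - self.last_checkpoint >= self.period_s:
            return "periodic"         # timer expired without any divergence
        return None

    def mark_done(self):
        """Record that a checkpoint just completed, restarting the timer."""
        self.last_checkpoint = self.clock()
```

With a 20s period and an idle network, only the "periodic" branch would ever fire in this model, which may be why the proxy seems inactive without network workload.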

Does the proxy only work under network workload? In my test, it feels like the 
proxy is not working.

Yes. As the wiki says, colo-proxy compares the PVM and SVM packets to decide 
whether to do a checkpoint.
You can enable the COLO debug info to see the proxy’s work on the primary node like this:
{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }
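
For anyone scripting this instead of typing it into a QMP session, here is a minimal sketch. It assumes QEMU was started with a QMP Unix socket, for example `-qmp unix:/tmp/qmp-primary.sock,server,nowait` (the path is hypothetical). The helper names are mine; only the command itself comes from the message above. QMP sends a greeting first and expects `qmp_capabilities` before any other command:

```python
import json
import socket


def build_trace_command(pattern="colo*", enable=True):
    """Build the QMP command from the message above as a Python dict."""
    return {
        "execute": "trace-event-set-state",
        "arguments": {"name": pattern, "enable": enable},
    }


def qmp_send(sock_file, command):
    """Write one QMP command and return the first non-event reply."""
    sock_file.write(json.dumps(command) + "\n")
    sock_file.flush()
    while True:
        line = sock_file.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        reply = json.loads(line)
        if "event" not in reply:      # skip asynchronous events such as STOP
            return reply


def enable_colo_trace(socket_path):
    """Connect to a QMP Unix socket (path is an assumption) and enable colo* tracing."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        with sock.makefile("rw", encoding="utf-8") as f:
            json.loads(f.readline())                      # QMP greeting banner
            qmp_send(f, {"execute": "qmp_capabilities"})  # leave negotiation mode
            return qmp_send(f, build_trace_command())
```

Calling `enable_colo_trace("/tmp/qmp-primary.sock")` should return the QMP reply as a dict; an `"error"` key in it would mean the command was rejected.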


Thanks
Zhang Chen


From: Zhang, Chen <address@hidden>
Sent: Thursday, February 28, 2019 9:34 AM
To: wenzt <address@hidden>
Cc: 'qemu-discuss' <address@hidden>
Subject: RE: Latest Qemu-COLO Problems

Which version?
The COLO project has always stated that the PVM and SVM execute in parallel.

Thanks
Zhang Chen

From: wenzt <address@hidden>
Sent: Thursday, February 28, 2019 9:21 AM
To: Zhang, Chen <address@hidden>
Cc: 'qemu-discuss' <address@hidden>
Subject: Re: Latest Qemu-COLO Problems

But in an earlier version, I noticed that the SVM always stays in in-migration 
status, even while doing a checkpoint.
No operation can be performed on the SVM.

Thanks,
Zhengtao

From: Zhang, Chen <address@hidden>
Sent: Wednesday, February 27, 2019 6:45 PM
To: wenzt <address@hidden>
Cc: 'qemu-discuss' <address@hidden>
Subject: RE: Latest Qemu-COLO Problems


From: wenzt <address@hidden>
Sent: Wednesday, February 27, 2019 6:04 PM
To: Zhang, Chen <address@hidden>
Cc: 'qemu-discuss' <address@hidden>
Subject: Re: Latest Qemu-COLO Problems

Thanks for the help!

I don’t understand why we keep switching the SVM between Run and Stop.
Why don’t we keep the SVM in in-migration status?

Because we need to do a checkpoint to sync all state between the PVM and SVM.
We can’t guarantee that their states will stay the same after a while.

Thanks
Zhang Chen

Thanks,
Zhengtao

From: Zhang, Chen <address@hidden>
Sent: Tuesday, February 26, 2019 6:41 PM
To: wenzt <address@hidden>
Cc: 'qemu-discuss' <address@hidden>
Subject: RE: Latest Qemu-COLO Problems

By the way, please read the COLO wiki and use these commands to trigger 
failover on the secondary node:

{ 'execute': 'nbd-server-stop' }
{ "execute": "x-colo-lost-heartbeat" }
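
If you script the failover, the order of the two commands matters, so it may help to keep them in one list. This is only a sketch: `failover_payload` is a hypothetical helper that serializes the commands above as strict JSON, one per line, and does not talk to QEMU itself:

```python
import json

# Commands from the message above, in the order given there.
FAILOVER_COMMANDS = [
    {"execute": "nbd-server-stop"},
    {"execute": "x-colo-lost-heartbeat"},
]


def failover_payload(commands=FAILOVER_COMMANDS):
    """Serialize each QMP command as one JSON line, preserving order."""
    return "".join(json.dumps(cmd) + "\n" for cmd in commands)
```

The resulting text would be written to the secondary node's QMP session after the usual `qmp_capabilities` negotiation.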


Thanks
Zhang Chen

From: Zhang, Chen
Sent: Tuesday, February 26, 2019 2:46 PM
To: 'wenzt' <address@hidden>
Cc: 'qemu-discuss' <address@hidden>
Subject: RE: Latest Qemu-COLO Problems

Sorry for the slow response.
I have fixed this bug in this series:

https://lists.nongnu.org/archive/html/qemu-devel/2019-02/msg06920.html

Please test it.


Thanks
Zhang Chen

From: wenzt <address@hidden>
Sent: Friday, February 15, 2019 7:54 PM
To: Zhang, Chen <address@hidden>
Cc: 'qemu-discuss' <address@hidden>
Subject: Latest Qemu-COLO Problems

Hi Zhang,

I have tested COLO with qemu-3.1.0, following https://wiki.qemu.org/Features/COLO

I got these problems on the PVM:
{"timestamp": {"seconds": 1550230616, "microseconds": 644348}, "event": "STOP"}
{"timestamp": {"seconds": 1550230616, "microseconds": 719003}, "event": "RESUME"}
{"timestamp": {"seconds": 1550230616, "microseconds": 743554}, "event": "STOP"}
qemu-system-x86_64: Can't receive COLO message: Input/output error
qemu-system-x86_64: Can't receive COLO message: Input/output error
{"timestamp": {"seconds": 1550230618, "microseconds": 257209}, "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "error"}}


And on the SVM:
{"timestamp": {"seconds": 1550230616, "microseconds": 731544}, "event": "STOP"}
address@hidden:colo_vm_state_change Change 'run' => 'stop'
address@hidden:colo_send_message Send 'checkpoint-reply' message
address@hidden:colo_receive_message Receive 'vmstate-send' message
address@hidden:colo_flush_ram_cache_begin dirty_pages 18446744073708498780
address@hidden:colo_flush_ram_cache_end
address@hidden:colo_receive_message Receive 'vmstate-size' message
address@hidden:colo_send_message Send 'vmstate-received' message
{"timestamp": {"seconds": 1550230616, "microseconds": 837436}, "event": "RESUME"}
qemu-system-x86_64: block.c:5062: bdrv_detach_aio_context: Assertion `!bs->walking_aio_notifiers' failed.
Aborted (core dumped)
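
One detail in the SVM trace stands out: the dirty_pages value is just below 2^64, which is what a small negative number looks like when printed as an unsigned 64-bit integer. Whether the counter really went negative inside QEMU is an assumption on my part; the snippet below only reinterprets the printed value, and `as_signed64` is a hypothetical helper name:

```python
def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    return value - (1 << 64) if value >= 1 << 63 else value


print(as_signed64(18446744073708498780))  # -> -1052836
```

If that reading is right, the page count was decremented about a million times more than it was incremented, which would be worth mentioning when reporting the bug.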
