Re: [Qemu-devel] [PATCH 06/12] migration: do not detect zero page for co

qemu-devel

From:	Daniel P . Berrangé
Subject:	Re: [Qemu-devel] [PATCH 06/12] migration: do not detect zero page for compression
Date:	Thu, 28 Jun 2018 10:36:50 +0100
User-agent:	Mutt/1.10.0 (2018-05-17)

On Thu, Jun 28, 2018 at 05:12:39PM +0800, Xiao Guangrong wrote:
> 
> Hi Peter,
> 
> Sorry for the delay; I was busy with other things.
> 
> On 06/19/2018 03:30 PM, Peter Xu wrote:
> > On Mon, Jun 04, 2018 at 05:55:14PM +0800, address@hidden wrote:
> > > From: Xiao Guangrong <address@hidden>
> > > 
> > > Detecting zero pages is not cheap; we can disable it for
> > > compression, which handles all-zero data very well
> > 
> > Are there any numbers showing how the compression algo performs better
> > than the zero-detect algo?  I ask since AFAIU buffer_is_zero() might
> > be fast, depending on how init_accel() is done in util/bufferiszero.c.
> 
> Here is a comparison between zero-detection and compression (the target
> buffer is all zero bits):
> 
> Zero: 810 ns    Compression: 26905 ns
> Zero: 417 ns    Compression:  8022 ns
> Zero: 408 ns    Compression:  7189 ns
> Zero: 400 ns    Compression:  7255 ns
> Zero: 412 ns    Compression:  7016 ns
> Zero: 411 ns    Compression:  7035 ns
> Zero: 413 ns    Compression:  6994 ns
> Zero: 399 ns    Compression:  7024 ns
> Zero: 416 ns    Compression:  7053 ns
> Zero: 405 ns    Compression:  7041 ns
> 
> Indeed, zero-detection is faster than compression.
> 
> However, during our profiling of the live_migration thread (after reverting
> this patch), we noticed that zero-detection costs a lot of CPU:
> 
>  12.01%  kqemu  qemu-system-x86_64  [.] buffer_zero_sse2
>   7.60%  kqemu  qemu-system-x86_64  [.] ram_bytes_total
>   6.56%  kqemu  qemu-system-x86_64  [.] qemu_event_set
>   5.61%  kqemu  qemu-system-x86_64  [.] qemu_put_qemu_file
>   5.00%  kqemu  qemu-system-x86_64  [.] __ring_put
>   4.89%  kqemu  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
>   4.71%  kqemu  qemu-system-x86_64  [.] compress_thread_data_done
>   3.63%  kqemu  qemu-system-x86_64  [.] ring_is_full
>   2.89%  kqemu  qemu-system-x86_64  [.] __ring_is_full
>   2.68%  kqemu  qemu-system-x86_64  [.] threads_submit_request_prepare
>   2.60%  kqemu  qemu-system-x86_64  [.] ring_mp_get
>   2.25%  kqemu  qemu-system-x86_64  [.] ring_get
>   1.96%  kqemu  libc-2.12.so        [.] memcpy
> 
> After this patch, that work is moved to the worker thread. Is that
> acceptable?

It depends on your point of view. If you have spare / idle CPUs on the host,
then moving the workload to a thread is ok, despite the CPU cost of compression
in that thread being much higher than what was replaced, since you won't be
taking CPU resources away from other contending workloads.

I'd venture to suggest though that we should probably *not* be optimizing for
the case of idle CPUs on the host. More realistic is to expect that the host
CPUs are near fully committed to work, and thus the (default) goal should be
to minimize CPU overhead for the host as a whole. From this POV, zero-page
detection is better than compression, since it is more than 10x faster.
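
As a rough check of the speed difference cited above, the ratio can be
computed directly from the timings quoted earlier in this mail. This is only
an illustrative sketch in Python; the variable names are made up for the
example and nothing here comes from the QEMU tree. The median is used so that
the slow first sample does not skew the result.

```python
from statistics import median

# Timings in nanoseconds, copied from the measurements quoted above.
zero_ns = [810, 417, 408, 400, 412, 411, 413, 399, 416, 405]
compress_ns = [26905, 8022, 7189, 7255, 7016, 7035, 6994, 7024, 7053, 7041]

# Ratio of the medians: how many times slower compression is overall.
median_ratio = median(compress_ns) / median(zero_ns)

# Smallest per-sample ratio: the least favourable case for zero detection.
min_ratio = min(c / z for c, z in zip(compress_ns, zero_ns))

print(f"median ratio: {median_ratio:.1f}x")
print(f"smallest per-sample ratio: {min_ratio:.1f}x")
```

On these figures compression is roughly 17x slower than zero detection, and
no single sample falls below 10x.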

Given the CPU overheads of compression, I think it has fairly narrow use
in migration in general, considering that hosts are often highly committed
on CPU.
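
To get a feel for this trade-off on a particular host, a comparison along the
following lines could be run. It is a minimal sketch under stated assumptions,
not the code used for the numbers in this thread: it is Python rather than C,
is_zero_page() is a hypothetical stand-in for QEMU's buffer_is_zero(), the
4096-byte page size is assumed, and zlib is called with its default level.
Absolute timings will not match QEMU's; only the relative cost is of interest.

```python
import time
import zlib

PAGE_SIZE = 4096  # assumed guest page size


def is_zero_page(page: bytes) -> bool:
    """Hypothetical stand-in for buffer_is_zero(): True if every byte is 0."""
    return page.count(0) == len(page)


def ns_per_call(fn, arg, rounds=1000):
    """Average wall-clock nanoseconds per call of fn(arg)."""
    start = time.perf_counter_ns()
    for _ in range(rounds):
        fn(arg)
    return (time.perf_counter_ns() - start) / rounds


zero_page = bytes(PAGE_SIZE)  # all-zero buffer, as in the quoted test

detect_ns = ns_per_call(is_zero_page, zero_page)
compress_ns = ns_per_call(zlib.compress, zero_page)

print(f"zero detection: {detect_ns:.0f} ns/page")
print(f"compression:    {compress_ns:.0f} ns/page")
```

Both paths end up sending very little data for an all-zero page, so the
difference that matters here is the CPU time spent per page.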

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
