From: | Paolo Bonzini |
Subject: | Re: [Qemu-devel] [PATCH 07/10] virtio: combine the read of a descriptor |
Date: | Thu, 4 Feb 2016 11:18:03 +0100 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0 |
On 04/02/2016 08:48, Gonglei (Arei) wrote:
>  11.44%  qemu-kvm             [.] memory_region_find
>   6.31%  qemu-kvm             [.] qemu_get_ram_ptr
>   4.61%  libpthread-2.19.so   [.] __pthread_mutex_unlock_usercnt
>   3.54%  qemu-kvm             [.] qemu_ram_addr_from_host
>   2.80%  libpthread-2.19.so   [.] pthread_mutex_lock
>   2.55%  qemu-kvm             [.] object_unref
>   2.49%  libc-2.19.so         [.] malloc
>   2.47%  libc-2.19.so         [.] _int_malloc
>   2.34%  libc-2.19.so         [.] _int_free
>   2.18%  qemu-kvm             [.] object_ref
>   2.18%  qemu-kvm             [.] address_space_translate
>   2.03%  libc-2.19.so         [.] __memcpy_sse2_unaligned
>   1.76%  libc-2.19.so         [.] malloc_consolidate
>   1.56%  qemu-kvm             [.] addrrange_intersection
>   1.52%  qemu-kvm             [.] vring_pop
>   1.36%  qemu-kvm             [.] find_next_zero_bit
>   1.30%  [kernel]             [k] native_write_msr_safe
>   1.29%  qemu-kvm             [.] addrrange_intersects
>   1.21%  qemu-kvm             [.] vring_map
>   0.93%  qemu-kvm             [.] virtio_notify
>
> Do you have any thoughts to decrease the cpu overhead and get higher through
> output? Thanks!

Using bigger chunks than 256 bytes will reduce the overhead in
memory_region_find and qemu_get_ram_ptr.  You could expect a further
10-12% improvement.

Paolo
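
The reasoning behind the chunk-size advice can be illustrated with a little arithmetic. The sketch below is not taken from QEMU: it assumes, purely for illustration, that each buffer costs one address translation regardless of its size, so the number of lookups for a fixed amount of data falls as the chunk grows. The helper name lookups_needed and the 1 MiB total are hypothetical, and the 10-12% figure above is an estimate from the reply, not something this code derives.

```python
def lookups_needed(total_bytes: int, chunk_bytes: int) -> int:
    """Number of buffers needed to move total_bytes in chunk_bytes pieces.

    Assumption for illustration only: one address translation per buffer,
    so this is also the number of lookups.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    # Ceiling division: a final partial chunk still costs one lookup.
    return -(-total_bytes // chunk_bytes)


MIB = 1024 * 1024  # arbitrary example payload size

for chunk in (256, 1024, 4096):
    print(f"chunk={chunk:5d} bytes -> {lookups_needed(MIB, chunk):5d} lookups per MiB")
```

Under that assumption, going from 256-byte to 4096-byte chunks cuts the per-buffer lookups for the same data by a factor of 16. How much of that shows up as throughput depends on how much of the profile is per-buffer cost rather than per-byte cost.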