qemu-block

Re: [Qemu-block] [RFC PATCH 00/40] Sneak peek of virtio and dataplane ch


From: Paolo Bonzini
Subject: Re: [Qemu-block] [RFC PATCH 00/40] Sneak peek of virtio and dataplane changes for 2.6
Date: Thu, 26 Nov 2015 11:39:20 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0


On 26/11/2015 10:36, Christian Borntraeger wrote:
> For some unknown reason, this seems to be slightly slower than 2.5-rc1 on my
> old z196. (I have not yet tested the z13.)
> 
> your branch is certainly better regarding malloc, but worse regarding others.

Thanks for taking the time to test this!

Your observation is correct; see the cover letter:

"[Patches 14 to 16 remove] the duplicate dataplane-specific
implementation of virtio in favor of the regular one that is already
used for non-dataplane. While the dataplane implementation is slightly
more optimized, I chose to keep the other one to avoid another "touch
all virtio devices" series.

Patch 10 alone mostly brings performance on par between the two.
The remaining 7-8% can be recovered by mostly getting rid of tiny
address_space_* operations, keeping the rings always mapped. Note that
the rest of this big series does bring a little performance improvement,
and already makes up for the lost performance."

The profile shows that the culprit is the repeated access
to the virtio ring. The first listing is virtio.c (virtqueue_pop),
the second is vring.c (vring_pop):

 3.99%  qemu-system-s39  libc-2.18.so       [.] __memcpy_z196
 2.66%  qemu-system-s39  qemu-system-s390x  [.] address_space_lduw_le
 2.51%  qemu-system-s39  qemu-system-s390x  [.] address_space_map
 2.51%  qemu-system-s39  qemu-system-s390x  [.] phys_page_find
 2.24%  qemu-system-s39  qemu-system-s390x  [.] qemu_get_ram_ptr
 2.18%  qemu-system-s39  qemu-system-s390x  [.] address_space_translate_internal
 1.91%  qemu-system-s39  qemu-system-s390x  [.] qemu_coroutine_switch
 1.66%  qemu-system-s39  qemu-system-s390x  [.] address_space_rw
 1.63%  qemu-system-s39  qemu-system-s390x  [.] address_space_stw_le
 1.57%  qemu-system-s39  qemu-system-s390x  [.] address_space_stl_le
 1.57%  qemu-system-s39  qemu-system-s390x  [.] address_space_translate
 1.45%  qemu-system-s39  qemu-system-s390x  [.] virtqueue_pop
 0.91%  qemu-system-s39  qemu-system-s390x  [.] qemu_ram_block_from_host
 0.79%  qemu-system-s39  qemu-system-s390x  [.] vring_desc_read
 0.76%  qemu-system-s39  qemu-system-s390x  [.] qemu_get_ram_block
-----------
28.33%

 3.30%  qemu-system-s39  libc-2.18.so       [.] __memcpy_z196
 2.83%  qemu-system-s39  qemu-system-s390x  [.] memory_region_find_rcu
 2.72%  qemu-system-s39  qemu-system-s390x  [.] vring_pop
 1.37%  qemu-system-s39  qemu-system-s390x  [.] address_space_rw
 1.37%  qemu-system-s39  qemu-system-s390x  [.] qemu_get_ram_ptr
 1.18%  qemu-system-s39  qemu-system-s390x  [.] memory_region_find
 0.92%  qemu-system-s39  qemu-system-s390x  [.] get_desc.isra.11
 0.92%  qemu-system-s39  qemu-system-s390x  [.] qemu_ram_block_from_host
 0.84%  qemu-system-s39  qemu-system-s390x  [.] vring_push
-----------
15.45%
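
To illustrate why "keeping the rings always mapped" helps, here is a rough
toy model in C. It is not QEMU code: ToyGuest, toy_translate and the other
toy_* names are hypothetical stand-ins, guest RAM is a single flat buffer,
and "translation" is just a bounds check plus a counter. It also ignores
what a real implementation must handle, such as invalidating the mapping
when the memory layout changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, not QEMU's memory API. */
typedef struct {
    uint8_t *ram;
    size_t size;
    unsigned long translations;   /* number of address lookups performed */
} ToyGuest;

/* Stand-in for the per-access lookup work seen in the first profile. */
static uint8_t *toy_translate(ToyGuest *g, uint64_t gpa, size_t len)
{
    g->translations++;
    if (gpa > g->size || len > g->size - gpa) {
        return NULL;
    }
    return g->ram + gpa;
}

/* Style 1: translate the guest address on every 16-bit ring read. */
static uint16_t toy_lduw_le(ToyGuest *g, uint64_t gpa)
{
    uint8_t *p = toy_translate(g, gpa, 2);
    return p ? (uint16_t)(p[0] | (p[1] << 8)) : 0;
}

/* Style 2: translate once, then read through the cached host pointer. */
typedef struct {
    uint8_t *base;
    size_t len;
} ToyRingMap;

static int toy_ring_map(ToyGuest *g, ToyRingMap *m, uint64_t gpa, size_t len)
{
    m->base = toy_translate(g, gpa, len);
    m->len = m->base ? len : 0;
    return m->base ? 0 : -1;
}

static uint16_t toy_ring_lduw_le(const ToyRingMap *m, size_t off)
{
    if (off > m->len || m->len - off < 2) {
        return 0;
    }
    return (uint16_t)(m->base[off] | (m->base[off + 1] << 8));
}

/* Read n 16-bit entries both ways; report the lookup count for each style.
 * Returns 1 if both styles read the same data, 0 otherwise. */
static int toy_compare(unsigned n, unsigned long *per_access,
                       unsigned long *mapped)
{
    uint8_t ram[256];
    ToyGuest g = { ram, sizeof(ram), 0 };
    ToyRingMap m;
    uint32_t sum1 = 0, sum2 = 0;
    unsigned i;

    assert(n * 2 <= sizeof(ram));
    for (i = 0; i < sizeof(ram); i++) {
        ram[i] = (uint8_t)(i * 7 + 3);   /* arbitrary fill pattern */
    }

    for (i = 0; i < n; i++) {
        sum1 += toy_lduw_le(&g, 2 * i);
    }
    *per_access = g.translations;

    g.translations = 0;
    if (toy_ring_map(&g, &m, 0, 2 * n) < 0) {
        return 0;
    }
    for (i = 0; i < n; i++) {
        sum2 += toy_ring_lduw_le(&m, 2 * i);
    }
    *mapped = g.translations;

    return sum1 == sum2;
}
```

Calling toy_compare(16, &a, &b) reads the same 16 entries both ways; the
per-access style performs 16 lookups and the mapped style performs 1.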

I would really prefer to get rid of vring.c as soon as the infrastructure
makes it possible, even if vring.c is faster. We know what makes virtio.c
slower, and it's simpler to fix virtio.c than to convert all the other
models to vring.c _plus_ make vring.c safe for migration.

Paolo


