
Re: [Qemu-devel] Poor 8K random IO performance inside the guest


From: Fam Zheng
Subject: Re: [Qemu-devel] Poor 8K random IO performance inside the guest
Date: Fri, 14 Jul 2017 14:07:02 +0800
User-agent: Mutt/1.8.3 (2017-05-23)

On Fri, 07/14 04:28, Nagarajan, Padhu (HPE Storage) wrote:
> During an 8K random-read fio benchmark, we observed poor performance inside
> the guest in comparison to the performance seen on the host block device. The
> table below shows the IOPS on the host and inside the guest with both
> virtioscsi (scsimq) and virtioblk (blkmq).
> 
> -----------------------------------
> config        | IOPS  | fio gst hst
> -----------------------------------
> host-q32-t1   | 79478 | 401     271
> scsimq-q8-t4  | 45958 | 693 639 351
> blkmq-q8-t4   | 49247 | 647 589 308
> -----------------------------------
> host-q48-t1   | 85599 | 559     291
> scsimq-q12-t4 | 50237 | 952 807 358
> blkmq-q12-t4  | 54016 | 885 786 329
> -----------------------------------
> fio gst hst => latencies in usecs, as
>                seen by fio, guest and
>                host block layers.

Out of curiosity, how are gst and hst collected here? It's interesting that hst
for q32-t1 is better than for q8-t4.
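For reference, a minimal way to sample block-layer latency would be something
like the following, run on the host and again inside the guest against /dev/vdb
or /dev/sda (I'm assuming /dev/sdc on the host side, as in your command line):

  # host: 'await' / 'r_await' columns give block-layer read latency in ms
  iostat -x 1 /dev/sdc

  # or per-IO latencies with blktrace/blkparse
  blktrace -d /dev/sdc -o - | blkparse -i -

Just curious whether you used something along these lines or a different probe.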

> q8-t4 => qdepth=8, numjobs=4
> host  => fio run directly on the host
> scsimq,blkmq => fio run inside the guest
> 
> Shouldn't we get much better performance inside the guest?
> 
> When fio inside the guest was generating 32 outstanding IOs, iostat on the
> host showed an avgqu-sz of only 16. For 48 outstanding IOs inside the guest,
> avgqu-sz on the host was only marginally higher.
> 
> qemu command line: qemu-system-x86_64 -L /usr/share/seabios/ -name
> node1,debug-threads=on -name node1 -S -machine pc,accel=kvm,usb=off -cpu
> SandyBridge -m 7680 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1
> -object iothread,id=iothread1 -object iothread,id=iothread2 -object
> iothread,id=iothread3 -object iothread,id=iothread4 -uuid XX -nographic
> -no-user-config -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/node1.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew
> -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on
> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
> lsi,id=scsi0,bus=pci.0,addr=0x6 -device
> virtio-scsi-pci,ioeventfd=on,num_queues=4,iothread=iothread2,id=scsi1,bus=pci.0,addr=0x7
> -device
> virtio-scsi-pci,ioeventfd=on,num_queues=4,iothread=iothread2,id=scsi2,bus=pci.0,addr=0x8
> -drive file=rhel7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2 -device
> virtio-blk-pci,ioeventfd=on,num-queues=4,iothread=iothread1,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -drive
> file=/dev/sdc,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native
> -device
> virtio-blk-pci,ioeventfd=on,num-queues=4,iothread=iothread1,iothread=iothread1,scsi=off,bus=pci.0,addr=0x17,drive=drive-virtio-disk1,id=virtio-disk1

num-queues here will not make much of a difference with the current
implementation in QEMU, because all the queues get processed in the same
iothread.
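If you want to at least spread the per-device work across the iothreads you
already create, you could give the benchmark disk and the second SCSI
controller their own iothreads, e.g. (untested sketch, reusing your existing
iothread3/iothread4 objects):

  -device virtio-scsi-pci,ioeventfd=on,num_queues=4,iothread=iothread3,id=scsi2,bus=pci.0,addr=0x8
  -device virtio-blk-pci,ioeventfd=on,num-queues=4,iothread=iothread4,scsi=off,bus=pci.0,addr=0x17,drive=drive-virtio-disk1,id=virtio-disk1

That won't give you per-queue threads, and it may not change much in this
particular test since only one disk is active at a time, but it at least keeps
the benchmark disk off the iothread that also serves the root disk.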

> -drive
> file=/dev/sdc,if=none,id=drive-scsi1-0-0-0,format=raw,cache=none,aio=native
> -device
> scsi-hd,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi1-0-0-0,id=scsi1-0-0-0
> -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=XXX,bus=pci.0,addr=0x2 -netdev
> tap,fd=26,id=hostnet1,vhost=on,vhostfd=27 -device
> virtio-net-pci,netdev=hostnet1,id=net1,mac=YYY,bus=pci.0,multifunction=on,addr=0x15
> -netdev tap,fd=28,id=hostnet2,vhost=on,vhostfd=29 -device
> virtio-net-pci,netdev=hostnet2,id=net2,mac=ZZZ,bus=pci.0,multifunction=on,addr=0x16
> -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -msg timestamp=on
> 
> fio command line: /tmp/fio --time_based --ioengine=libaio --randrepeat=1
> --direct=1 --invalidate=1 --verify=0 --offset=0 --verify_fatal=0
> --group_reporting --numjobs=$jobs --name=randread --rw=randread --blocksize=8K
> --iodepth=$qd --runtime=60 --filename={/dev/vdb or /dev/sda}
> 
> # qemu-system-x86_64 --version
> QEMU emulator version 2.8.0(Debian 1:2.8+dfsg-3~bpo8+1)
> Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers
> 
> The guest was running RHEL 7.3 and the host was Debian 8.
> 
> Any thoughts on what could be happening here ?

While there could be things that can be optimized or tuned, the results are not
too surprising to me. You have fast disks here, so the overhead is more visible.

Fam


