From: Roger Pau Monne
Subject: Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available
Date: Wed, 21 Jun 2017 11:50:47 +0100
User-agent: NeoMutt/20170609 (1.8.3)
On Wed, Jun 21, 2017 at 11:40:00AM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Qemu-devel [mailto:qemu-devel-
> > address@hidden On Behalf Of Paul Durrant
> > Sent: 21 June 2017 10:36
> > To: Roger Pau Monne <address@hidden>; Stefano Stabellini
> > <address@hidden>
> > Cc: Kevin Wolf <address@hidden>; address@hidden; qemu-
> > address@hidden; Max Reitz <address@hidden>; Anthony Perard
> > <address@hidden>; address@hidden
> > Subject: Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-
> > persistent if grant copy is not available
> >
> > > -----Original Message-----
> > > From: Roger Pau Monne
> > > Sent: 21 June 2017 10:18
> > > To: Stefano Stabellini <address@hidden>
> > > Cc: Paul Durrant <address@hidden>; xen-
> > address@hidden;
> > > address@hidden; address@hidden; Anthony Perard
> > > <address@hidden>; Kevin Wolf <address@hidden>; Max
> > Reitz
> > > <address@hidden>
> > > Subject: Re: [PATCH 1/3] xen-disk: only advertize feature-persistent if
> > grant
> > > copy is not available
> > >
> > > On Tue, Jun 20, 2017 at 03:19:33PM -0700, Stefano Stabellini wrote:
> > > > On Tue, 20 Jun 2017, Paul Durrant wrote:
> > > > > If grant copy is available then it will always be used in preference
> > > > > to
> > > > > persistent maps. In this case feature-persistent should not be
> > advertized
> > > > > to the frontend, otherwise it may needlessly copy data into
> > > > > persistently
> > > > > granted buffers.
> > > > >
> > > > > Signed-off-by: Paul Durrant <address@hidden>
> > > >
> > > > CC'ing Roger.
> > > >
> > > > It is true that using feature-persistent together with grant copies is
> > > > a very bad idea.
> > > >
> > > > But this change establishes an explicit preference of
> > > > feature_grant_copy over feature-persistent in the xen_disk backend. It
> > > > is not obvious to me that it should be the case.
> > > >
> > > > Why is feature_grant_copy (without feature-persistent) better than
> > > > feature-persistent (without feature_grant_copy)? Shouldn't we simply
> > > > avoid grant copies to copy data to persistent grants?
> > >
> > > When using persistent grants the frontend must always copy data from
> > > the buffer to the persistent grant, there's no way to avoid this.
> > >
> > > Using grant_copy we move the copy from the frontend to the backend,
> > > which means the CPU time of the copy is accounted to the backend. This
> > > is not ideal, but IMHO it's better than persistent grants because it
> > > avoids keeping a pool of mapped grants that consume memory and make
> > > the code more complex.
> > >
> > > Do you have some performance data showing the difference between
> > > persistent grants vs grant copy?
> > >
> >
> > No, but I can get some :-)
> >
> > For a little background... I've been trying to push the throughput of fio
> > running in a debian stretch guest on my skull canyon NUC. When I started
> > out, I was getting ~100MB/s. When I finished, with this patch, the
> > IOThreads one, the multi-page ring one and a bit of hackery to turn off
> > all the aio flushes that seem to occur even if the image is opened with
> > O_DIRECT, I was getting ~960MB/s... which is about line rate for the SSD
> > in the NUC.
> >
> > So, I'll force use of persistent grants on and see what sort of throughput I
> > get.
>
> A quick test with grant copy forced off (causing persistent grants to be
> used)... My VM is debian stretch using a 16 page shared ring from blkfront.
> The image backing xvdb is a fully inflated 10G qcow2.
>
> address@hidden:~# fio --randrepeat=1 --ioengine=libaio --direct=0
> --gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64
> --size=10G --readwrite=randwrite --ramp_time=4
> test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio,
> iodepth=64
> fio-2.16
> Starting 1 process
> Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/539.4MB/0KB /s] [0/1078/0 iops] [eta
> 00m:05s]
> test: (groupid=0, jobs=1): err= 0: pid=633: Wed Jun 21 06:26:06 2017
> write: io=6146.6MB, bw=795905KB/s, iops=1546, runt= 7908msec
> cpu : usr=2.07%, sys=34.00%, ctx=4490, majf=0, minf=1
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=166.9%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
> >=64=0.0%
> issued : total=r=0/w=12230/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=64
>
> Run status group 0 (all jobs):
> WRITE: io=6146.6MB, aggrb=795904KB/s, minb=795904KB/s, maxb=795904KB/s,
> mint=7908msec, maxt=7908msec
>
> Disk stats (read/write):
> xvdb: ios=54/228860, merge=0/2230616, ticks=16/5403048, in_queue=5409068,
> util=98.26%
>
> The dom0 cpu usage for the relevant IOThread was ~60%
>
> The same test with grant copy...
>
> address@hidden:~# fio --randrepeat=1 --ioengine=libaio --direct=0
> --gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64
> --size=10G --readwrite=randwrite --ramp_time=4
> test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio,
> iodepth=64
> fio-2.16
> Starting 1 process
> Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/607.7MB/0KB /s] [0/1215/0 iops] [eta
> 00m:05s]
> test: (groupid=0, jobs=1): err= 0: pid=483: Wed Jun 21 06:35:14 2017
> write: io=6232.0MB, bw=810976KB/s, iops=1575, runt= 7869msec
> cpu : usr=2.44%, sys=37.42%, ctx=3570, majf=0, minf=1
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=164.6%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
> >=64=0.0%
> issued : total=r=0/w=12401/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=64
>
> Run status group 0 (all jobs):
> WRITE: io=6232.0MB, aggrb=810975KB/s, minb=810975KB/s, maxb=810975KB/s,
> mint=7869msec, maxt=7869msec
>
> Disk stats (read/write):
> xvdb: ios=54/229583, merge=0/2235879, ticks=16/5409500, in_queue=5415080,
> util=98.27%
>
> So, higher throughput and iops. The dom0 cpu usage was running at ~70%, so
> there is definitely more dom0 overhead when using grant copy. The use of
> grant copy could probably be improved, though, since the current code issues
> a copy ioctl per ioreq. With some batching I suspect some, if not all, of
> the extra overhead could be recovered.
There's almost always going to be more CPU overhead with grant-copy,
since when using persistent grants QEMU can avoid all (or almost all)
of the ioctls to the grant device.
For the persistent-grants benchmark, did you warm up the grant cache
first? (i.e. are those results from a first run of fio?)
In any case, I'm happy to use something different than persistent
grants as long as the performance is similar.
Roger.