From: Pavel Butsykin
Subject: [Qemu-devel] [PATCH RFC 00/22] I/O prefetch cache
Date: Thu, 25 Aug 2016 16:43:59 +0300

The prefetch cache aims to improve the performance of sequential reads.
Of most interest here are small sequential read requests: such requests can
be optimized by extending them and keeping the extra data in the prefetch
cache. However, there are two issues:
 - Overall, only a small portion of requests is sequential, so the delays
   caused by reading larger volumes of data can lead to an overall decrease
   in performance.
 - With a large number of random requests, the cache memory fills up with
   redundant data.
This pcache implementation solves these and other problems of prefetching
data. The pcache algorithm can be summarised by the following main steps.

1. Monitor I/O requests to identify typical sequences.
This implementation of the prefetch cache works at the storage system level
and has information only about the physical block addresses of I/O requests.
Statistics are collected only from read requests up to a maximum size of
32KB (by default); each request that matches the criteria falls into a pool
of requests. The request statistics are stored in an rb-tree (lreq.tree), a
simple but, for this task, quite efficient data structure.
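
As an illustration only (the names below, such as LReqNode and
lreq_qualifies, are hypothetical and not the actual pcache.c code), the
entry criteria for the statistics pool could be sketched in C like this:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  #define PCACHE_MAX_AIO_SIZE (32 * 1024)  /* default statistics cutoff */

  /* One tracked read request; the real driver links these into an
   * rb-tree (lreq.tree) ordered by sector. */
  typedef struct LReqNode {
      uint64_t sector;      /* first sector of the read */
      uint32_t nb_sectors;  /* length of the read, in sectors */
  } LReqNode;

  /* Only small reads enter the statistics pool: larger requests are
   * already efficient on their own and would only pollute the pool. */
  static inline bool lreq_qualifies(bool is_read, size_t bytes)
  {
      return is_read && bytes <= PCACHE_MAX_AIO_SIZE;
  }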

2. Identify sequential I/O streams.
For each read request, an attempt is made to pick up from lreq.tree a chain
of requests of which the current request would be the next element. The key
for finding the preceding requests is the area of sectors directly before
the current request. This area should not be too small, to avoid false
readahead. Sequential streams can be identified even among a large number
of random requests. For example, given accesses to blocks 100, 1157, 27520,
4, 101, 312, 1337, 102, the processing of request 102 will identify the
sequential chain 100, 101, 102, and at that point a decision to do
readahead can be made. A situation may also arise where multiple
applications A, B, C simultaneously perform sequential reads. Each
application on its own reads data sequentially, A(100, 101, 102),
B(300, 301, 302), C(700, 701, 702), but to the block device this may look
like random reading: 100,300,700,101,301,701,102,302,702. In this case the
sequential streams will still be recognised, because the placement of the
requests in the rb-tree makes it possible to separate the interleaved
streams.
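
The detection on the example trace above can be demonstrated with a small
self-contained toy program; the bitmap and the LOOKBACK depth are
simplifications standing in for lreq.tree and the size of the preceding
area:

  #include <inttypes.h>
  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define POOL_BLOCKS 32768
  #define LOOKBACK    2      /* assumed depth of the preceding area */

  static bool seen[POOL_BLOCKS];

  /* A block continues a stream if the blocks directly before it were
   * seen earlier, no matter how many random accesses came in between. */
  static bool continues_stream(uint32_t block)
  {
      for (uint32_t i = 1; i <= LOOKBACK; i++) {
          if (block < i || !seen[block - i]) {
              return false;
          }
      }
      return true;
  }

  int main(void)
  {
      const uint32_t trace[] = { 100, 1157, 27520, 4, 101, 312, 1337, 102 };

      for (size_t i = 0; i < sizeof(trace) / sizeof(trace[0]); i++) {
          if (continues_stream(trace[i])) {
              printf("block %" PRIu32 ": sequential stream -> readahead\n",
                     trace[i]);
          }
          seen[trace[i]] = true;
      }
      return 0;
  }

Only block 102 triggers the readahead decision here, matching the chain
100, 101, 102 described above.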

3. Do readahead into the cache for recognized sequential data streams.
Once a sequential stream has been detected, larger requests are needed to
bring its data into the cache. This implementation uses readahead instead
of extending the guest request, so the original request goes down
unmodified. There is no reason to put data into the cache that will never
be picked up, yet that would always happen if requests were extended. The
areas of cached blocks are likewise stored in an rb-tree (pcache.tree), a
simple but, for this task, quite efficient data structure.
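
A sketch of the readahead bookkeeping, again with hypothetical names
(PCacheNode, pcache_readahead_node) and a linked list standing in for
pcache.tree:

  #include <stdint.h>
  #include <stdlib.h>

  #define PCACHE_READAHEAD_SIZE (128 * 1024)  /* default, in bytes */

  /* One cached area; the real driver keeps these in an rb-tree
   * (pcache.tree) keyed by offset. */
  typedef struct PCacheNode {
      uint64_t offset;          /* start of the cached area, in bytes */
      uint64_t bytes;           /* length of the cached area */
      uint64_t bytes_consumed;  /* how much the guest has read back */
      uint8_t *data;            /* buffer filled by the readahead */
      struct PCacheNode *next;
  } PCacheNode;

  /* Allocate a node covering the area right behind a detected stream.
   * In the driver the buffer is filled by an asynchronous read issued
   * alongside the guest's own request, which goes down unmodified. */
  static PCacheNode *pcache_readahead_node(uint64_t stream_end,
                                           PCacheNode **tree)
  {
      PCacheNode *node = malloc(sizeof(*node));
      if (!node) {
          return NULL;
      }
      node->offset = stream_end;
      node->bytes = PCACHE_READAHEAD_SIZE;
      node->bytes_consumed = 0;
      node->data = malloc(node->bytes);
      if (!node->data) {
          free(node);
          return NULL;
      }
      node->next = *tree;  /* the real tree keeps nodes sorted by offset */
      *tree = node;
      return node;
  }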

4. Control the size of the prefetch cache pool and the request statistics
pool.
The request statistics pool is bounded in a simple way: entries are placed
and replaced according to the FIFO principle. The memory cache is bounded
by an LRU list, which limits the maximum amount of memory that pcache may
allocate; but the LRU exists mainly to prevent the displacement of cache
blocks that have been read only partially. The main eviction path frees
memory immediately after use: as soon as a chunk of cache memory has been
completely read, it is dropped, because the probability of the same request
being repeated is very low. Cases where the same portion of cache memory is
read several times are not optimized; they do not belong to the cases that
pcache can optimize. Thus, with a small amount of cache memory, by
optimizing the readahead and memory-release operations, entire volumes of
data can be read with a 100% cache hit rate, while the performance of
random read requests does not decrease.
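
Continuing the sketch above, the eviction rule could be expressed like this
(pcache_node_drop and lru_move_to_front are assumed helpers, not the real
API):

  /* Assumed helpers:
   *   pcache_node_drop()  - unlink the node and free its memory
   *   lru_move_to_front() - standard LRU bookkeeping */
  void pcache_node_drop(PCacheNode *node);
  void lru_move_to_front(PCacheNode *node);

  /* Once a node has been read back in full, drop it immediately: a
   * repeat read of the same area is unlikely, so keeping it would only
   * waste the small cache budget.  Partially read nodes stay on the
   * LRU list and are displaced last. */
  void pcache_node_read_done(PCacheNode *node, uint64_t bytes_read)
  {
      node->bytes_consumed += bytes_read;
      if (node->bytes_consumed >= node->bytes) {
          pcache_node_drop(node);    /* fully consumed: free right away */
      } else {
          lru_move_to_front(node);   /* partially read: keep it longest */
      }
  }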

PCache is implemented as a qemu block filter driver with several
configurable parameters: total cache size, readahead size, and the maximum
size of a request that will be processed.

For performance evaluation, several test cases with different sequential
and random read patterns were run against an SSD disk. Here are the test
results and the qemu parameters:

qemu parameters: 
-M pc-i440fx-2.4 --enable-kvm -smp 4 -m 1024 
-drive file=centos7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,
       aio=native,pcache-full-size=4MB,pcache-readahead-size=128KB,
       pcache-max-aio-size=32KB
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,
        id=virtio-disk0
(-set device.virtio-disk0.x-data-plane=on)

********************************************************************************
* Testcase                        * Results in iops                            *
*                                 **********************************************
*                                 * clean qemu   * pcache       * x-data-plane *
********************************************************************************
* Create/open 16 file(s) of total * 25514 req/s  * 85659 req/s  * 28249 req/s  *
* size 2048.00 MB named           * 25692 req/s  * 89064 req/s  * 27950 req/s  *
* /tmp/tmp.tmp, start 4 thread(s) * 25836 req/s  * 84142 req/s  * 28120 req/s  *
* and do uncached sequential read *              *              *              *
* by 4KB blocks                   *              *              *              *
********************************************************************************
* Create/open 16 file(s) of total * 56006 req/s  * 92137 req/s  * 56992 req/s  *
* size 2048.00 MB named           * 55335 req/s  * 92269 req/s  * 57023 req/s  *
* /tmp/tmp.tmp, start 4 thread(s) * 55731 req/s  * 98722 req/s  * 56593 req/s  *
* and do uncached sequential read *              *              *              *
* by 4KB blocks with constant     *              *              *              *
* queue len 32                    *              *              *              *
********************************************************************************
* Create/open 16 file(s) of total * 14104 req/s  * 14164 req/s  * 13914 req/s  *
* size 2048.00 MB named           * 14130 req/s  * 14232 req/s  * 13613 req/s  *
* /tmp/tmp.tmp, start 4 thread(s) * 14183 req/s  * 14080 req/s  * 13374 req/s  *
* and do uncached random read by  *              *              *              *
* 4KB blocks                      *              *              *              *
********************************************************************************
* Create/open 16 file(s) of total * 23480 req/s  * 23483 req/s  * 20887 req/s  *
* size 2048.00 MB named           * 23070 req/s  * 22432 req/s  * 21127 req/s  *
* /tmp/tmp.tmp, start 4 thread(s) * 24090 req/s  * 23499 req/s  * 23415 req/s  *
* and do uncached random read by  *              *              *              *
* 4KB blocks with constant queue  *              *              *              *
* len 32                          *              *              *              *
********************************************************************************

TODO list:
- add tracepoints
- add migration support 
- add more explanations in the commit messages
- get rid of the additional allocation in pcache_node_find_and_create() and
  pcache_aio_readv()

Pavel Butsykin (22):
  block/pcache: empty pcache driver filter
  block/pcache: add own AIOCB block
  util/rbtree: add rbtree from linux kernel
  block/pcache: add pcache debug build
  block/pcache: add aio requests into cache
  block/pcache: restrict cache size
  block/pcache: introduce LRU as method of memory
  block/pcache: implement pickup parts of the cache
  block/pcache: separation AIOCB on requests
  block/pcache: add check node leak
  add QEMU style defines for __sync_add_and_fetch
  block/pcache: implement read cache to qiov and drop node during aio
    write
  block/pcache: add generic request complite
  block/pcache: add support for rescheduling requests
  block/pcache: simple readahead one chunk forward
  block/pcache: pcache readahead node around
  block/pcache: skip readahead for non-sequential requests
  block/pcache: add pcache skip large aio read
  block/pcache: add pcache node assert
  block/pcache: implement pcache error handling of aio cb
  block/pcache: add write through node
  block/pcache: drop used pcache node

 block/Makefile.objs             |    1 +
 block/pcache.c                  | 1224 +++++++++++++++++++++++++++++++++++++++
 include/qemu/atomic.h           |    3 +
 include/qemu/rbtree.h           |  109 ++++
 include/qemu/rbtree_augmented.h |  237 ++++++++
 util/Makefile.objs              |    1 +
 util/rbtree.c                   |  570 ++++++++++++++++++
 7 files changed, 2145 insertions(+)
 create mode 100644 block/pcache.c
 create mode 100644 include/qemu/rbtree.h
 create mode 100644 include/qemu/rbtree_augmented.h
 create mode 100644 util/rbtree.c

-- 
2.8.3