qemu-devel


From: Denis V. Lunev
Subject: Re: [Qemu-devel] [PATCH 1/8] block: prepare bdrv_co_do_write_zeroes to deal with large bl.max_write_zeroes
Date: Mon, 5 Jan 2015 14:48:11 +0300
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 05/01/15 14:23, Peter Lieven wrote:
On 05.01.2015 12:06, Denis V. Lunev wrote:
On 05/01/15 10:34, Peter Lieven wrote:
On 30.12.2014 10:20, Denis V. Lunev wrote:
bdrv_co_do_write_zeroes splits writes using bl.max_write_zeroes or
16 MiB as the chunk size. It is implemented this way to tolerate
buggy block backends which do not accept overly large requests.

However, if the bdrv_co_write_zeroes callback is not good enough, we
fall back to writing the data explicitly using bdrv_co_writev, and we
create a buffer to accommodate the zeroes. The size of this buffer
is the size of the chunk. Thus, if the underlying layer has
bl.max_write_zeroes high enough, e.g. 4 GiB, the allocation can fail.

Actually, there is no need to allocate such a large amount of memory.
We could simply allocate a 1 MiB buffer and create an iovec whose
entries all point to the same memory.

Signed-off-by: Denis V. Lunev <address@hidden>
CC: Kevin Wolf <address@hidden>
CC: Stefan Hajnoczi <address@hidden>
CC: Peter Lieven <address@hidden>
---
  block.c | 35 ++++++++++++++++++++++++-----------
  1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index 4165d42..d69c121 100644
--- a/block.c
+++ b/block.c
@@ -3173,14 +3173,18 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
   * of 32768 512-byte sectors (16 MiB) per request.
   */
  #define MAX_WRITE_ZEROES_DEFAULT 32768
+/* allocate iovec with zeroes using 1 MiB chunks to avoid too big allocations */
+#define MAX_ZEROES_CHUNK (1024 * 1024)
  static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
      int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
  {
      BlockDriver *drv = bs->drv;
      QEMUIOVector qiov;
-    struct iovec iov = {0};
      int ret = 0;
+    void *chunk = NULL;
+
+    qemu_iovec_init(&qiov, 0);
      int max_write_zeroes = bs->bl.max_write_zeroes ?
                             bs->bl.max_write_zeroes : MAX_WRITE_ZEROES_DEFAULT;
@@ -3217,27 +3221,35 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
          }
            if (ret == -ENOTSUP) {
+            int64_t num_bytes = (int64_t)num << BDRV_SECTOR_BITS;
+            int chunk_size = MIN(MAX_ZEROES_CHUNK, num_bytes);
+
              /* Fall back to bounce buffer if write zeroes is unsupported */
-            iov.iov_len = num * BDRV_SECTOR_SIZE;
-            if (iov.iov_base == NULL) {
-                iov.iov_base = qemu_try_blockalign(bs, num * BDRV_SECTOR_SIZE);
-                if (iov.iov_base == NULL) {
+            if (chunk == NULL) {
+                chunk = qemu_try_blockalign(bs, chunk_size);
+                if (chunk == NULL) {
                      ret = -ENOMEM;
                      goto fail;
                  }
-                memset(iov.iov_base, 0, num * BDRV_SECTOR_SIZE);
+                memset(chunk, 0, chunk_size);
+            }
+
+            while (num_bytes > 0) {
+                int to_add = MIN(chunk_size, num_bytes);
+                qemu_iovec_add(&qiov, chunk, to_add);

This can and likely will fail for big num_bytes if you exceed IOV_MAX vectors.

I would stick to the old method and limit num to a reasonable value, e.g. MAX_WRITE_ZEROES_DEFAULT. This becomes necessary because you set max_write_zeroes to INT_MAX, which the original patch did not consider.

Peter
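
The IOV_MAX concern above can be illustrated with a minimal sketch. It is not code from either patch: the helper name fill_zero_iovec and the cap SKETCH_IOV_MAX are hypothetical, and the value 1024 is only an assumption (real code would consult IOV_MAX from <limits.h> or sysconf(_SC_IOV_MAX)). It builds an iovec whose entries all alias one zeroed chunk and stops once the entry cap is reached, reporting how many bytes were actually covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Assumption for illustration only: a fixed cap on iovec entries.
 * Real code would use IOV_MAX or sysconf(_SC_IOV_MAX). */
#define SKETCH_IOV_MAX 1024

/*
 * Fill iov[] with entries that all point at the same zeroed chunk.
 * Returns the number of bytes covered, which is less than num_bytes
 * when the entry cap is hit first; *niov receives the entry count.
 */
static int64_t fill_zero_iovec(struct iovec *iov, int max_iov, int *niov,
                               void *chunk, size_t chunk_size,
                               int64_t num_bytes)
{
    int64_t covered = 0;
    int n = 0;

    assert(chunk_size > 0 && max_iov > 0);
    while (covered < num_bytes && n < max_iov) {
        int64_t left = num_bytes - covered;
        size_t len = left < (int64_t)chunk_size ? (size_t)left : chunk_size;

        iov[n].iov_base = chunk;   /* every entry aliases the same buffer */
        iov[n].iov_len = len;
        covered += (int64_t)len;
        n++;
    }
    *niov = n;
    return covered;
}
```

Under these assumptions, a 1 MiB chunk and a cap of 1024 entries cover at most 1 GiB per call, so a caller handling a 4 GiB request would have to loop on the returned byte count rather than expect a single vector to describe everything.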


Hmm. You are right, but I think it would be better to limit the iovec size to 32 entries, which would solve the problem. Allocating 32 MB could be a real problem on a loaded system.

What do you think about this? Maybe we could consider 16 as the limit...

I would do the following:

---8<---

From 8c2a08baddbcd9e89bbb11fa83a42350bd7cc095 Mon Sep 17 00:00:00 2001
From: Peter Lieven <address@hidden>
Date: Mon, 5 Jan 2015 12:14:52 +0100
Subject: [PATCH] block: limited request size in write zeroes unsupported path

If bs->bl.max_write_zeroes is large and we end up in the unsupported
path we might allocate a lot of memory for the iovector and/or even
generate oversized requests.

Fix this by limiting the request by the minimum of the reported
maximum transfer size or 16MB (32768 sectors).

Reported-by: Denis V. Lunev <address@hidden>
Signed-off-by: Peter Lieven <address@hidden>
---
 block.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index a612594..8009478 100644
--- a/block.c
+++ b/block.c
@@ -3203,6 +3203,9 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,

         if (ret == -ENOTSUP) {
             /* Fall back to bounce buffer if write zeroes is unsupported */
+            int max_xfer_len = MIN_NON_ZERO(bs->bl.max_transfer_length,
+                                            MAX_WRITE_ZEROES_DEFAULT);
+            num = MIN(num, max_xfer_len);

This is not going to work, IMHO: num is counted in sectors, while
max_xfer_len is in bytes.

I will send my updated version using your approach in a
couple of minutes. I would like to test it a bit first.

             iov.iov_len = num * BDRV_SECTOR_SIZE;
             if (iov.iov_base == NULL) {
                 iov.iov_base = qemu_try_blockalign(bs, num * BDRV_SECTOR_SIZE);
@@ -3219,7 +3222,7 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
             /* Keep bounce buffer around if it is big enough for all
              * all future requests.
              */
-            if (num < max_write_zeroes) {
+            if (num < max_xfer_len) {
                 qemu_vfree(iov.iov_base);
                 iov.iov_base = NULL;
             }
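
The unit question raised in the inline comment above (a sector count compared against a transfer limit) can be made explicit with a small sketch. This is not code from the thread: all names below are hypothetical, the 512-byte sector size and the 32768-sector default mirror the values quoted in the patch, and the byte-based variant assumes the limit is a multiple of the sector size. The sketch takes no position on which unit the real limit is reported in; it only shows both quantities being brought to sectors before they are compared, with 0 treated as "no limit reported".

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_BITS 9             /* 512-byte sectors */
#define SKETCH_DEFAULT_MAX_SECTORS 32768 /* 16 MiB, as quoted in the patch */

/* Smaller of a and b, treating 0 as "no limit reported". */
static int64_t min_non_zero(int64_t a, int64_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/* Clamp a sector count against a limit that is also in sectors. */
static int64_t clamp_sectors(int64_t num_sectors, int64_t limit_sectors)
{
    int64_t cap = min_non_zero(limit_sectors, SKETCH_DEFAULT_MAX_SECTORS);

    assert(num_sectors >= 0);
    return num_sectors < cap ? num_sectors : cap;
}

/* Same clamp when the limit is given in bytes: convert it first. */
static int64_t clamp_sectors_byte_limit(int64_t num_sectors,
                                        int64_t limit_bytes)
{
    /* Assumption: the byte limit is sector-aligned. */
    assert((limit_bytes & ((1 << SKETCH_SECTOR_BITS) - 1)) == 0);
    return clamp_sectors(num_sectors, limit_bytes >> SKETCH_SECTOR_BITS);
}
```

Keeping the conversion in one helper means the comparison itself always happens in a single unit, whichever unit the limit turns out to use.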
