From: Fam Zheng
Subject: Re: [Qemu-devel] [PATCH for-2.9-rc5 v4 2/2] block: Drain BH in bdrv_drained_begin
Date: Wed, 7 Feb 2018 22:14:05 +0800
On Wed, Feb 7, 2018 at 9:10 PM, Kevin Wolf <address@hidden> wrote:
> On 07.02.2018 at 13:39, Fam Zheng wrote:
>> On Wed, Feb 7, 2018 at 6:51 PM, Kevin Wolf <address@hidden> wrote:
>> > On 07.02.2018 at 02:48, Fam Zheng wrote:
>> >> On Tue, Feb 6, 2018 at 11:32 PM, Kevin Wolf <address@hidden> wrote:
>> >> > I got a bug assigned where we have a large (200 GB) fragmented qcow2
>> >> > image, and qemu-img convert takes two hours before it even starts to
>> >> > copy data. What it does in this time is iterating through the whole
>> >> > image with bdrv_block_status_above() in order to estimate the work to be
>> >> > done (and of course it will repeat the same calls while actually copying
>> >> > data).
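(For reference, that estimation pass is essentially the loop below. This is a
simplified sketch, not the actual qemu-img.c code; it assumes a
BlockDriverState *bs, the image's total_size in bytes, and the byte-based
bdrv_block_status_above() signature:)

    /* Walk the whole virtual disk once, one allocation query per extent,
     * to count how many bytes actually need to be copied. On a maximally
     * fragmented image this runs once per cluster. */
    int64_t offset = 0, allocated = 0;
    while (offset < total_size) {
        int64_t pnum;
        int ret = bdrv_block_status_above(bs, NULL, offset,
                                          total_size - offset,
                                          &pnum, NULL, NULL);
        if (ret < 0) {
            break;
        }
        if (ret & BDRV_BLOCK_ALLOCATED) {
            allocated += pnum;
        }
        offset += pnum;
    }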
>> >>
>> The convert_iteration_sectors loop looks wasteful. Why can't we
>> report progress simply as (offset/size) * 100%, so we don't need
>> to do this estimation?
>> >
>> > The assumption is that it's quick and it makes the progress much more
>> > meaningful. You know those progress bars that slowly crawl towards 50%
>> > and then suddenly complete within a second. Or the first 20% are quick,
>> > but then things get really slow. They are not a great example to follow.
>> >
>> > There must have been something wrong with that image file that was
>> > reported, because they reported that it was the only image causing
>> > trouble and if they copied it away, it became quick, too.
>> >
>> > Even for a maximally fragmented 100 GB qcow2 image it only takes about a
>> > second on my laptop. So while I don't feel as certain about the loop as
>> > before, in practice it normally doesn't seem to hurt.
>>
>> No doubt about normal cases. I was unsure about corner cases like
>> slow-ish NFS etc.
>
> Yeah, NFS seems to be a bit slow. Not two-hours-slow, but when I tried
> it with a localhost NFS server, the same operation that took two seconds
> directly from the local file system took about 40s over NFS. They seem
> to go over the network for each lseek() instead of caching the file map.
> Maybe something to fix in the NFS kernel driver.
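(Those lseek() calls are the SEEK_DATA/SEEK_HOLE probes the raw file driver
issues for each block status query. A minimal sketch of the idea, ignoring
error handling; probe_allocation() is a made-up name, not a QEMU function:)

    #include <unistd.h>

    /* Each probe is a syscall; a local filesystem answers from its cached
     * extent map, while NFS can turn every one into a server round trip. */
    static int probe_allocation(int fd, off_t offset, off_t *pnum)
    {
        off_t data = lseek(fd, offset, SEEK_DATA);
        if (data == (off_t)-1) {
            return 0;                   /* only a hole up to EOF */
        }
        if (data > offset) {
            *pnum = data - offset;      /* offset sits in a hole */
            return 0;
        }
        *pnum = lseek(fd, offset, SEEK_HOLE) - offset;  /* sits in data */
        return 1;
    }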
>
>> A little bit of intelligence would be to limit the time for the loop
>> to a few seconds, for example "bdrv_getlength(bs) / IMG_CVT_EST_SCALE",
>> or a less linear map.
>
> I don't understand. What would IMG_CVT_EST_SCALE be? Isn't the problem
> that this isn't a constant factor but can be anywhere between 0% and
> 100% depending on the specific image?
This is going to be quite arbitrary, just to make sure we don't waste
a very long time on maximally fragmented images, or in slow lseek()
scenarios. Something like this:
#define IMG_CVT_EST_SCALE ((1LL << 40) / 30)

    time_t start = time(NULL);
    while (sector_num < s->total_sectors) {
        if (time(NULL) - start > MAX(30, bdrv_getlength(bs) /
                                         IMG_CVT_EST_SCALE)) {
            /* Too much time spent on counting allocation, just fall
             * back to bdrv_get_allocated_file_size() */
            s->allocated_sectors =
                bdrv_get_allocated_file_size(bs) / BDRV_SECTOR_SIZE;
            break;
        }
        n = convert_iteration_sectors(s, sector_num);
        ...
    }
So we loop for at most 30 seconds for images up to 1TB, and
proportionally longer for bigger ones.
>
>> >> > One of the things I saw is that between each pair of lseek() calls, we
>> >> > also have unnecessary poll() calls, and these were introduced by this
>> >> > patch. If I modify bdrv_common_block_status_above() so that it doesn't
>> >> > poll if data.done == true already, the time needed to iterate through my
>> >> > test image is cut in half.
>> >> >
>> >> > Now, of course, I'm still only seeing a few seconds for a 100 GB image,
>> >> > so there must be more that is wrong for the reporter, but it suggests to
>> >> > me that BDRV_POLL_WHILE() shouldn't be polling unconditionally when only
>> >> > one of its users actually needs this.
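(For context, the pattern in bdrv_common_block_status_above() is roughly the
following; the if around BDRV_POLL_WHILE() is the change Kevin describes,
sketched from memory rather than taken from an actual patch:)

    co = qemu_coroutine_create(bdrv_block_status_above_co_entry, &data);
    bdrv_coroutine_enter(bs, co);
    /* For local files the coroutine usually completes synchronously, so
     * data.done is already true here and polling again is pure overhead
     * between every pair of lseek() calls. */
    if (!data.done) {
        BDRV_POLL_WHILE(bs, !data.done);
    }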
>> >>
>> >> Sounds fine to me. Maybe we could add a boolean parameter to
>> >> BDRV_POLL_WHILE?
>> >
>> > Why not call aio_poll() explicitly in the BDRV_POLL_WHILE() condition in
>> > the one place that needs it?
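(One possible reading of that suggestion, as a hedged sketch rather than a
patch: BDRV_POLL_WHILE() goes back to polling only while its condition holds,
and the drain code, the one user that must process pending BHs at least once,
forces a poll from its own condition:)

    /* The non-blocking aio_poll() processes pending BHs and returns
     * whether it made progress, so the loop keeps going as long as
     * either that or the in-flight request count says there is work. */
    BDRV_POLL_WHILE(bs, aio_poll(bdrv_get_aio_context(bs), false) ||
                        atomic_read(&bs->in_flight) > 0);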
>>
>> Yes, that is better. Do you have a patch? Or do you want me to work on one?
>
> I don't have a patch yet. If you want to write one, be my guest.
>
> Otherwise I'd just take a to-do note for when I get back to my
> bdrv_drain work. There are still one or two series left to do anyway,
> and I think it would fit in there.
That's what I thought, so I asked. I'll leave it to you then. :)
Fam