Re: [Qemu-ppc] Tests with the latest ppc repo


From: Mark Cave-Ayland
Subject: Re: [Qemu-ppc] Tests with the latest ppc repo
Date: Tue, 13 Jan 2015 12:30:09 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.3.0

On 13/01/15 12:04, Kevin Wolf wrote:

>>>> 2) The fix for mixing synchronous and asynchronous requests in the macio
>>>> code corrupts a fresh install of my Darwin test ISO when running through
>>>> the complete installation:
>>>> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=3e300fa6ad4ee19b16339c25773dec8df0bfb982
>>>
>>> First of all, the commit message says:
>>>
>>>     However, the block infrastructure changed below our feet and now
>>>     it's impossible to call a synchronous block read/write from the aio
>>>     callback handler of a previous block access.
>>>
>>> That's news to me. If it's true, this is probably not intentional at
>>> all and the block layer code should have been fixed instead.
>>>
>>> What kind of failure do you get when you try such a call?
>>
>> The original patch was supplied by Alex based on a report from John,
>> which I suspect may have been provided off-list (see
>> http://lists.gnu.org/archive/html/qemu-ppc/2014-05/msg00461.html), and
>> was not something that I was seeing locally at the time.
> 
> Perhaps Alex can remember the details, CCed.
> 
>>>> So far I don't see anything obvious with the above commit except that
>>>> the multi-write combining seems to be more aggressive afterwards.
>>>
>>> Maybe I'm missing something, but I don't see any combining in this code?
>>
>> This is something that I found by enabling tracepoints for both the good
>> and bad commits and comparing the differences. It looks like when the
>> unaligned requests were submitted the old way, i.e. the bulk request
>> issued via dma_blk_read()/dma_blk_write() followed by the remainder via
>> bdrv_read()/bdrv_write(), the block layer disk requests looked fairly
>> similar to the DMA engine requests.
>>
>> When the above commit changed the unaligned accesses from
>> bdrv_read()/bdrv_write() to bdrv_aio_readv()/bdrv_aio_writev(), as used
>> internally by dma_blk_read() and dma_blk_write(), individual DMA
>> read/write requests started getting merged together, presumably by the
>> request-merging code in block.c.
>>
>> Note that this is on a laptop with a slow internal HD (non-SSD) with
>> encryption enabled, so could this perhaps be encouraging more request
>> merging?
> 
> block.c only merges writes when bdrv_aio_multiwrite() is explicitly
> called. The only caller of that function is virtio-blk, so I would rule
> that out.

Hmmm okay so I guess it can't be that.

> Did you check that it isn't the guest who already sends different
> requests?

Yes, the requests look the same here. Darwin appears to have a limit of
65534 bytes per DMA request, so a 64K request gets split into 65534
bytes + 2 bytes, and larger requests get split up to this limit as
required. It tended to be the larger aligned dma_blk_read() and
dma_blk_write() requests that would get combined in this pattern.
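
To illustrate the split pattern I'm seeing in the traces, here is a
quick stand-alone sketch (not taken from the Darwin or QEMU sources; the
65534-byte cap is simply what shows up in the traces):

    #include <stdint.h>
    #include <stdio.h>

    #define DARWIN_DMA_MAX 65534    /* apparent per-request cap */

    /* Print how a transfer of total_bytes would be broken up */
    static void show_split(uint32_t total_bytes)
    {
        uint32_t offset = 0;

        while (offset < total_bytes) {
            uint32_t chunk = total_bytes - offset;
            if (chunk > DARWIN_DMA_MAX) {
                chunk = DARWIN_DMA_MAX;
            }
            printf("request: offset %u, length %u\n",
                   (unsigned)offset, (unsigned)chunk);
            offset += chunk;
        }
    }

    int main(void)
    {
        show_split(65536);    /* 64K -> 65534 bytes + 2 bytes */
        return 0;
    }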

>>>> I'm wondering whether or not the synchronous requests for the
>>>> unaligned accesses had a side effect of introducing read/write
>>>> barriers which have now been removed?
>>>
>>> Let's start a bit simpler: While a request is running, its iov is in
>>> use. pmac_ide_transfer_cb() seems to issue an AIO request and
>>> immediately continue working on the same iov. I wouldn't be surprised if
>>> some internal data structures in the macio driver weren't made for
>>> multiple requests at the same time either.
>>>
>>> The usual pattern is that after issuing an AIO command, you return, and
>>> only continue what you were doing in its callback.
>>>
>>> That said, you have so many operations in pmac_ide_transfer_cb() using
>>> the function recursively as their callback that I find it really hard to
>>> read by now. Perhaps it would benefit from a conversion to coroutines,
>>> which would make the code linear again.
>>
>> Yeah, it's a fairly horrible piece of code, which is why I've been
>> spending time trying to fix it up :/  I do have an alpha-quality patch
>> that separates the DMA and disk logic back out again, but it also
>> suffers from the same problem as the existing code: while the
>> installation completes, the resulting disk image is corrupted in a
>> seemingly similar way.
>>
>> I can temporarily push this code, which I believe is easier to read, to
>> my github account if that helps?
> 
> Yes, please. More information can only help.

Okay. I've pushed the experimental branch to
https://github.com/mcayland/qemu/tree/for-kevin (note that this also
contains the PPC loadvm/savevm fixes which I was hoping would help save
time recreating the bug, but in the end I couldn't isolate it down to
one particular part of the installation process).
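
For reference, the shape I'm aiming for in that branch is the
issue-and-return pattern you describe: kick off the AIO request, return,
and only touch the iov again from the completion callback. A minimal
sketch, where the struct and names are placeholders rather than the code
in the branch:

    #include "block/block.h"    /* for bdrv_aio_readv(), QEMUIOVector */

    typedef struct MyDMAState {
        BlockDriverState *bs;
        QEMUIOVector iov;
        int64_t sector_num;
        int nsectors;
        int64_t remaining;      /* sectors still to transfer */
    } MyDMAState;

    static void my_transfer_cb(void *opaque, int ret)
    {
        MyDMAState *s = opaque;

        if (ret < 0 || s->remaining == 0) {
            /* error or all done: complete the DMA, signal the guest */
            return;
        }

        /* account for the completed chunk, refill s->iov with the next
         * one, update sector_num/nsectors/remaining, then reissue with
         * ourselves as the callback */
        bdrv_aio_readv(s->bs, s->sector_num, &s->iov, s->nsectors,
                       my_transfer_cb, s);
        /* nothing else touches s->iov until my_transfer_cb runs again */
    }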

>> Another solution to explore could be to move the AIO context over to
>> using offset and length rather than sector_num and sector count, so that
>> the existing code for allowing 512-byte sector accesses on 4096-byte
>> sector devices can be used. The caller could potentially provide an
>> "allow unaligned accesses" flag when creating the AIO context, and then
>> all that would remain would be to create custom DMA helpers for macio,
>> similar to what I've tried to do in my experimental patch.
> 
> Do we really need a flag, or is it just that we need to move
> dma-helpers.c from sector granularity to byte granularity?

Yes, that would be absolutely ideal! The idea of a flag was really to
ensure that developers find out quickly if they are accidentally issuing
unaligned requests on platforms that don't need them, and so incurring
the additional overhead, particularly on writes.

It struck me from perusing the source at the time that a lot of the
non-disk backends (e.g. http) would also benefit greatly from switching
to byte-level granularity, as there is a lot of converting to/from
offsets and sector numbers.
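
The kind of conversion I mean looks roughly like this (a stand-alone
sketch, not actual QEMU code; BDRV_SECTOR_SIZE here matches QEMU's
512-byte definition):

    #include <stdint.h>

    #define BDRV_SECTOR_SIZE 512ULL

    /* What a byte-based backend has to do today behind a
     * sector-granularity interface... */
    static void sector_req_to_bytes(int64_t sector_num, int nb_sectors,
                                    uint64_t *offset, uint64_t *bytes)
    {
        *offset = sector_num * BDRV_SECTOR_SIZE;
        *bytes  = (uint64_t)nb_sectors * BDRV_SECTOR_SIZE;
    }

    /* ...and back again when reporting progress.  With byte-granularity
     * dma_blk_read()/dma_blk_write() the offset and length could simply
     * be passed straight through. */
    static int64_t bytes_to_sectors(uint64_t bytes)
    {
        return bytes / BDRV_SECTOR_SIZE;
    }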


ATB,

Mark.



