
Re: [Qemu-devel] [PATCH 1/5] Add target memory mapping API


From: Avi Kivity
Subject: Re: [Qemu-devel] [PATCH 1/5] Add target memory mapping API
Date: Mon, 19 Jan 2009 20:29:40 +0200
User-agent: Thunderbird 2.0.0.19 (X11/20090105)

Ian Jackson wrote:
>>> Efficient read-modify-write may be very hard for some setups to
>>> achieve.  It can't be done with the bounce buffer implementation.
>>> I think one good rule of thumb would be to make sure that the interface
>>> as specified can be implemented in terms of cpu_physical_memory_rw.
>> What is the motivation for efficient rmw?
>
> I think you've misunderstood me.  I don't think there is such a
> motivation.  I was saying it was so difficult to implement that we
> might as well exclude it.

Then we agree. The map API is for read OR write operations, not both at the same time.
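
For reference, a rough sketch of the kind of map/unmap interface being discussed; this is an approximation, and the names and types in the actual patch may differ:

/* Approximate sketch, not necessarily the exact patch.  A mapping is
 * created for reading (is_write == 0) or for writing (is_write == 1),
 * never both.  *plen may come back smaller than requested if only part
 * of the range could be mapped. */
void *cpu_physical_memory_map(target_phys_addr_t addr,
                              target_phys_addr_t *plen,
                              int is_write);

/* access_len tells the implementation how much of the mapping was
 * actually accessed, so a bounce-buffer implementation only needs to
 * copy that much back to guest memory for a write mapping. */
void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
                               int is_write,
                               target_phys_addr_t access_len);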

>>> That would be one alternative but isn't it the case that (for example)
>>> with a partial DMA completion, the guest can assume that the
>>> supposedly-untouched parts of the DMA target memory actually remain
>>> untouched rather than (say) zeroed?
>> For block devices, I don't think it can.

> `Block devices'?  We're talking about (say) IDE controllers here.  I
> would be very surprised if an IDE controller used DMA to overwrite RAM
> beyond the amount of successful transfer.
>
> If a Unix variant does zero copy IO using DMA direct into process
> memory space, then it must even rely on the IDE controller not doing
> DMA beyond the end of the successful transfer, as the read(2) API
> promises to the calling process that data beyond the successful read
> is left untouched.
>
> And even if the IDE spec happily says that the (IDE) host (ie our
> guest) is not allowed to assume that that memory (ie the memory beyond
> the extent of the successful part of a partially successful transfer)
> is unchanged, there will almost certainly be some other IO device on
> some platform that will make that promise.
>
> So we need a call into the DMA API from the device model to say which
> regions have actually been touched.


It's not possible to implement this efficiently. The qemu block layer will submit the results of the map operation to the kernel in an async zero copy operation. The kernel may break up this operation into several parts (if the underlying backing store is fragmented) and submit in parallel to the underlying device(s). Those requests will complete out-of-order, so you can't guarantee that if an error occurs all memory before will have been written and none after.

I really doubt that any guest will be affected by this. It's a tradeoff between decent performance and needlessly accurate emulation. I don't see how we can choose the latter.

>>> In a system where we're trying to do zero copy, we may issue the map
>>> request for a large transfer, before we know how much the host kernel
>>> will actually provide.
>> Won't it be at least 1GB?  Partition your requests to that size.
>
> No, I mean, before we know how much data qemu's read(2) will transfer.

You don't know afterwards either. Maybe read() is specified the way you say, but practical implementations only tell you the minimum number of bytes read, not the exact amount.

Think software RAID.

>> In any case, this will only occur with mmio.  I don't think the
>> guest can assume much in such cases.
>
> No, it won't only occur with mmio.
>
> In the initial implementation in Xen, we will almost certainly simply
> emulate everything with cpu_physical_memory_rw.  So it will happen all
> the time.

Try it out. I'm sure it will work just fine (if incredibly slowly, unless you provide multiple bounce buffers).
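
A minimal fallback along those lines, as an untested sketch (bounce_map/bounce_unmap and the one-heap-buffer-per-mapping approach are only an illustration, not code from the patch):

/* Untested sketch: map/unmap built purely on cpu_physical_memory_rw(),
 * using one malloc'd bounce buffer per mapping. */
static uint8_t *bounce_map(target_phys_addr_t addr, target_phys_addr_t len,
                           int is_write)
{
    uint8_t *buf = qemu_malloc(len);

    if (!is_write) {
        /* read mapping: fill the buffer from guest memory up front */
        cpu_physical_memory_rw(addr, buf, len, 0);
    }
    return buf;
}

static void bounce_unmap(target_phys_addr_t addr, uint8_t *buf,
                         int is_write, target_phys_addr_t access_len)
{
    if (is_write) {
        /* write mapping: copy back only what was actually touched */
        cpu_physical_memory_rw(addr, buf, access_len, 1);
    }
    qemu_free(buf);
}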

> Err, no, I don't really see that.  In my proposal the `handle' is
> actually allocated by the caller.  The implementation provides the
> private data and that can be empty.  There is no additional memory
> allocation.
You need to store multiple handles (one per sg element), so you need to allocate a variable size vector for it. Preallocation may be possible but perhaps wasteful.
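
To illustrate the concern, a purely hypothetical sketch (DMAHandle, dma_map(), SGEntry and map_sg() are invented stand-ins, not code from either proposal):

/* Everything below is an invented illustration of the "one handle per
 * sg element" pattern, not code from either proposal. */
typedef struct { int opaque; } DMAHandle;   /* caller-allocated handle */

typedef struct {
    target_phys_addr_t base, len;           /* one sg element */
} SGEntry;

/* stand-in for a caller-allocated-handle map call */
void *dma_map(DMAHandle *handle, target_phys_addr_t base,
              target_phys_addr_t len, int is_write);

typedef struct {
    DMAHandle handle;                       /* one handle per element */
    void *ptr;                              /* host pointer for that element */
} MappedSGEntry;

static MappedSGEntry *map_sg(const SGEntry *sg, int nents, int is_write)
{
    /* The device model has to keep nents handles around until unmap, so
     * it either allocates this vector per request or preallocates for
     * the worst case. */
    MappedSGEntry *v = qemu_mallocz(nents * sizeof(*v));
    int i;

    for (i = 0; i < nents; i++) {
        v[i].ptr = dma_map(&v[i].handle, sg[i].base, sg[i].len, is_write);
    }
    return v;
}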

> See my reply to Anthony Liguori, which shows how this can be avoided.
> Since you hope for a single call to map everything, you can do an sg
> list with a single handle.

That's a very different API.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




