From: Marcel Apfelbaum
Subject: Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram
Date: Thu, 1 Feb 2018 20:58:32 +0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 01/02/2018 20:51, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 08:31:09PM +0200, Marcel Apfelbaum wrote:
>> On 01/02/2018 20:21, Eduardo Habkost wrote:
>>> On Thu, Feb 01, 2018 at 08:03:53PM +0200, Marcel Apfelbaum wrote:
>>>> On 01/02/2018 15:53, Eduardo Habkost wrote:
>>>>> On Thu, Feb 01, 2018 at 02:29:25PM +0200, Marcel Apfelbaum wrote:
>>>>>> On 01/02/2018 14:10, Eduardo Habkost wrote:
>>>>>>> On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
>>>>>>>> On 01/02/2018 4:22, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
>>>>>>> [...]
>>>>>>>>>> BTW, what's the root cause for requiring HVAs in the buffer?
>>>>>>>>>
>>>>>>>>> It's a side effect of the kernel/userspace API which always wants
>>>>>>>>> a single HVA/len pair to map memory for the application.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Eduardo and Michael,
>>>>>>>>
>>>>>>>>>>  Can
>>>>>>>>>> this be fixed?
>>>>>>>>>
>>>>>>>>> I think yes.  It'd need to be a kernel patch for the RDMA subsystem,
>>>>>>>>> mapping an s/g list with actual memory.  The HVA/len pair would then
>>>>>>>>> just be used to refer to the region, without creating the two mappings.
>>>>>>>>>
>>>>>>>>> Something like splitting the register mr into
>>>>>>>>>
>>>>>>>>> mr = create mr (va/len) - allocate a handle and record the va/len
>>>>>>>>>
>>>>>>>>> addmemory(mr, offset, hva, len) - pin memory
>>>>>>>>>
>>>>>>>>> register mr - pass it to HW
>>>>>>>>>
>>>>>>>>> As a nice side effect we won't burn so much virtual address space.
>>>>>>>>>
>>>>>>>>
>>>>>>>> We would still need a contiguous virtual address space range
>>>>>>>> (for post-send), which we don't have, since a contiguous guest
>>>>>>>> virtual address range will always end up as a non-contiguous
>>>>>>>> host virtual address range.
>>>>>>>>
>>>>>>>> I am not sure the RDMA HW can handle a large VA with holes.
>>>>>>>
>>>>>>> I'm confused.  Why would the hardware see and care about virtual
>>>>>>> addresses? 
>>>>>>
>>>>>> The post-send operation bypasses the kernel, and the process
>>>>>> puts GVA addresses in the work request.
>>>>>>
>>>>>>> How exactly does the hardware translates VAs to
>>>>>>> PAs? 
>>>>>>
>>>>>> The HW maintains a page-directory-like structure, separate from
>>>>>> the MMU's, mapping VA -> phys pages.
>>>>>>
>>>>>>> What if the process page tables change?
>>>>>>>
>>>>>>
>>>>>> Since the page tables the HW uses are its own, we just need the
>>>>>> phys pages to be pinned.
>>>>>
>>>>> So there's no hardware-imposed requirement that the hardware VAs
>>>>> (mapped by the HW page directory) match the VAs in QEMU
>>>>> address-space, right? 
>>>>
>>>> Actually there is. Today it works exactly as you described.
>>>
>>> Are you sure there's such hardware-imposed requirement?
>>>
>>
>> Yes.
>>
>>> Why would the hardware require VAs to match the ones in the
>>> userspace address-space, if it doesn't use the CPU MMU at all?
>>>
>>
>> It works like that:
>>
>> 1. We register a buffer from the process address space,
>>    giving its base address and length.
>>    This call goes to the kernel, which in turn pins the phys pages
>>    and registers them with the device *together* with the base
>>    address (a virtual address!).
>> 2. The device builds its own page tables so it can translate
>>    the virtual addresses to actual phys pages.
> 
> How would the device be able to do that?  It would require the
> device to look at the process page tables, wouldn't it?  Isn't
> the HW IOVA->PA translation table built by the OS?
> 

As stated above, these are tables private to the device.
(They even have a HW-vendor-specific layout, I think,
 since the device holds some cache.)

The device looks at its own private page tables, not
at the OS's.

> 
>> 3. The process executes post-send requests directly to the HW,
>>    bypassing the kernel, giving process virtual addresses in the
>>    work requests.
>> 4. The device uses its own page tables to translate those virtual
>>    addresses to phys pages and sends the data.
>>
>> Theoretically it is possible to use any contiguous IOVA instead of
>> the process's VAs, but that is not how it works today.
>>
>> Makes sense?
> 



