
Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion


From: Xiao Guangrong
Subject: Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion
Date: Wed, 1 Nov 2017 14:46:11 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0



On 11/01/2017 12:25 PM, Dan Williams wrote:
On Tue, Oct 31, 2017 at 8:43 PM, Xiao Guangrong
<address@hidden> wrote:


On 10/31/2017 10:20 PM, Dan Williams wrote:

On Tue, Oct 31, 2017 at 12:13 AM, Xiao Guangrong
<address@hidden> wrote:



On 07/27/2017 08:54 AM, Dan Williams wrote:

At that point, would it make sense to expose these special
virtio-pmem areas to the guest in a slightly different way,
so the regions that need virtio flushing are not bound by
the regular driver, and the regular driver can continue to
work for memory regions that are backed by actual pmem in
the host?



Hmm, yes that could be feasible especially if it uses the ACPI NFIT
mechanism. It would basically involve defining a new SPA (System
Physical Address) range GUID type, and then teaching libnvdimm to
treat that as a new pmem device type.
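As a sketch of the idea (the GUID strings and names below are
placeholders, not values from the ACPI spec or libnvdimm): the SPA
Range Structure carries a GUID identifying the range type, so a new
GUID could select a pmem device type that routes flushes through the
paravirt interface.

#include <string.h>

/* Placeholder GUID strings; the real values would come from the spec. */
#define GUID_PM_REGION "11111111-1111-1111-1111-111111111111"
#define GUID_VIRTIO_PM "22222222-2222-2222-2222-222222222222"

enum spa_type { SPA_PM, SPA_VIRTIO_PM, SPA_UNKNOWN };

enum spa_type classify_spa(const char *range_guid)
{
        if (!strcmp(range_guid, GUID_PM_REGION))
                return SPA_PM;        /* persistence via CPU cache flushes */
        if (!strcmp(range_guid, GUID_VIRTIO_PM))
                return SPA_VIRTIO_PM; /* persistence via the host flush interface */
        return SPA_UNKNOWN;
}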



I would prefer a new flush mechanism to a new memory type introduced
into the NFIT; e.g., in that mechanism we could define request queues,
completion queues, and any other features that make it virtualization
friendly. That would be much simpler.
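For illustration only, the kind of queued request/completion pair such
a mechanism might carry; none of these structures or field names come
from an existing spec:

#include <stdint.h>

struct vpmem_flush_req {
        uint64_t addr;     /* guest-physical start of the range to flush */
        uint64_t len;      /* length in bytes; 0 could mean "whole device" */
};

struct vpmem_flush_comp {
        uint32_t status;   /* 0 on success, error code otherwise */
};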


No, that's more confusing because now we are overloading the definition
of persistent memory. I want this memory type identified from the top
of the stack so it can appear differently in /proc/iomem and also
implement this alternate flush communication.


As for the memory characteristics, I do not see why the VM should know
this difference. It can be completely transparent to the VM; that is,
the VM does not need to know whether this virtual PMEM is backed by a
real nvdimm or by normal storage. The only difference is the flush
interface.

It's not persistent memory if it requires a hypercall to make it
persistent. Unless memory writes can be made durable purely with CPU
instructions, it's dangerous for it to be treated as a PMEM range.
Consider a guest that tried to map it with device-dax, which has no
facility to route requests to a special flushing interface.
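For reference, the "durable purely with CPU instructions" path that
device-dax assumes looks roughly like this (a sketch only; build with
-mclwb on x86_64):

#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Flush a range of pmem with cache write-backs and a store fence;
 * no exit to the hypervisor is needed or possible on this path. */
void pmem_flush_range(void *addr, size_t len)
{
        char *p   = (char *)((uintptr_t)addr & ~(uintptr_t)63);
        char *end = (char *)addr + len;

        for (; p < end; p += 64)
                _mm_clwb(p);  /* write the cache line back to the media */
        _mm_sfence();         /* order the write-backs before later stores */
}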


Can we separate the concept of the flush interface from persistent
memory? Say there are two APIs: one is used to indicate the memory
type (i.e., /proc/iomem) and another one indicates the flush interface.

So for existing nvdimm hardware:
1: Persistent memory + CLFLUSH
2: Persistent memory + flush-hint-table (I know Intel does not use it)

and for a virtual nvdimm backed by normal storage:
Persistent memory + virtual flush interface
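To put the same separation in code form, a tiny sketch; the type and
field names are made up for illustration and do not come from libnvdimm
or any spec:

#include <stdint.h>

/* The region type and the flush interface are independent attributes. */
enum flush_if {
        FLUSH_CPU_CACHE,   /* CLFLUSH/CLWB + fence */
        FLUSH_HINT_TABLE,  /* ACPI flush hint addresses */
        FLUSH_VIRTUAL,     /* paravirt flush interface to the host */
};

struct pmem_region_desc {
        uint64_t      base;   /* as it would appear in /proc/iomem */
        uint64_t      size;
        enum flush_if flush;  /* how writes to this range are made durable */
};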


In what way is this "more complicated"? It was trivial to add support
for the "volatile" NFIT range; this will not be any more complicated
than that.


Introducing a memory type is easy indeed; however, a new flush
interface definition is inevitable, i.e., we need a standard way to
discover the MMIO regions used to communicate with the host.

Right, the proposed way to do that for x86 platforms is a new SPA
Range GUID type in the NFIT.


So this SPA range is used for both the persistent memory region and
the flush interface? Maybe I missed it in previous mails; could you
please detail how it would be done?

BTW, please note that a hypercall is not acceptable for a standard;
MMIO/PIO regions are. (Oh, yes, it depends on Paolo. :))
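As a sketch of why MMIO is enough here: the guest only needs an
ordinary memory-mapped register write to kick the host, rather than a
hypervisor-specific hypercall. The register offset and semantics below
are hypothetical:

#include <stdint.h>

#define VPMEM_FLUSH_DOORBELL 0x0   /* hypothetical register offset */

/* Writing the doorbell traps to the host, which can then fsync() the
 * backing storage and complete the request. */
static inline void vpmem_kick_flush(volatile uint32_t *mmio_base)
{
        mmio_base[VPMEM_FLUSH_DOORBELL / sizeof(uint32_t)] = 1;
}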





