Re: [Qemu-arm] [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-arm] [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
Date: Thu, 4 Oct 2018 15:16:13 +0100
User-agent: Mutt/1.10.1 (2018-07-13)

* Igor Mammedov (address@hidden) wrote:
> On Thu, 4 Oct 2018 13:32:26 +0200
> Auger Eric <address@hidden> wrote:
> 
> > Hi Igor,
> > 
> > On 10/4/18 1:11 PM, Igor Mammedov wrote:
> > > On Wed, 3 Oct 2018 15:49:03 +0200
> > > Auger Eric <address@hidden> wrote:
> > >   
> > >> Hi,
> > >>
> > >> On 7/3/18 9:19 AM, Eric Auger wrote:  
> > >>> This series aims at supporting PCDIMM/NVDIMM instantiation in
> > >>> machvirt at a 2TB guest physical address.
> > >>>
> > >>> This is achieved in 3 steps:
> > >>> 1) support more than 40b IPA/GPA
> > >>> 2) support PCDIMM instantiation
> > >>> 3) support NVDIMM instantiation    
> > >>
> > >> While respinning this series I have some general questions that arise
> > >> when thinking about extending the RAM on mach-virt:
> > >>
> > >> At the moment mach-virt offers at most 255GB of initial RAM, starting
> > >> at 1GB (the "-m" option).
> > >>
> > >> This series does not touch this initial RAM and only aims to add
> > >> device memory (usable for PCDIMM, NVDIMM, virtio-mem, virtio-pmem) in
> > >> the 3.1 machine, located at 2TB. The 3.0 address map currently tops out
> > >> at 1TB (the legacy aarch32 LPAE limit), so that would leave 1TB for IO
> > >> or PCI. Is that OK?
> > >>
> > >> - Putting device memory at 2TB means only ARMv8/aarch64 would benefit
> > >> from it. Is that an issue? I.e. no device memory for ARMv7 or
> > >> ARMv8/aarch32. Do we need to put effort into supporting more memory and
> > >> memory devices for those configs? There is less than 256GB free in the
> > >> existing 1TB mach-virt memory map anyway.
> > >>
> > >> - Is it OK to rely only on device memory to extend the existing 255GB
> > >> of RAM, or would we need additional initial memory? Device memory usage
> > >> implies a more complex command line, so this puts a constraint on upper
> > >> layers. Is that acceptable?
> > >>
> > >> - I revisited the series so that the max IPA size shift would get
> > >> automatically computed according to the top address reached by the
> > >> device memory, i.e. 2TB + (maxram_size - ram_size). So we would not
> > >> need any additional kvm-type or explicit vm-phys-shift option to select
> > >> the correct max IPA shift (or any CPU phys-bits as suggested by Dave).
> > >> This also assumes we don't put anything beyond the device memory. Is
> > >> that OK?
> > >>
> > >> - Igor told me he was concerned about the split-memory RAM model, as it
> > >> caused a lot of trouble regarding compat/migration on the PC machine.
> > >> After having studied the PC machine code I now wonder if we can compare
> > >> the PC compat issues with the ones we could encounter on ARM with the
> > >> proposed split memory model.
> > > that's not the only issue.
> > > 
> > > For example, since initial memory isn't modeled as a device
> > > (i.e. it's just a plain memory region), there is a bunch of numa
> > > code to deal with it. If initial memory were replaced by pc-dimm,
> > > we would drop some of it, and if we deprecated the old '-numa mem' we
> > > should be able to drop most of it (the newer '-numa memdev' maps
> > > directly onto the pc-dimm model).
> > see my comment below.
> > > 
> > >    
> > >> On PC there are many knobs to tune the RAM layout:
> > >> - max_ram_below_4g option tunes how much RAM we want below 4G
> > >> - gigabyte_align to force 3GB versus 3.5GB lowmem limit if ram_size >
> > >> max_ram_below_4g
> > >> - plus the usual ram_size which affects the rest of the initial ram
> > >> - plus the maxram_size, slots which affect the size of the device memory
> > >> - the device memory is just behind the initial RAM, aligned to 1GB
> > >>
> > >> Note the initial RAM and the device memory may be disjoint due to
> > >> misalignment of the initial RAM size against 1GB.
> > >>
> > >> On ARM, we would have 3.0 virt machine supporting only initial RAM from
> > >> 1GB to 256 GB. 3.1 (or beyond ;-)) virt machine would support the same
> > >> initial RAM + device memory from 2TB to 4TB.
> > >>
> > >> With that memory split and the different machine type, I don't see any
> > >> major hurdle with respect to migration. Do I miss something?  
> > > Later on, someone with a need to punch holes in the fixed initial
> > > RAM/device memory will start making it complex.
> > Support for host reserved regions is not acked yet, but that's a valid
> > argument.
> > >   
> > >> An alternative to the split model is having a floating RAM base for a
> > >> contiguous initial + device memory (contiguity actually depends on the
> > >> initial RAM size alignment too). This requires significant changes in
> > >> FW and also potentially impacts the legacy virt address map, as we need
> > >> to pass the floating RAM base address in some way (using an SRAM at 1GB,
> > >> or using fw_cfg). Is it worth the effort? Also, Peter/Laszlo mentioned
> > >> their reluctance to move the RAM earlier
> > > Drew is working on it, let's see the outcome first.
> > > 
> > > We actually may try to implement a single region that uses pc-dimm for
> > > all memory (including initial) and still be compatible with the legacy
> > > layout, as long as legacy mode sticks to the current RAM limit and the
> > > device memory region is put at the current RAM base.
> > > When a flexible RAM base is available, we will move that region to the
> > > non-legacy layout at 2TB (or wherever).
> > 
> > Oh, I did not understand that you wanted to also replace the initial
> > memory by device memory. So we would switch from a pure static initial
> > RAM setup to a pure dynamic device memory setup. That looks like quite a
> > drastic change to me. As mentioned, I am concerned about complicating the
> > QEMU command line, and I asked the libvirt guys about the induced pain.
> Converting initial RAM to the memory device model, beyond the current
> limits and within a single RAM zone, is the reason the flexible RAM idea
> was brought in. That way we'd end up with a single way to instantiate RAM
> (modeled after bare-metal machines) and the possibility to use
> hotplug/nvdimm/... with initial RAM without any huge refactoring
> (+compat knobs) on top later.
> 
> The 2-region solution is easier to hack together right now, where there
> are more regions and we leave initial RAM as is (there is no point then in
> bothering with a flexible RAM base), but it won't lead us to uniform
> RAM handling and won't simplify anything.
> 
> Considering the virt board doesn't have the compat RAM layout baggage of
> x86, it only looks drastic; in reality it might turn out to be a simple
> refactoring.
> 
> As for the complicated CLI, for compat reasons we will be forced to
> support '-m size=!0'; we should be able to translate that implicitly into
> a dimm. In addition, with dimms as initial memory, users would have the
> choice to ditch "-numa (mem|memdev)" altogether and do
>   -m 0,slots=X,maxmem=Y -device pc-dimm,node=x...
> and the related '-numa' would become a compat shim translating into a
> similar set of dimm devices under the hood.
> (looks like too much fantasy :))
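For comparison, here is today's real device-memory syntax next to the form sketched above. The sizes and ids are made up for illustration; the '-m 0' variant is Igor's hypothetical and is not accepted by current QEMU:

```shell
# Today's syntax for device memory (on machines with hotplug support):
qemu-system-x86_64 -m 4G,slots=2,maxmem=8G \
    -object memory-backend-ram,id=mem1,size=2G \
    -device pc-dimm,id=dimm1,memdev=mem1

# Hypothetical future form sketched in the mail above: no initial RAM at
# all, everything (including NUMA placement via 'node') expressed as
# pc-dimm devices. NOT valid today; it only illustrates the proposal.
qemu-system-x86_64 -m 0,slots=4,maxmem=16G \
    -object memory-backend-ram,id=mem0,size=4G \
    -device pc-dimm,id=dimm0,memdev=mem0,node=0
```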
> 
> The possible complications I see on the QEMU side are in handling the
> legacy '-numa mem'. Easiest would be to deprecate it and then do the
> conversion, or to work around it by replacing it with a pc-dimm-like
> device that's treated like the memory region we have now.

And there may be migration compatibility issues with the naming of the
RAMBlocks, if virt is at the point where it cares about that compatibility.

Dave

> > 
> > Thank you for your feedback
> > 
> > Eric
> > 
> > 
> > >   
> > >> (https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03172.html).
> > >>
> > >> Your feedback on those points is really welcome!
> > >>
> > >> Thanks
> > >>
> > >> Eric
> > >>  
> > >>>
> > >>> This series reuses/rebases patches initially submitted by Shameer in [1]
> > >>> and Kwangwoo in [2].
> > >>>
> > >>> I put all the parts together for consistency and due to dependencies;
> > >>> however, as soon as the kernel dependency is resolved we can consider
> > >>> upstreaming them separately.
> > >>>
> > >>> Support more than 40b IPA/GPA [ patches 1 - 5 ]
> > >>> -----------------------------------------------
> > >>> was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
> > >>>
> > >>> At the moment the guest physical address space is limited to 40b
> > >>> due to KVM limitations. [0] lifts this limitation and allows creating
> > >>> a VM with up to a 52b GPA address space.
> > >>>
> > >>> With this series, QEMU creates a virt VM with the max IPA range
> > >>> reported by the host kernel or 40b by default.
> > >>>
> > >>> This choice can be overridden by using the -machine kvm-type=<bits>
> > >>> option, with bits within [40, 52]. If <bits> is not supported by
> > >>> the host, the legacy 40b value is used.
> > >>>
> > >>> Currently the EDK2 FW also hardcodes the max number of GPA bits to
> > >>> 40. This will need to be fixed.
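The override described above would look something like this on the command line. This is a sketch of the RFC's proposed option, not an upstream QEMU feature, and the 48-bit value is just an example:

```shell
# Sketch of the RFC's proposed override (not upstream QEMU): request a
# 48-bit IPA space; per the cover letter, if the host cannot support it,
# the legacy 40b value is used instead.
qemu-system-aarch64 -M virt,accel=kvm,kvm-type=48 \
    -cpu host -m 4G,slots=2,maxmem=4T
```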
> > >>>
> > >>> PCDIMM Support [ patches 6 - 11 ]
> > >>> ---------------------------------
> > >>> was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
> > >>>
> > >>> We instantiate the device_memory at 2TB. Using it obviously requires
> > >>> at least 42b of IPA/GPA. While its max capacity is currently limited
> > >>> to 2TB, the actual size depends on the initial guest RAM size and
> > >>> maxmem parameter.
> > >>>
> > >>> Actual hot-plug and hot-unplug of PC-DIMM are not supported due to
> > >>> the lack of support for those features on bare metal.
> > >>>
> > >>> NVDIMM support [ patches 12 - 15 ]
> > >>> ----------------------------------
> > >>>
> > >>> Once the memory hotplug framework is in place it is fairly
> > >>> straightforward to add support for NVDIMM. The machine "nvdimm" option
> > >>> turns the capability on.
> > >>>
> > >>> Best Regards
> > >>>
> > >>> Eric
> > >>>
> > >>> References:
> > >>>
> > >>> [0] [PATCH v3 00/20] arm64: Dynamic & 52bit IPA support
> > >>> https://www.spinics.net/lists/kernel/msg2841735.html
> > >>>
> > >>> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> > >>> http://patchwork.ozlabs.org/cover/914694/
> > >>>
> > >>> [2] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
> > >>> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html
> > >>>
> > >>> Tests:
> > >>> - On a Cavium Gigabyte machine, a 48b VM was created.
> > >>> - Migration tests were performed between a kernel supporting the
> > >>>   feature and a destination kernel not supporting it.
> > >>> - Test with ACPI: to overcome the limitation of the EDK2 FW, the virt
> > >>>   memory map was hacked to move the device memory below 1TB.
> > >>>
> > >>> This series can be found at:
> > >>> https://github.com/eauger/qemu/tree/v2.12.0-dimm-2tb-v3
> > >>>
> > >>> History:
> > >>>
> > >>> v2 -> v3:
> > >>> - fix pc_q35 and pc_piix compilation error
> > >>> - Kwangwoo's email address is no longer valid, so it was removed
> > >>>
> > >>> v1 -> v2:
> > >>> - kvm_get_max_vm_phys_shift moved in arch specific file
> > >>> - addition of NVDIMM part
> > >>> - single series
> > >>> - rebase on David's refactoring
> > >>>
> > >>> v1:
> > >>> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
> > >>> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
> > >>>
> > >>> Best Regards
> > >>>
> > >>> Eric
> > >>>
> > >>>
> > >>> Eric Auger (9):
> > >>>   linux-headers: header update for KVM/ARM KVM_ARM_GET_MAX_VM_PHYS_SHIFT
> > >>>   hw/boards: Add a MachineState parameter to kvm_type callback
> > >>>   kvm: add kvm_arm_get_max_vm_phys_shift
> > >>>   hw/arm/virt: support kvm_type property
> > >>>   hw/arm/virt: handle max_vm_phys_shift conflicts on migration
> > >>>   hw/arm/virt: Allocate device_memory
> > >>>   acpi: move build_srat_hotpluggable_memory to generic ACPI source
> > >>>   hw/arm/boot: Expose the pmem nodes in the DT
> > >>>   hw/arm/virt: Add nvdimm and nvdimm-persistence options
> > >>>
> > >>> Kwangwoo Lee (2):
> > >>>   nvdimm: use configurable ACPI IO base and size
> > >>>   hw/arm/virt: Add nvdimm hot-plug infrastructure
> > >>>
> > >>> Shameer Kolothum (4):
> > >>>   hw/arm/virt: Add memory hotplug framework
> > >>>   hw/arm/boot: introduce fdt_add_memory_node helper
> > >>>   hw/arm/boot: Expose the PC-DIMM nodes in the DT
> > >>>   hw/arm/virt-acpi-build: Add PC-DIMM in SRAT
> > >>>
> > >>>  accel/kvm/kvm-all.c                            |   2 +-
> > >>>  default-configs/arm-softmmu.mak                |   4 +
> > >>>  hw/acpi/aml-build.c                            |  51 ++++
> > >>>  hw/acpi/nvdimm.c                               |  28 ++-
> > >>>  hw/arm/boot.c                                  | 123 +++++++--
> > >>>  hw/arm/virt-acpi-build.c                       |  10 +
> > >>>  hw/arm/virt.c                                  | 330 ++++++++++++++++++++++---
> > >>>  hw/i386/acpi-build.c                           |  49 ----
> > >>>  hw/i386/pc_piix.c                              |   8 +-
> > >>>  hw/i386/pc_q35.c                               |   8 +-
> > >>>  hw/ppc/mac_newworld.c                          |   2 +-
> > >>>  hw/ppc/mac_oldworld.c                          |   2 +-
> > >>>  hw/ppc/spapr.c                                 |   2 +-
> > >>>  include/hw/acpi/aml-build.h                    |   3 +
> > >>>  include/hw/arm/arm.h                           |   2 +
> > >>>  include/hw/arm/virt.h                          |   7 +
> > >>>  include/hw/boards.h                            |   2 +-
> > >>>  include/hw/mem/nvdimm.h                        |  12 +
> > >>>  include/standard-headers/linux/virtio_config.h |  16 +-
> > >>>  linux-headers/asm-mips/unistd.h                |  18 +-
> > >>>  linux-headers/asm-powerpc/kvm.h                |   1 +
> > >>>  linux-headers/linux/kvm.h                      |  16 ++
> > >>>  target/arm/kvm.c                               |   9 +
> > >>>  target/arm/kvm_arm.h                           |  16 ++
> > >>>  24 files changed, 597 insertions(+), 124 deletions(-)
> > >>>     
> > >>  
> > >   
> > 
> 
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


