[PATCH 16/19] ieee1275: support runtime memory claiming


From: Daniel Axtens
Subject: [PATCH 16/19] ieee1275: support runtime memory claiming
Date: Tue, 12 Oct 2021 18:30:05 +1100

On powerpc-ieee1275, we are running out of memory trying to verify
anything. This is because:

 - We have to load an entire file into memory to verify it. This is
   extremely difficult to change with appended signatures.
 - We only have 32MB of heap.
 - Distro kernels are now often around 30MB.

So we want to be able to claim more memory from OpenFirmware for our heap
at runtime.

There are some complications:

 - The grub mm code isn't the only thing that will make claims on
   memory from OpenFirmware:

    * PFW/SLOF will have claimed some for their own use.

    * The ieee1275 loader will try to find other bits of memory that we
      haven't claimed to place the kernel and initrd when we go to boot.

    * Once we load Linux, it will also try to claim memory. It claims
      memory without any reference to /memory/available; it just starts
      at min(top of RMO, 768MB) and works down, so we need to avoid this
      area. See arch/powerpc/kernel/prom_init.c as of v5.11.

 - The smallest amount of memory a ppc64 KVM guest can have is 256MB.
   A guest that small doesn't work with distro kernels but can work with
   custom kernels. We should maintain support for that. (ppc32 can boot
   with even less, and we shouldn't break that either.)

 - Even if a VM has more memory, the memory OpenFirmware makes available
   as Real Memory Area can be restricted. Even with our CAS work, an LPAR
   on a PowerVM box is likely to have only 512MB available to OpenFirmware
   even if it has many gigabytes of memory allocated.
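
To make the Linux constraint above concrete, here is a rough sketch of
the window we would want to keep clear below min(top of RMO, 768MB).
The helper names are made up for illustration and are not GRUB
functions, and the 128MB reserve is an assumption taken from the sizing
argument later in this message. Memory above the window is not modelled.

```c
#include <stdint.h>

#define MIB              (1024ULL * 1024ULL)
#define LINUX_CLAIM_TOP  (768ULL * MIB)  /* Linux claims downwards from here */
#define RESERVE          (128ULL * MIB)  /* assumed size of the reserve */

/* Hypothetical helper: lowest address of the window we leave untouched.
   Returns 0 when there is too little memory to reserve anything.  */
static uint64_t
reserve_floor (uint64_t rmo_top)
{
  uint64_t top = rmo_top < LINUX_CLAIM_TOP ? rmo_top : LINUX_CLAIM_TOP;

  if (rmo_top <= RESERVE)
    return 0;
  return top - RESERVE;
}

/* Hypothetical helper: how many bytes of [addr, addr + len) lie below
   the floor and are therefore candidates for the heap.  A floor of 0
   means "no reservation", so the whole region counts.  */
static uint64_t
bytes_below_floor (uint64_t addr, uint64_t len, uint64_t floor)
{
  if (floor == 0)
    return len;
  if (addr >= floor)
    return 0;
  if (addr + len > floor)
    return floor - addr;
  return len;
}
```

With a 512MB RMO the floor comes out at 384MB, so a free region covering
0..512MB would contribute only its first 384MB.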

The current EFI approach is in flux. Previously, EFI systems would attempt
to allocate 1/4th of the available memory, clamped to between 1M and 1600M.
Now EFI seems to be moving towards a base allocation plus runtime allocations.
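
For comparison, the old EFI rule described above boils down to the
arithmetic below. This is only a sketch of the rule as stated here, not
the code in the EFI port; efi_quarter_heap is a made-up name.

```c
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Hypothetical helper: a quarter of the available memory, clamped to
   the range [1M, 1600M].  */
static uint64_t
efi_quarter_heap (uint64_t available)
{
  uint64_t want = available / 4;

  if (want < 1ULL * MIB)
    want = 1ULL * MIB;
  if (want > 1600ULL * MIB)
    want = 1600ULL * MIB;
  return want;
}
```

So a machine with 1GB available would get a 256MB heap, and anything
with 6400MB or more would hit the 1600M ceiling.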

What should we do?

We don't know in advance how big the kernel and initrd are going to be,
which makes figuring out how much memory we can take a bit tricky.

To figure out how much memory we should leave unused, I looked at:

 - an Ubuntu 20.04.1 ppc64le pseries KVM guest:
    vmlinux: ~30MB
    initrd:  ~50MB

 - a RHEL8.2 ppc64le pseries KVM guest:
    vmlinux: ~30MB
    initrd:  ~30MB

So to give us a little wriggle room, I think we want to leave at least
128MB for the loader to put vmlinux and initrd in memory and leave Linux
with space to satisfy its early allocations.

Other space can now be allocated at runtime.
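
As a sketch of how a runtime request is sized, the function below
mirrors the arithmetic in grub_ieee1275_mm_add_region in the diff below:
round the request up to a megabyte, add one more, ask for at least 32MB,
and never dip into the 128MB reserve. The name runtime_claim_size and
the 64-bit types are assumptions for illustration; the patch itself
works in 32-bit quantities.

```c
#include <stdint.h>

#define MIB                (1024ULL * 1024ULL)
#define RUNTIME_MIN_SPACE  (128ULL * MIB)
#define MIN_CHUNK          (32ULL * MIB)

/* Round v up to the next multiple of a (a must be non-zero).  */
static uint64_t
align_up (uint64_t v, uint64_t a)
{
  return ((v + a - 1) / a) * a;
}

/* Hypothetical helper: how much to claim for a request of `size` bytes
   when firmware reports `free_memory` bytes free.  Returns 0 if the
   request must be refused to protect the reserve.  */
static uint64_t
runtime_claim_size (uint64_t size, uint64_t free_memory)
{
  uint64_t headroom, want;

  /* Check first so the subtraction below cannot underflow.  */
  if (free_memory <= RUNTIME_MIN_SPACE)
    return 0;
  headroom = free_memory - RUNTIME_MIN_SPACE;

  if (size > headroom)
    return 0;

  want = align_up (size, MIB) + MIB;
  if (want < MIN_CHUNK)
    want = MIN_CHUNK;
  if (want > headroom)
    want = headroom;
  return want;
}
```

For example, a 30MB request with 512MB free would claim 32MB, while a
100MB request with only 200MB free would be refused because it would
leave less than 128MB.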

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 docs/grub-dev.texi             |   7 +-
 grub-core/kern/ieee1275/init.c | 179 +++++++++++++++++++++++++++++----
 2 files changed, 165 insertions(+), 21 deletions(-)

diff --git a/docs/grub-dev.texi b/docs/grub-dev.texi
index fb2cc965ed80..d576d17e3be2 100644
--- a/docs/grub-dev.texi
+++ b/docs/grub-dev.texi
@@ -1047,7 +1047,10 @@ space is limited to 4GiB. GRUB allocates pages from EFI for its heap, at most
 1.6 GiB.
 
 On i386-ieee1275 and powerpc-ieee1275 GRUB uses same stack as IEEE1275.
-It allocates at most 32MiB for its heap.
+
+On i386-ieee1275 and powerpc-ieee1275, GRUB will allocate 32MiB for its heap on
+startup. It may allocate more at runtime, as long as at least 128MiB remain free
+in OpenFirmware.
 
 On sparc64-ieee1275 stack is 256KiB and heap is 2MiB.
 
@@ -1075,7 +1078,7 @@ In short:
 @item i386-qemu               @tab 60 KiB  @tab < 4 GiB
 @item *-efi                   @tab ?       @tab < 1.6 GiB
 @item i386-ieee1275           @tab ?       @tab < 32 MiB
-@item powerpc-ieee1275        @tab ?       @tab < 32 MiB
+@item powerpc-ieee1275        @tab ?       @tab available memory - 128MiB
 @item sparc64-ieee1275        @tab 256KiB  @tab 2 MiB
 @item arm-uboot               @tab 256KiB  @tab 2 MiB
 @item mips(el)-qemu_mips      @tab 2MiB    @tab 253 MiB
diff --git a/grub-core/kern/ieee1275/init.c b/grub-core/kern/ieee1275/init.c
index bf2cd200893d..0fb7bae280df 100644
--- a/grub-core/kern/ieee1275/init.c
+++ b/grub-core/kern/ieee1275/init.c
@@ -45,13 +45,21 @@
 #include <grub/machine/kernel.h>
 #endif
 
-/* The maximum heap size we're going to claim */
+/* The maximum heap size we're going to claim at boot. Not used by sparc. */
 #ifdef __i386__
 #define HEAP_MAX_SIZE          (unsigned long) (64 * 1024 * 1024)
-#else
+#else // __powerpc__
 #define HEAP_MAX_SIZE          (unsigned long) (32 * 1024 * 1024)
 #endif
 
+/* The amount of OF space we will not claim here so as to leave space for
+   the loader and linux to service early allocations.
+
+   In 2021, Daniel Axtens claims that we should leave at least 128MB to
+   ensure we can load a stock kernel and initrd on a pseries guest with
+   a 512MB real memory area under PowerVM. */
+#define RUNTIME_MIN_SPACE (unsigned long) (128 * 1024 * 1024)
+
 extern char _start[];
 extern char _end[];
 
@@ -145,16 +153,54 @@ grub_claim_heap (void)
                                 + GRUB_KERNEL_MACHINE_STACK_SIZE), 0x200000);
 }
 #else
-/* Helper for grub_claim_heap.  */
+/* Helpers for mm on powerpc. */
+
+/* How much memory does OF believe exists in total?
+
+   This isn't necessarily the true total. It can be the total memory
+   accessible in real mode for a pseries guest, for example.
+ */
+static grub_uint64_t rmo_top;
+
+/* How much have we claimed so far? */
+static grub_uint32_t allocated_memory;
+
 static int
-heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
-          void *data)
+count_free (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
+           void *data)
+{
+  if (type != GRUB_MEMORY_AVAILABLE)
+    return 0;
+
+  /* Do not consider memory beyond 4GB */
+  if (addr > 0xffffffffULL)
+    return 0;
+
+  if (addr + len > 0xffffffffULL)
+    len = 0xffffffffULL - addr;
+
+  *(grub_uint32_t *)data += len;
+
+  return 0;
+}
+
+static int
+regions_claim (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
+             unsigned int flags, void *data)
 {
-  unsigned long *total = data;
+  grub_uint32_t total = *(grub_uint32_t *)data;
+  grub_uint32_t linux_rmo_save;
 
   if (type != GRUB_MEMORY_AVAILABLE)
     return 0;
 
+  /* Do not consider memory beyond 4GB */
+  if (addr > 0xffffffffULL)
+    return 0;
+
+  if (addr + len > 0xffffffffULL)
+    len = 0xffffffffULL - addr;
+
   if (grub_ieee1275_test_flag (GRUB_IEEE1275_FLAG_NO_PRE1_5M_CLAIM))
     {
       if (addr + len <= 0x180000)
@@ -167,10 +213,6 @@ heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
        }
     }
 
-  /* Never exceed HEAP_MAX_SIZE  */
-  if (*total + len > HEAP_MAX_SIZE)
-    len = HEAP_MAX_SIZE - *total;
-
   /* In theory, firmware should already prevent this from happening by not
      listing our own image in /memory/available.  The check below is intended
      as a safeguard in case that doesn't happen.  However, it doesn't protect
@@ -182,6 +224,47 @@ heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
       len = 0;
     }
 
+  /* Linux likes to claim memory at min(RMO top, 768MB) and works down
+     without reference to /memory/available. (See prom_init.c::alloc_down)
+
+     If this block contains min(RMO top, 768MB), do not claim below that for
+     at least a few MB (this is where RTAS, SML and potentially TCEs live).
+
+     We also need to leave enough space for the DT in the RMA. (See
+     prom_init.c::alloc_up)
+
+     Finally, we also want to make sure that when grub loads the kernel,
+     it isn't going to use up all the memory we're trying to reserve! So
+     enforce our entire RUNTIME_MIN_SPACE here.
+
+     Of course, only consider this if we have more than RUNTIME_MIN_SPACE
+     to begin with. If we have <= 128MB of ram, we claim only the static
+     blocks summing to HEAP_MAX_SIZE.
+   */
+  if (rmo_top > RUNTIME_MIN_SPACE)
+    {
+      linux_rmo_save = grub_min (0x30000000, rmo_top) - RUNTIME_MIN_SPACE;
+      if ((addr < linux_rmo_save) && ((addr + len) > linux_rmo_save))
+       len = linux_rmo_save - addr;
+      else if (addr == linux_rmo_save)
+       {
+         if (len < RUNTIME_MIN_SPACE)
+           return 0;
+         addr = 0x30000000;
+         len -= RUNTIME_MIN_SPACE;
+        }
+    }
+
+  if (flags & GRUB_MM_ADD_REGION_CONSECUTIVE)
+    {
+      // only continue if we can satisfy the entire allocation
+      if (len < total)
+       return 0;
+    }
+
+  if (len > total)
+    len = total;
+
   if (len)
     {
       grub_err_t err;
@@ -190,15 +273,69 @@ heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
       if (err)
        return err;
       grub_mm_init_region ((void *) (grub_addr_t) addr, len);
+      total -= len;
+      allocated_memory += len;
     }
 
-  *total += len;
-  if (*total >= HEAP_MAX_SIZE)
+  *(grub_uint32_t *)data = total;
+
+  if (total == 0)
     return 1;
 
   return 0;
 }
 
+static int
+heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
+          void *data)
+{
+  return regions_claim(addr, len, type, GRUB_MM_ADD_REGION_NONE, data);
+}
+
+static int
+region_claim (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
+          void *data)
+{
+  return regions_claim(addr, len, type, GRUB_MM_ADD_REGION_CONSECUTIVE, data);
+}
+
+
+static grub_err_t grub_ieee1275_mm_add_region (grub_size_t size, unsigned int flags)
+{
+  grub_uint32_t total = size;
+  grub_uint32_t free_memory = 0;
+
+  if (flags & GRUB_MM_ADD_REGION_CONSECUTIVE)
+    {
+      /* Update free memory each time, which is a bit inefficient but guards us
+        against some driver going out to firmware and fw grabbing memory to service
+       the request */
+      grub_machine_mmap_iterate(count_free, &free_memory);
+
+      /* Ensure we leave enough space to boot
+        (be careful of underflow here) */
+      if (free_memory <= RUNTIME_MIN_SPACE)
+       return GRUB_ERR_OUT_OF_MEMORY;
+
+      if (size > free_memory - RUNTIME_MIN_SPACE)
+       return GRUB_ERR_OUT_OF_MEMORY;
+      else
+       {
+         total = grub_max(ALIGN_UP(size, 1024 * 1024) + 1024 * 1024, 32 * 1024 * 1024);
+         total = grub_min(free_memory - RUNTIME_MIN_SPACE, total);
+       }
+
+      grub_machine_mmap_iterate (region_claim, &total);
+    }
+  else
+    grub_machine_mmap_iterate (heap_init, &total);
+
+  if (total == 0)
+    return GRUB_ERR_NONE;
+  else
+    return GRUB_ERR_OUT_OF_MEMORY;
+}
+
 /* How much memory does OF believe it has? (regardless of whether
    it's accessible or not) */
 static grub_err_t
@@ -331,20 +468,24 @@ grub_ieee1275_ibm_cas (void)
 static void 
 grub_claim_heap (void)
 {
-  unsigned long total = 0;
+  grub_err_t err;
+
+  err = grub_ieee1275_total_mem (&rmo_top);
+
+  /* If we cannot size the available memory, we can't be sure we're leaving
+     space for the kernel, initrd and things Linux loads early in boot. So
+     only allow further allocations from firmware on success */
+  if (err == GRUB_ERR_NONE)
+    grub_mm_add_region_fn = grub_ieee1275_mm_add_region;
 
   if (grub_ieee1275_test_flag (GRUB_IEEE1275_FLAG_CAN_TRY_CAS_FOR_MORE_MEMORY))
     {
-      grub_uint64_t rma_size;
-      grub_err_t err;
-
-      err = grub_ieee1275_total_mem (&rma_size);
       /* if we have an error, don't call CAS, just hope for the best */
-      if (!err && rma_size < (512 * 1024 * 1024))
+      if (!err && rmo_top < (512 * 1024 * 1024))
        grub_ieee1275_ibm_cas();
     }
 
-  grub_machine_mmap_iterate (heap_init, &total);
+  grub_ieee1275_mm_add_region (HEAP_MAX_SIZE, GRUB_MM_ADD_REGION_NONE);
 }
 #endif
 
-- 
2.30.2



