qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH qemu] spapr: Use address from elf parser for kernel address


From: Alexey Kardashevskiy
Subject: Re: [PATCH qemu] spapr: Use address from elf parser for kernel address
Date: Fri, 6 May 2022 14:49:01 +1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Thunderbird/100.0



On 06/05/2022 01:50, Fabiano Rosas wrote:
Alexey Kardashevskiy <aik@ozlabs.ru> writes:

On 5/5/22 05:16, Fabiano Rosas wrote:
Alexey Kardashevskiy <aik@ozlabs.ru> writes:

tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.

QEMU loads the kernel at 0x400000 by default which works most of
the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
(position independent code). This works for a little endian zImage too.

However a big endian zImage is compiled without -pie, is 32bit, linked to
0x4000000 so current QEMU ends up loading it at
0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.

This uses the kernel address returned from load_elf().
If the default kernel_addr is used, there is no change in behavior (as
translate_kernel_address() takes care of this), which is:
LE/BE vmlinux and LE zImage boot, BE zImage does not.
If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
prints a warning and BE zImage boots.

I think we can fix this without needing a different command line for BE
zImage (apart from x-vof, which is a separate matter).

If you look at translate_kernel_address, it cannot really work when the
ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue,
so if we fix that function like this...

static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
{
      SpaprMachineState *spapr = opaque;

      return addr ? addr : spapr->kernel_addr;
}


The qemu elf loader is supposed to handle relocations which should be
calling this hook more than once, now I wonder why it is not doing so.

For relocations, it seems to only call translate_fn for s390x.


Agrh. You made me read this :) It is quite weird code and yeah 390-only :-/


...then we could always use the ELF PhysAddr if it is different from 0
and only use the default load addr if the ELF PhysAddr is 0. If the user
gives kernel_addr on the cmdline, we honor that, even if puts the kernel
over the firmware (we have code to detect that).


ELF address is 0 for LE zImage only, vmlinux BE/LE uses
0xc000000000000000. And we are already chopping all these tops bits off
in translate_kernel_address() and I do not really know why _exactly_ it
is 0x0fffffff and not let's say 0x7fffffff.

Oh, I am not talking about the ELF entry point. And that is not what
load_elf passes to translate_kernel_address either. It is the ELF
PhysAddr:

$ powerpc64le-buildroot-linux-gnu-readelf -We vmlinux | tail

Program Headers:
   Type           Offset   VirtAddr           PhysAddr           FileSiz  
MemSiz   Flg Align
   LOAD           0x010000 0xc000000000000000 0x0000000000000000 0x28d84d4 
0x2a54ea8 RWE 0x10000

So it is 0x400000 for BE zImage and 0 for vmlinux.

Ah right. Me wrong.

btw potentially there can be more program sections.

[fstn1-p1 ~]$ readelf -l /home/aik/p/slof/board-qemu/llfw/stage1.elf

Elf file type is EXEC (Executable file)
Entry point 0x100
There are 2 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000100 0x0000000000000100 0x0000000000000100
                 0x0000000000008110 0x0000000000010290  RWE    0x4000
  LOAD           0x0000000000008210 0x0000000000010400 0x0000000000010400
                 0x0000000000000010 0x0000000000000010  RW     0x8


grub might be similar.




@@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
               exit(1);
           }
+ if (spapr->kernel_addr != loaded_addr) {

This could be:

          if (spapr->kernel_addr == KERNEL_LOAD_ADDR &&
            spapr->kernel_addr != loaded_addr) {

So the precedence would be:

1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage
     falls here;
2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall
     here;

3- kernel_addr. The user is probably hacking something, just use what
     they gave us. QEMU will yell if they load the kernel over the fw.


imho too complicated.

What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is
KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)?

Good point. It should always be 3. It is a bad user interface to have a
command line option and arbitrarily ignore it. Be it 0x0 or 0x400000.

I am basically fixing a bug when we ignore where load_elf() loaded the
ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the
ELF was loaded where we want it to be.

That bug is already accounted for, that is why we have
translate_kernel_address:

   /* address_offset is hack for kernel images that are
      linked at the wrong physical address.  */
   if (translate_fn) {
       addr = translate_fn(translate_opaque, ph->p_paddr);

So we're actively telling load_elf to load the kernel at the wrong place
when we do:

(ph->p_addr & 0x0fffffff) + spapr->kernel_addr

It just happened to work so far because the vmlinux PhysAddr is 0.

When you set kernel-addr=0 you are simply working around this other bug
because:

(ph->p_addr & 0x0fffffff) + 0 == ph->p_addr

I just want to remove this indirection and use the ELF PhysAddr
directly.


That's alright but not in translate_kernel_address(). May be I should rename kernel-addr to kernel-offset (which it really is) or hack load_elf() so it would take the desired location and work out the offset inside (and ditch the translate_fn hook) but either way we'll need heuristics (or the user should know) to know if the image is self-relocatable or not as otherwise it just won't boot.



Everything else can be done but on top of this.

If you want to merge this I could send another patch on top of it later,
no worries. I realise that this a larger change that will need more
testing.


I want to have some easy-to-explain way of booting BE zImage :)



+            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
+                        spapr->kernel_addr, loaded_addr);
+            spapr->kernel_addr = loaded_addr;
+        }
+
           /* load initrd */
           if (initrd_filename) {
               /* Try to locate the initrd in the gap between the kernel

--
Alexey



reply via email to

[Prev in Thread] Current Thread [Next in Thread]