qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH qemu v21] spapr: Implement Open Firmware client interface


From: David Gibson
Subject: Re: [PATCH qemu v21] spapr: Implement Open Firmware client interface
Date: Sat, 19 Jun 2021 19:28:57 +1000

On Thu, Jun 17, 2021 at 11:16:19AM +0200, BALATON Zoltan wrote:
> On Thu, 17 Jun 2021, Alexey Kardashevskiy wrote:
> > On 16/06/2021 20:34, BALATON Zoltan wrote:
> > > On Wed, 16 Jun 2021, Alexey Kardashevskiy wrote:
> > > > On 6/15/21 20:29, BALATON Zoltan wrote:
> > > > > On Tue, 15 Jun 2021, Alexey Kardashevskiy wrote:
> > > > > > The PAPR platform describes an OS environment that's presented by
> > > > > > a combination of a hypervisor and firmware. The features it 
> > > > > > specifies
> > > > > > require collaboration between the firmware and the hypervisor.
> > > > > > 
> > > > > > Since the beginning, the runtime component of the firmware (RTAS) 
> > > > > > has
> > > > > > been implemented as a 20 byte shim which simply forwards it to
> > > > > > a hypercall implemented in qemu. The boot time firmware component is
> > > > > > SLOF - but a build that's specific to qemu, and has always needed 
> > > > > > to be
> > > > > > updated in sync with it. Even though we've managed to limit the 
> > > > > > amount
> > > > > > of runtime communication we need between qemu and SLOF, there's 
> > > > > > some,
> > > > > > and it has become increasingly awkward to handle as we've 
> > > > > > implemented
> > > > > > new features.
> > > > > > 
> > > > > > This implements a boot time OF client interface (CI) which is
> > > > > > enabled by a new "x-vof" pseries machine option (stands
> > > > > > for "Virtual Open
> > > > > > Firmware). When enabled, QEMU implements the custom H_OF_CLIENT 
> > > > > > hcall
> > > > > > which implements Open Firmware Client Interface (OF CI). This allows
> > > > > > using a smaller stateless firmware which does not have to manage
> > > > > > the device tree.
> > > > > > 
> > > > > > The new "vof.bin" firmware image is included with source code under
> > > > > > pc-bios/. It also includes RTAS blob.
> > > > > > 
> > > > > > This implements a handful of CI methods just to get -kernel/-initrd
> > > > > > working. In particular, this implements the device tree fetching and
> > > > > > simple memory allocator - "claim" (an OF CI memory
> > > > > > allocator) and updates
> > > > > > "/memory@0/available" to report the client about available memory.
> > > > > > 
> > > > > > This implements changing some device tree properties which we know 
> > > > > > how
> > > > > > to deal with, the rest is ignored. To allow changes, this skips
> > > > > > fdt_pack() when x-vof=on as not packing the blob leaves some room 
> > > > > > for
> > > > > > appending.
> > > > > > 
> > > > > > In absence of SLOF, this assigns phandles to device tree nodes to 
> > > > > > make
> > > > > > device tree traversing work.
> > > > > > 
> > > > > > When x-vof=on, this adds "/chosen" every time QEMU (re)builds a 
> > > > > > tree.
> > > > > > 
> > > > > > This adds basic instances support which are managed by a hash map
> > > > > > ihandle -> [phandle].
> > > > > > 
> > > > > > Before the guest started, the used memory is:
> > > > > > 0..e60 - the initial firmware
> > > > > > 8000..10000 - stack
> > > > > > 400000.. - kernel
> > > > > > 3ea0000.. - initramdisk
> > > > > > 
> > > > > > This OF CI does not implement "interpret".
> > > > > > 
> > > > > > Unlike SLOF, this does not format uninitialized nvram. Instead, this
> > > > > > includes a disk image with pre-formatted nvram.
> > > > > > 
> > > > > > With this basic support, this can only boot into kernel directly.
> > > > > > However this is just enough for the petitboot kernel and 
> > > > > > initradmdisk to
> > > > > > boot from any possible source. Note this requires
> > > > > > reasonably recent guest
> > > > > > kernel with:
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
> > > > > > The immediate benefit is much faster booting time which
> > > > > > especially
> > > > > > crucial with fully emulated early CPU bring up environments. Also 
> > > > > > this
> > > > > > may come handy when/if GRUB-in-the-userspace sees light of the day.
> > > > > > 
> > > > > > This separates VOF and sPAPR in a hope that VOF bits may be reused 
> > > > > > by
> > > > > > other POWERPC boards which do not support pSeries.
> > > > > > 
> > > > > > This make VOF optional, it is disabled by default, add --enable-vof
> > > > > > to ./configure to enable it.
> > > > > > 
> > > > > > This assumes potential support for booting from QEMU backends
> > > > > > such as blockdev or netdev without devices/drivers used.
> > > > > > 
> > > > > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > > > > > ---
> > > > > > 
> > > > > > The example command line is:
> > > > > > 
> > > > > > /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
> > > > > > -nodefaults \
> > > > > > -chardev stdio,id=STDIO0,signal=off,mux=on \
> > > > > > -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> > > > > > -mon id=MON0,chardev=STDIO0,mode=readline \
> > > > > > -nographic \
> > > > > > -vga none \
> > > > > > -enable-kvm \
> > > > > > -m 8G \
> > > > > > -machine 
> > > > > > pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off
> > > > > > \
> > > > > > -kernel pbuild/kernel-le-guest/vmlinux \
> > > > > > -initrd pb/rootfs.cpio.xz \
> > > > > > -drive 
> > > > > > id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw
> > > > > > \
> > > > > > -global spapr-nvram.drive=DRIVE0 \
> > > > > > -snapshot \
> > > > > > -smp 8,threads=8 \
> > > > > > -L /home/aik/t/qemu-ppc64-bios/ \
> > > > > > -trace events=qemu_trace_events \
> > > > > > -d guest_errors \
> > > > > > -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
> > > > > > -mon chardev=SOCKET0,mode=control
> > > > > 
> > > > > I haven't looked at it in detail yet, just some quick
> > > > > comments I have on first skim through.
> > > > > 
> > > > > > ---
> > > > > > Changes:
> > > > > > v21:
> > > > > > * s/ld/ldz/ in entry.S
> > > > > 
> > > > > Typo? Has this become lwz?
> > > > 
> > > > Yup, lwz.
> > > > 
> > > > > 
> > > > > > * moved CONFIG_VOF from
> > > > > > default-configs/devices/ppc64-softmmu.mak to Kconfig
> > > > > > * made CONFIG_VOF optional
> > > > > 
> > > > > This won't work for pegasos2, see below.
> > > > > 
> > > > > > * s/l.lds/vof.lds/
> > > > > > * force 32 BE in spapr_machine_reset() instead of the firmware
> > > > > > * added checks for non-null methods of VofMachineIfClass
> > > > > > * moved OF_STACK_SIZE to vof.h, renamed to VOF_...,
> > > > > > added a better comment
> > > > > > * added  path_offset wrapper for handling mixed case for addresses
> > > > > > after "@" in node names
> > > > > > * changed getprop() to check for actual "name" property in the fdt
> > > > > > * moved VOF_MEM_READ/VOF_MEM_WRITE to vof.h for sharing
> > > > > > as (unlike similar
> > > > > > rtas_ld/ldl_be_*) they return error codes
> > > > > > * VOF_MEM_READ uses now address_space_read (it was
> > > > > > address_space_read_full
> > > > > > before, not sure why)
> > > > > [...]
> > > > > > ---
> > > > > > configure               |    9 +
> > > > > > pc-bios/vof/Makefile    |   23 +
> > > > > > include/hw/ppc/spapr.h  |   25 +-
> > > > > > include/hw/ppc/vof.h    |   55 ++
> > > > > > pc-bios/vof/vof.h       |   43 ++
> > > > > > hw/ppc/spapr.c          |   87 +++-
> > > > > > hw/ppc/spapr_hcall.c    |   29 +-
> > > > > > hw/ppc/spapr_vof.c      |  153 ++++++
> > > > > > hw/ppc/vof.c            | 1052 
> > > > > > +++++++++++++++++++++++++++++++++++++++
> > > > > > pc-bios/vof/bootmem.c   |   14 +
> > > > > > pc-bios/vof/ci.c        |   91 ++++
> > > > > > pc-bios/vof/libc.c      |   92 ++++
> > > > > > pc-bios/vof/main.c      |   21 +
> > > > > > tests/qtest/rtas-test.c |   17 +-
> > > > > > MAINTAINERS             |   12 +
> > > > > > hw/ppc/Kconfig          |    3 +
> > > > > > hw/ppc/meson.build      |    3 +
> > > > > > hw/ppc/trace-events     |   24 +
> > > > > > meson.build             |    1 +
> > > > > > pc-bios/README          |    2 +
> > > > > > pc-bios/vof-nvram.bin   |  Bin 0 -> 16384 bytes
> > > > > > pc-bios/vof.bin         |  Bin 0 -> 3784 bytes
> > > > > > pc-bios/vof/entry.S     |   49 ++
> > > > > > pc-bios/vof/vof.lds     |   48 ++
> > > > > > 24 files changed, 1840 insertions(+), 13 deletions(-)
> > > > > > create mode 100644 pc-bios/vof/Makefile
> > > > > > create mode 100644 include/hw/ppc/vof.h
> > > > > > create mode 100644 pc-bios/vof/vof.h
> > > > > > create mode 100644 hw/ppc/spapr_vof.c
> > > > > > create mode 100644 hw/ppc/vof.c
> > > > > > create mode 100644 pc-bios/vof/bootmem.c
> > > > > > create mode 100644 pc-bios/vof/ci.c
> > > > > > create mode 100644 pc-bios/vof/libc.c
> > > > > > create mode 100644 pc-bios/vof/main.c
> > > > > > create mode 100644 pc-bios/vof-nvram.bin
> > > > > > create mode 100755 pc-bios/vof.bin
> > > > > > create mode 100644 pc-bios/vof/entry.S
> > > > > > create mode 100644 pc-bios/vof/vof.lds
> > > > > > 
> [...]
> > > > > > diff --git a/include/hw/ppc/vof.h b/include/hw/ppc/vof.h
> > > > > > new file mode 100644
> > > > > > index 000000000000..65ca2fed0d41
> > > > > > --- /dev/null
> > > > > > +++ b/include/hw/ppc/vof.h
> > > > > > @@ -0,0 +1,55 @@
> > > > > > +/*
> > > > > > + * Virtual Open Firmware
> > > > > > + *
> > > > > > + * SPDX-License-Identifier: GPL-2.0-or-later
> > > > > > + */
> > > > > > +#ifndef HW_VOF_H
> > > > > > +#define HW_VOF_H
> > > > > > +
> > > > > > +typedef struct Vof {
> > > > > > +    uint64_t top_addr; /* copied from rma_size */
> > > > > > +    GArray *claimed; /* array of SpaprOfClaimed */
> > > > > > +    uint64_t claimed_base;
> > > > > > +    GHashTable *of_instances; /* ihandle -> SpaprOfInstance */
> > > > > > +    uint32_t of_instance_last;
> > > > > > +    char *bootargs;
> > > > > > +    long fw_size;
> > > > > > +} Vof;
> > > > > > +
> > > > > > +int vof_client_call(MachineState *ms, Vof *vof, void *fdt,
> > > > > > +                    target_ulong args_real);
> > > > > > +uint64_t vof_claim(Vof *vof, uint64_t virt, uint64_t
> > > > > > size, uint64_t align);
> > > > > > +void vof_init(Vof *vof, uint64_t top_addr, Error **errp);
> > > > > > +void vof_cleanup(Vof *vof);
> > > > > > +void vof_build_dt(void *fdt, Vof *vof);
> > > > > > +uint32_t vof_client_open_store(void *fdt, Vof *vof,
> > > > > > const char *nodename,
> > > > > > +                               const char *prop, const char *path);
> > > > > > +
> > > > > > +#define TYPE_VOF_MACHINE_IF "vof-machine-if"
> > > > > > +
> > > > > > +typedef struct VofMachineIfClass VofMachineIfClass;
> > > > > > +DECLARE_CLASS_CHECKERS(VofMachineIfClass, VOF_MACHINE,
> > > > > > TYPE_VOF_MACHINE_IF)
> > > > > > +
> > > > > > +struct VofMachineIfClass {
> > > > > > +    InterfaceClass parent;
> > > > > > +    target_ulong
> > > > > > (*client_architecture_support)(MachineState *ms,
> > > > > > CPUState *cs,
> > > > > > +                                                target_ulong vec);
> > > > > > +    void (*quiesce)(MachineState *ms);
> > > > > > +    bool (*setprop)(MachineState *ms, const char *path,
> > > > > > const char *propname,
> > > > > > +                    void *val, int vallen);
> > > > > > +};
> > > > > > +
> > > > > > +/*
> > > > > > + * Initial stack size is from
> > > > > > + * 
> > > > > > https://www.devicetree.org/open-firmware/bindings/ppc/release/ppc-2_1.html
> > > > > > + */
> > > > > > +#define VOF_STACK_SIZE       0x8000
> > > > > 
> > > > > Maybe also add a define for RTAS_SIZE here? We'll need to
> > > > > put that in the device tree but it depends on the rtas shim
> > > > > size that's part of VOF so it should be defined here instead
> > > > > of hardcoding it in boards that use VOF so it can be updated
> > > > > later at one place if needed.
> > > > 
> > > > This is rtas-size for pseries:
> > > > 
> > > > _FDT(fdt_setprop_cell(fdt, rtas, "rtas-size", RTAS_ERROR_LOG_MAX +
> > > >          ms->smp.max_cpus * sizeof(uint64_t)*2 + sizeof(uint64_t)));
> > > > 
> > > > => depends on cpus => depends on the command line.
> > > > 
> > > > 
> > > > RTAS_SIZE is not used by anything in pseries anymore, I'll send
> > > > a patch to ditch it.
> > > 
> > > I mean you need to have at least the size of code in
> > > pc-bios/vof/entry.S hv_rtas where also hv_rtas_size is defined but
> > > that value is not available in QEMU where one needs to add it to the
> > > device tree. So a define for that should be here in vof.h. Currently
> > > I've counted instructions and have
> > > 
> > >      qemu_fdt_setprop_cell(fdt, "/rtas", "rtas-size", 20);
> > > 
> > > in pegasos2.c but that 20 should be some VOF_RTAS_SIZE instead that
> > > you define corresponding to hv_rtas_size. You'll probably need the
> > > same even after changing above rtas size calculation in spapr
> > > because client has to allocate memory for instantiate-rtas.
> > 
> > 
> > Ah fair point. I do not like "20" here and I think the right thing will
> > be adding whatever number of bytes to rtas-size in the firmware itself
> > and update it in QEMU via "setprop" as we do for "linux,rtas-base". And
> > then do the same in SLOF.
> 
> This is not the base address but the size of the shim with the hypercall
> that instantiate-rtas copies. Why does it need to be updated? And why does
> it need to be more bytes than necessary? I don't know what you do for spapr
> and why do you need larger rtas-size than this but for pegasos2 this
> /rtas/rtas-size property is only used by guests to allocate memory for rtas
> so all I need is how many bytes are needed for hv_rtas in
> pc-bios/vof/entry.S which is what should be #defined in vof.h. I've found 20
> is just enough so you could add that to vof.h.

Because spapr has fwnmi, a firmware assisted dump mechanism which logs
things into part of the RTAS space.  It's a stupid interface, but
that's how PAPR specifies it.  You neither need nor want that for
Pegasos, so you can just have the rtas-size be the 20 bytes for the
tiny hypercall shim.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]