qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [BUG][powerpc] KVM Guest Boot Failure – Hangs at "Booting Linux via


From: Nicholas Piggin
Subject: Re: [BUG][powerpc] KVM Guest Boot Failure – Hangs at "Booting Linux via __start()”
Date: Tue, 18 Mar 2025 13:20:56 +1000

Thanks for the report.

Tricky problem. A secondary CPU is hanging before it is started by the
primary via rtas call.

That secondary keeps calling kvm_cpu_exec(), which keeps exiting out
early with EXCP_HLT because kvm_arch_process_async_events() returns
true because that cpu has ->halted=1. That just goes around he run
loop because there is an interrupt pending (DEC).

So it never runs. It also never releases the BQL, and another CPU,
the primary which is actually supposed to be running, is stuck in
spapr_set_all_lpcrs() in run_on_cpu() waiting for the BQL.

This patch just exposes the bug I think, by causing the interrupt.
although I'm not quite sure why it's okay previously (-ve decrementer
values should be causing a timer exception too). The timer exception
should not be taken as an interrupt by those secondary CPUs, and it
doesn't because it is masked, until set_all_lpcrs sets an LPCR value
that enables powersave wakeup on decrementer interrupt.

The start_powered_off sate just sets ->halted, which makes it look
like a powersaving state. Logically I think it's not the same thing
as far as spapr goes. I don't know why start_powered_off only sets
->halted, and not ->stop/stopped as well.

Not sure how best to solve it cleanly. I'll send a revert if I can't
get something working soon.

Thanks,
Nick

On Tue Mar 18, 2025 at 7:09 AM AEST, misanjum wrote:
> Bug Description:
> Encountering a boot failure when launching a KVM guest with 
> qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor 
> crashes.
>
>
> Reproduction Steps:
> # qemu-system-ppc64 --version
> QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f)
> Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers
>
> # /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine 
> pseries,accel=kvm \
>    -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
>    -device virtio-scsi-pci,id=scsi \
>    -drive 
> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2
>  
> \
>    -device scsi-hd,drive=drive0,bus=scsi.0 \
>    -netdev bridge,id=net0,br=virbr0 \
>    -device virtio-net-pci,netdev=net0 \
>    -serial pty \
>    -device virtio-balloon-pci \
>    -cpu host
> QEMU 9.2.50 monitor - type 'help' for more information
> char device redirected to /dev/pts/2 (label serial0)
> (qemu)
> (qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but 
> unavailable: IRQ_XIVE capability must be present for KVM
> Falling back to kernel-irqchip=off
> ** Qemu Hang
>
> (In another ssh session)
> # screen /dev/pts/2
> Preparing to boot Linux version 6.10.4-200.fc40.ppc64le 
> (mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 
> (Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 
> 15:20:17 UTC 2024
> Detected machine type: 0000000000000101
> command line: 
> BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le 
> root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M
> Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
> Calling ibm,client-architecture-support... done
> memory layout at init:
>    memory_limit : 0000000000000000 (16 MB aligned)
>    alloc_bottom : 0000000008200000
>    alloc_top    : 0000000030000000
>    alloc_top_hi : 0000000800000000
>    rmo_top      : 0000000030000000
>    ram_top      : 0000000800000000
> instantiating rtas at 0x000000002fff0000... done
> prom_hold_cpus: skipped
> copying OF device tree...
> Building dt strings...
> Building dt structure...
> Device tree strings 0x0000000008210000 -> 0x0000000008210bd0
> Device tree struct  0x0000000008220000 -> 0x0000000008230000
> Quiescing Open Firmware ...
> Booting Linux via __start() @ 0x0000000000440000 ...
> ** Guest Console Hang
>
>
> Git Bisect:
> Performing git bisect points to the following patch:
> # git bisect bad
> e8291ec16da80566c121c68d9112be458954d90b is the first bad commit
> commit e8291ec16da80566c121c68d9112be458954d90b (HEAD)
> Author: Nicholas Piggin <npiggin@gmail.com>
> Date:   Thu Dec 19 13:40:31 2024 +1000
>
>      target/ppc: fix timebase register reset state
>
>      (H)DEC and PURR get reset before icount does, which causes them to 
> be
>      skewed and not match the init state. This can cause replay to not
>      match the recorded trace exactly. For DEC and HDEC this is usually 
> not
>      noticable since they tend to get programmed before affecting the
>      target machine. PURR has been observed to cause replay bugs when
>      running Linux.
>
>      Fix this by resetting using a time of 0.
>
>      Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
>      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>
>   hw/ppc/ppc.c | 11 ++++++++---
>   1 file changed, 8 insertions(+), 3 deletions(-)
>
>
> Reverting the patch helps boot the guest.
> Thanks,
> Misbah Anjum N




reply via email to

[Prev in Thread] Current Thread [Next in Thread]