Re: mac99 SMP

qemu-ppc

From:	Andrew Randrianasulu
Subject:	Re: mac99 SMP
Date:	Wed, 5 Mar 2025 05:46:19 +0300
On Wed, Mar 5, 2025 at 5:13 AM Andrew Randrianasulu
<randrianasulu@gmail.com> wrote:
>
>
>
> On Wed, 5 Mar 2025 at 04:10, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>>
>> On Wed, 5 Mar 2025, Andrew Randrianasulu wrote:
>> >> I tried  "set trace-commands on" before setting logging on.
>> >>
>> >> It looks better now?
>>
>> Yes, this is more readable, but we still don't know where it is stuck when
>> you just run with continue, since this log shows you stepping through,
>> which works. We would need a backtrace from a run started with continue:
>> while it's waiting for the second CPU after starting it, but before it
>> reports that it's stuck, press ctrl-c and then run "thread apply all
>> backtrace full" to see where the CPUs are at that point.

+target remote 10.0.2.2:1234
Remote debugging using 10.0.2.2:1234
0xfff00100 in ?? ()
+c
Continuing.

Thread 1 hit Breakpoint 1.1, 0xc0067d4c in smp_core99_kick_cpu () at
./arch/powerpc/include/asm/io.h:167
167    DEF_MMIO_OUT_D(out_8,   8, stb);
+c
Continuing.

Thread 1 hit Breakpoint 1.2, smp_core99_kick_cpu (nr=1) at
arch/powerpc/platforms/powermac/smp.c:802
802        if (nr < 0 || nr > 3)
+c
Continuing.

Thread 2 received signal SIGINT, Interrupt.
[Switching to Thread 1.2]
0xc000439c in InstructionAccess_virt () at
arch/powerpc/kernel/head_book3s_32.S:308
308        b    interrupt_return
+thread apply all backtrace full

Thread 2 (Thread 1.2 (CPU#1 [running])):
+backtrace full
#0  0xc000439c in InstructionAccess_virt () at
arch/powerpc/kernel/head_book3s_32.S:308
No locals.
#1  0x00000000 in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0xefff4124

Thread 1 (Thread 1.1 (CPU#0 [running])):
+backtrace full
#0  0xc0023044 in __cpu_up (cpu=1, tidle=<optimized out>) at
arch/powerpc/kernel/smp.c:1329
        boot_spin_ms = 5000
        booting = <optimized out>
        hp_spin_ms = 1
        deadline = <optimized out>
        rc = 0
        spin_wait_ms = 5000
        __dummy = <optimized out>
        __dummy2 = <optimized out>
        __dummy = <optimized out>
        __dummy2 = <optimized out>
#1  0xc008a454 in bringup_cpu (cpu=1) at kernel/cpu.c:877
        st = 0xeeddcb24
        idle = 0xc19956a0
        ret = <optimized out>
        __vpp_verify = <optimized out>
#2  0xc008ad6c in cpuhp_invoke_callback (cpu=cpu@entry=1,
state=CPUHP_BRINGUP_CPU, bringup=bringup@entry=true,
node=node@entry=0x0, lastp=lastp@entry=0x0) at kernel/cpu.c:194
        st = 0xeeddcb24
        step = 0xc1591990 <cpuhp_hp_states+1740>
        cbm = <optimized out>
        cb = 0xc008a3f4 <bringup_cpu>
        ret = <optimized out>
        cnt = <optimized out>
        __vpp_verify = <optimized out>
#3  0xc008bde4 in __cpuhp_invoke_callback_range
(bringup=bringup@entry=true, cpu=cpu@entry=1, st=st@entry=0xeeddcb24,
target=target@entry=CPUHP_BRINGUP_CPU, nofail=nofail@entry=false) at
kernel/cpu.c:965
        err = <optimized out>
        state = <optimized out>
        ret = <optimized out>
#4  0xc008c11c in cpuhp_invoke_callback_range (bringup=true, cpu=1,
st=0xeeddcb24, target=CPUHP_BRINGUP_CPU) at kernel/cpu.c:989
No locals.
#5  cpuhp_up_callbacks (target=CPUHP_BRINGUP_CPU, cpu=1,
st=0xeeddcb24) at kernel/cpu.c:1020
        prev_state = <optimized out>
        ret = 0
        prev_state = <optimized out>
        ret = <optimized out>
        __func__ = "cpuhp_up_callbacks"
        __UNIQUE_ID_ddebug747 = {modname = 0xc0fb2a54 "cpu", function
= 0xc0dd4bec <__func__.0> "cpuhp_up_callbacks", filename = 0xc0fae87c
"kernel/cpu.c", format = 0xc0fae96c "CPU UP failed (%d) CPU %u state
%s (%d)\n", lineno = 1022, class_id = 63, flags = 0, key =
{dd_key_true = {key = {enabled = {counter = 0}, {type = 3238560028,
entries = 0xc108811c, next = 0xc108811c}}}, dd_key_false = {key =
{enabled = {counter = 0}, {type = 3238560028, entries = 0xc108811c,
next = 0xc108811c}}}}}
        branch = <optimized out>
        __ret_warn_on = <optimized out>
#6  _cpu_up (tasks_frozen=0, target=CPUHP_BRINGUP_CPU, cpu=1) at
kernel/cpu.c:1690
        st = 0xeeddcb24
        idle = <optimized out>
        ret = <optimized out>
        st = <optimized out>
        idle = <optimized out>
        ret = <optimized out>
        out = <optimized out>
        __vpp_verify = <optimized out>
        __ptr = <optimized out>
        __UNIQUE_ID_x_756 = <optimized out>
        __UNIQUE_ID_y_757 = <optimized out>
#7  cpu_up (cpu=cpu@entry=1, target=CPUHP_ONLINE) at kernel/cpu.c:1722
        err = 0
#8  0xc14188a8 in cpuhp_bringup_mask (mask=<optimized out>,
target=<optimized out>, ncpus=3) at kernel/cpu.c:1788
        st = <optimized out>
        __vpp_verify = <optimized out>
        cpu = 1
#9  bringup_nonboot_cpus (max_cpus=<optimized out>) at kernel/cpu.c:1896
No locals.
#10 0xc1420480 in smp_init () at kernel/smp.c:1009
        num_nodes = 1
        num_cpus = <optimized out>
#11 0xc1403ee8 in kernel_init_freeable () at init/main.c:1572
No locals.
#12 0xc0008c74 in kernel_init (unused=<optimized out>) at init/main.c:1469
        ret = <optimized out>
#13 0xc00212ec in ret_from_kernel_user_thread () at
arch/powerpc/kernel/entry_32.S:193
No locals.
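
To compare sessions quickly, the innermost frame of each CPU can be pulled out of a saved "thread apply all backtrace full" log. The following is only a minimal sketch: the helper name top_frames and the regular expressions are made up for illustration and assume the header and frame layout seen in the log above.

```python
import re

# Matches headers such as: Thread 2 (Thread 1.2 (CPU#1 [running])):
THREAD_RE = re.compile(r"^Thread \d+ \(Thread [\d.]+ \(CPU#(\d+) \[\w+\]\)\):")
# Matches the innermost frame, with or without a leading PC value.
FRAME0_RE = re.compile(r"^#0\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")

def top_frames(log):
    """Return {cpu_number: function name of frame #0} for a gdb log."""
    frames = {}
    cpu = None
    for raw in log.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            cpu = int(header.group(1))
            continue
        frame = FRAME0_RE.match(line)
        if frame and cpu is not None:
            frames[cpu] = frame.group(1)
            cpu = None  # ignore further frames until the next thread header
    return frames

# Excerpt copied from the session above.
SAMPLE_LOG = """\
Thread 2 (Thread 1.2 (CPU#1 [running])):
+backtrace full
#0  0xc000439c in InstructionAccess_virt () at
arch/powerpc/kernel/head_book3s_32.S:308
No locals.

Thread 1 (Thread 1.1 (CPU#0 [running])):
+backtrace full
#0  0xc0023044 in __cpu_up (cpu=1, tidle=<optimized out>) at
arch/powerpc/kernel/smp.c:1329
"""

for cpu, func in sorted(top_frames(SAMPLE_LOG).items()):
    print(f"CPU#{cpu}: {func}")
```

For this log it reports CPU#0 in __cpu_up and CPU#1 in InstructionAccess_virt.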

=====

Ugh ... "InstructionAccess_virt () at arch/powerpc/kernel/head_book3s_32.S:308"

https://elixir.bootlin.com/linux/v6.12.17/source/arch/powerpc/kernel/head_book3s_32.S#L308

Data Access Exception? But why .....


>>
>> > Interesting, I hit "s" few more times (for second cpu thread) and it
>> > eventually hit this code:
>> >
>> > No locals.
>> > #3  0xc0023180 in start_secondary (unused=<optimized out>) at
>> > arch/powerpc/kernel/smp.c:1639
>> >        cpu = 1
>> > #4  0x00003338 in ?? ()
>> > No symbol table info available.
>> > +s
>> > _set_L2CR () at arch/powerpc/kernel/l2cr_6xx.S:91
>> > 91              li      r3,-1
>> > +s
>> > 92              blr
>> > +s
>> > 95              mflr    r9
>> > +s
>> > 100             sync
>> > +s
>> > 104             mfmsr   r7              /* Save MSR in r7 */
>> > +s
>> > 105             rlwinm  r4,r7,0,17,15
>> > +s
>> > 106             rlwinm  r4,r4,0,28,26   /* Turn off DR bit */
>> > +s
>> > 107             sync
>> > +s
>> > 108             mtmsr   r4
>> > +s
>> > 109             isync
>> > +s
>> > Cannot access memory at address 0xc001f228
>> [...]
>> > ======
>> >
>> > I hit "c" because I had no idea how long it would be stuck like this
>> > ...
>> >
>> > But eventually the kernel came up with no timestamp delay and both CPUs ...
>> >
>> > Does this mean isync is actually at fault?
>>
>> No, the comment says it turns off the DR bit in the MSR, which disables
>> the MMU for data access. Once it's disabled you cannot access memory, but
>> this code doesn't access memory; it tries to flush caches and only pokes
>> CPU registers while doing that. The cache isn't emulated in QEMU, so this
>> does nothing, but it could be expecting some bits in non-emulated
>> registers to change which do not change on QEMU, so maybe it's stuck
>> there waiting for that. But that's just a guess. As long as we don't know
>> exactly where it's stopped waiting, you could only go through this code
>> and check whether the bits it waits for are correctly emulated. Or, to
>> verify that it stops in this code at all, you could comment out the places
>> it's called from, as this does nothing on QEMU anyway (you can only comment
>> out the set_LC* functions, which clear caches; removing other functions
>> would break it, but as there's no cache on QEMU, skipping these probably
>> doesn't break anything). If that fixes the issue, then we at least know
>> which function it is and can try to find where in that function it's
>> stopped. What I still don't understand is why it works when stepping
>> through it. It should behave the same as when not stepping.
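
The two rlwinm instructions in the stepped listing above can be checked by hand. The following is a small sketch, not kernel code: the helper names ppc_mask and rlwinm are made up for illustration, and the starting MSR value is an arbitrary example. MSR_EE (0x8000) and MSR_DR (0x10) are the usual 32-bit PowerPC MSR bit values.

```python
MSR_EE = 0x8000  # external interrupt enable, IBM bit 16
MSR_DR = 0x0010  # data address translation, IBM bit 27

def ppc_mask(mb, me):
    """32-bit mask with ones from bit mb through bit me (bit 0 is the MSB).

    When mb > me the mask wraps around, which leaves only the bits
    strictly between me and mb cleared.
    """
    def bit(n):
        return 1 << (31 - n)
    if mb <= me:
        mask = 0
        for n in range(mb, me + 1):
            mask |= bit(n)
    else:
        mask = 0xFFFFFFFF
        for n in range(me + 1, mb):
            mask &= ~bit(n)
    return mask & 0xFFFFFFFF

def rlwinm(rs, sh, mb, me):
    """Rotate rs left by sh bits, then AND with the mb..me mask."""
    rs &= 0xFFFFFFFF
    rotated = ((rs << sh) | (rs >> (32 - sh))) & 0xFFFFFFFF if sh else rs
    return rotated & ppc_mask(mb, me)

msr = 0x9032                   # example value only: EE, ME, IR, DR, RI set
r4 = rlwinm(msr, 0, 17, 15)    # line 105: clears bit 16 (EE)
r4 = rlwinm(r4, 0, 28, 26)     # line 106: clears bit 27 (DR)
print(hex(r4))
```

So the pair of instructions produces an MSR copy with external interrupts and data translation off, while instruction translation (IR) stays on.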
>
>
> https://qemu-project.gitlab.io/qemu/system/gdb.html
>
> ====
>
> Advanced debugging options
>
> Changing single-stepping behaviour
>
> The default single stepping behavior is step with the IRQs and timer service 
> routines off. It is set this way because when gdb executes a single step it 
> expects to advance beyond the current instruction. With the IRQs and timer 
> service routines on, a single step might jump into the one of the interrupt 
> or exception vectors instead of executing the current instruction. This means 
> you may hit the same breakpoint a number of times before executing the 
> instruction gdb wants to have executed. Because there are rare circumstances 
> where you want to single step into an interrupt vector the behavior can be 
> controlled from GDB.
>
> ======
>
> Maybe it's due to this?
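
The same QEMU documentation page describes the single-step mask as a combination of three bits (ENABLE=1, NOIRQ=2, NOTIMER=4, default 0x7), which gdb can change with "maintenance packet Qqemu.sstep=...". As a rough sketch, the command for a given behaviour could be built like this; the helper name sstep_command is hypothetical, and the bit values should be confirmed on the running QEMU with "maintenance packet qqemu.sstepbits".

```python
# Bit values as listed in the QEMU gdb documentation; verify them with
# "maintenance packet qqemu.sstepbits" before relying on them.
SSTEP_ENABLE = 0x1   # single stepping enabled
SSTEP_NOIRQ = 0x2    # keep IRQs off while stepping
SSTEP_NOTIMER = 0x4  # keep timers off while stepping

DEFAULT_MASK = SSTEP_ENABLE | SSTEP_NOIRQ | SSTEP_NOTIMER

def sstep_command(irqs=False, timers=False):
    """Build the gdb command that selects the single-step behaviour."""
    mask = SSTEP_ENABLE
    if not irqs:
        mask |= SSTEP_NOIRQ
    if not timers:
        mask |= SSTEP_NOTIMER
    return f"maintenance packet Qqemu.sstep={mask:#x}"

# Step with IRQs and timers running, closer to what "continue" does.
print(sstep_command(irqs=True, timers=True))
```

Stepping with both enabled may show whether the difference between "s" and "c" comes from interrupts and timers being held off during single steps.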
>
> I'll try "thread apply all backtrace full" in the next gdb session, tomorrow.
>
> Thanks for assisting!
>
>
>
>>
>> Regards,
>> BALATON Zoltan
>>