Re: mac99 SMP
From: Andrew Randrianasulu
Subject: Re: mac99 SMP
Date: Wed, 5 Mar 2025 05:46:19 +0300
On Wed, Mar 5, 2025 at 5:13 AM Andrew Randrianasulu
<randrianasulu@gmail.com> wrote:
>
>
>
> Wed, 5 Mar 2025, 04:10 BALATON Zoltan <balaton@eik.bme.hu>:
>>
>> On Wed, 5 Mar 2025, Andrew Randrianasulu wrote:
>> >> I tried "set trace-commands on" before setting logging on.
>> >>
>> >> It looks better now?
>>
>> Yes, this is more readable, but we still don't know where it is stuck when
>> you just run with continue, since this shows stepping through, which
>> works. We would need a backtrace: start it with continue, then, while it's
>> waiting for the second CPU after starting it but before it reports that
>> it's stuck, press ctrl-c and run "thread apply all backtrace full" to see
>> where the CPUs are at that point.
+target remote 10.0.2.2:1234
Remote debugging using 10.0.2.2:1234
0xfff00100 in ?? ()
+c
Continuing.
Thread 1 hit Breakpoint 1.1, 0xc0067d4c in smp_core99_kick_cpu () at
./arch/powerpc/include/asm/io.h:167
167 DEF_MMIO_OUT_D(out_8, 8, stb);
+c
Continuing.
Thread 1 hit Breakpoint 1.2, smp_core99_kick_cpu (nr=1) at
arch/powerpc/platforms/powermac/smp.c:802
802 if (nr < 0 || nr > 3)
+c
Continuing.
Thread 2 received signal SIGINT, Interrupt.
[Switching to Thread 1.2]
0xc000439c in InstructionAccess_virt () at
arch/powerpc/kernel/head_book3s_32.S:308
308 b interrupt_return
+thread apply all backtrace full
Thread 2 (Thread 1.2 (CPU#1 [running])):
+backtrace full
#0 0xc000439c in InstructionAccess_virt () at
arch/powerpc/kernel/head_book3s_32.S:308
No locals.
#1 0x00000000 in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0xefff4124
Thread 1 (Thread 1.1 (CPU#0 [running])):
+backtrace full
#0 0xc0023044 in __cpu_up (cpu=1, tidle=<optimized out>) at
arch/powerpc/kernel/smp.c:1329
boot_spin_ms = 5000
booting = <optimized out>
hp_spin_ms = 1
deadline = <optimized out>
rc = 0
spin_wait_ms = 5000
__dummy = <optimized out>
__dummy2 = <optimized out>
__dummy = <optimized out>
__dummy2 = <optimized out>
#1 0xc008a454 in bringup_cpu (cpu=1) at kernel/cpu.c:877
st = 0xeeddcb24
idle = 0xc19956a0
ret = <optimized out>
__vpp_verify = <optimized out>
#2 0xc008ad6c in cpuhp_invoke_callback (cpu=cpu@entry=1,
state=CPUHP_BRINGUP_CPU, bringup=bringup@entry=true,
node=node@entry=0x0, lastp=lastp@entry=0x0) at kernel/cpu.c:194
st = 0xeeddcb24
step = 0xc1591990 <cpuhp_hp_states+1740>
cbm = <optimized out>
cb = 0xc008a3f4 <bringup_cpu>
ret = <optimized out>
cnt = <optimized out>
__vpp_verify = <optimized out>
#3 0xc008bde4 in __cpuhp_invoke_callback_range
(bringup=bringup@entry=true, cpu=cpu@entry=1, st=st@entry=0xeeddcb24,
target=target@entry=CPUHP_BRINGUP_CPU, nofail=nofail@entry=false) at
kernel/cpu.c:965
err = <optimized out>
state = <optimized out>
ret = <optimized out>
#4 0xc008c11c in cpuhp_invoke_callback_range (bringup=true, cpu=1,
st=0xeeddcb24, target=CPUHP_BRINGUP_CPU) at kernel/cpu.c:989
No locals.
#5 cpuhp_up_callbacks (target=CPUHP_BRINGUP_CPU, cpu=1,
st=0xeeddcb24) at kernel/cpu.c:1020
prev_state = <optimized out>
ret = 0
prev_state = <optimized out>
ret = <optimized out>
__func__ = "cpuhp_up_callbacks"
__UNIQUE_ID_ddebug747 = {modname = 0xc0fb2a54 "cpu", function
= 0xc0dd4bec <__func__.0> "cpuhp_up_callbacks", filename = 0xc0fae87c
"kernel/cpu.c", format = 0xc0fae96c "CPU UP failed (%d) CPU %u state
%s (%d)\n", lineno = 1022, class_id = 63, flags = 0, key =
{dd_key_true = {key = {enabled = {counter = 0}, {type = 3238560028,
entries = 0xc108811c, next = 0xc108811c}}}, dd_key_false = {key =
{enabled = {counter = 0}, {type = 3238560028, entries = 0xc108811c,
next = 0xc108811c}}}}}
branch = <optimized out>
__ret_warn_on = <optimized out>
#6 _cpu_up (tasks_frozen=0, target=CPUHP_BRINGUP_CPU, cpu=1) at
kernel/cpu.c:1690
st = 0xeeddcb24
idle = <optimized out>
ret = <optimized out>
st = <optimized out>
idle = <optimized out>
ret = <optimized out>
out = <optimized out>
__vpp_verify = <optimized out>
__ptr = <optimized out>
__UNIQUE_ID_x_756 = <optimized out>
__UNIQUE_ID_y_757 = <optimized out>
#7 cpu_up (cpu=cpu@entry=1, target=CPUHP_ONLINE) at kernel/cpu.c:1722
err = 0
#8 0xc14188a8 in cpuhp_bringup_mask (mask=<optimized out>,
target=<optimized out>, ncpus=3) at kernel/cpu.c:1788
st = <optimized out>
__vpp_verify = <optimized out>
cpu = 1
#9 bringup_nonboot_cpus (max_cpus=<optimized out>) at kernel/cpu.c:1896
No locals.
#10 0xc1420480 in smp_init () at kernel/smp.c:1009
num_nodes = 1
num_cpus = <optimized out>
#11 0xc1403ee8 in kernel_init_freeable () at init/main.c:1572
No locals.
#12 0xc0008c74 in kernel_init (unused=<optimized out>) at init/main.c:1469
ret = <optimized out>
#13 0xc00212ec in ret_from_kernel_user_thread () at
arch/powerpc/kernel/entry_32.S:193
No locals.
=====
Ugh ... "InstructionAccess_virt () at arch/powerpc/kernel/head_book3s_32.S:308"
https://elixir.bootlin.com/linux/v6.12.17/source/arch/powerpc/kernel/head_book3s_32.S#L308
Data Access Exception? But why .....
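
To compare runs without rereading the whole log, the top frame of each CPU can be pulled out of the logged "thread apply all backtrace full" output. This is only a rough sketch: the regular expressions assume the exact line shapes seen in the log above (a "Thread N (Thread X.Y (CPU#n [state])):" header followed by a "#0" frame line), and top_frames and SAMPLE are names made up for illustration.

```python
import re

# Header printed by gdb for each thread, e.g.
# "Thread 2 (Thread 1.2 (CPU#1 [running])):"
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread ([\d.]+) \((CPU#\d+) \[(\w+)\]\)\):")
# Innermost frame, with or without a leading "0x... in " address.
FRAME0_RE = re.compile(r"^#0\s+(?:0x[0-9a-fA-F]+ in )?(\w+) \(")


def top_frames(log_text):
    """Return {cpu: function name of frame #0} for each thread in the log."""
    frames = {}
    current_cpu = None
    for line in log_text.splitlines():
        line = line.strip()
        header = THREAD_RE.match(line)
        if header:
            current_cpu = header.group(3)
            continue
        frame = FRAME0_RE.match(line)
        if frame and current_cpu is not None:
            # Keep only the first #0 seen after each thread header.
            frames.setdefault(current_cpu, frame.group(1))
    return frames


# Shortened copy of the output above, used only as sample input.
SAMPLE = """\
Thread 2 (Thread 1.2 (CPU#1 [running])):
+backtrace full
#0 0xc000439c in InstructionAccess_virt () at
arch/powerpc/kernel/head_book3s_32.S:308
No locals.
Thread 1 (Thread 1.1 (CPU#0 [running])):
+backtrace full
#0 0xc0023044 in __cpu_up (cpu=1, tidle=<optimized out>) at
arch/powerpc/kernel/smp.c:1329
"""

if __name__ == "__main__":
    for cpu, func in sorted(top_frames(SAMPLE).items()):
        print(cpu, func)
```

For the session above this reports CPU#0 in __cpu_up and CPU#1 in InstructionAccess_virt.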
>>
>> > Interesting, I hit "s" few more times (for second cpu thread) and it
>> > eventually hit this code:
>> >
>> > No locals.
>> > #3 0xc0023180 in start_secondary (unused=<optimized out>) at
>> > arch/powerpc/kernel/smp.c:1639
>> > cpu = 1
>> > #4 0x00003338 in ?? ()
>> > No symbol table info available.
>> > +s
>> > _set_L2CR () at arch/powerpc/kernel/l2cr_6xx.S:91
>> > 91 li r3,-1
>> > +s
>> > 92 blr
>> > +s
>> > 95 mflr r9
>> > +s
>> > 100 sync
>> > +s
>> > 104 mfmsr r7 /* Save MSR in r7 */
>> > +s
>> > 105 rlwinm r4,r7,0,17,15
>> > +s
>> > 106 rlwinm r4,r4,0,28,26 /* Turn off DR bit */
>> > +s
>> > 107 sync
>> > +s
>> > 108 mtmsr r4
>> > +s
>> > 109 isync
>> > +s
>> > Cannot access memory at address 0xc001f228
>> [...]
>> > ======
>> >
>> > I hit "c" because I had no idea how long it would be stuck like this
>> > ...
>> >
>> > But eventually the kernel came up with no timestamp delay and with both CPUs ...
>> >
>> > Does this mean isync is actually at fault?
>>
>> No, the comment says that it turns off the DR bit in the MSR, which
>> disables the MMU for data access. Once it's disabled you cannot access
>> memory, but this code doesn't access memory; it tries to flush caches and
>> only pokes CPU registers while doing so. The cache isn't emulated in QEMU,
>> so this does nothing, but it could be expecting some bits in non-emulated
>> registers to change that do not change on QEMU, so maybe it's stuck
>> there waiting for that. But that's just a guess. If we don't know where
>> exactly it's stopped waiting, you could only go through this code and try
>> to find out whether the bits it waits for are correctly emulated. Or, to
>> verify that it stops in this code at all, you could comment out the places
>> it's called from, as it would do nothing on QEMU anyway (only comment out
>> the set_LC* functions, which clear caches; removing other functions would
>> break things, but as there's no cache on QEMU, skipping these probably
>> doesn't break anything).
>> If that fixes the issue, then we at least know which function it is and
>> can then try to find where in that function it's stopped. What I still
>> don't understand is why it works when stepping through it. It should
>> behave the same as when not stepping.
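
For reference, the two rlwinm instructions in the _set_L2CR step-through quoted above can be checked by hand. rlwinm rotates left and then ANDs with a mask of ones running from bit MB to bit ME (IBM numbering, bit 0 is the most significant); when MB > ME the mask wraps around. Below is a small sketch of that arithmetic. The helper names are made up for illustration, and the example MSR value 0x9032 is only an assumed starting value, not one read from this session.

```python
MSR_EE = 0x8000  # external interrupt enable (IBM bit 16)
MSR_DR = 0x0010  # data address translation (IBM bit 27)


def ibm_bit(n):
    """Value of bit n of a 32-bit word in IBM numbering (bit 0 = MSB)."""
    return 1 << (31 - n)


def rlwinm_mask(mb, me):
    """Mask with ones from bit mb through bit me, wrapping when mb > me."""
    if mb <= me:
        bits = list(range(mb, me + 1))
    else:
        bits = list(range(mb, 32)) + list(range(0, me + 1))
    mask = 0
    for n in bits:
        mask |= ibm_bit(n)
    return mask


def rlwinm(value, sh, mb, me):
    """Rotate a 32-bit value left by sh bits, then AND it with the mask."""
    value &= 0xFFFFFFFF
    rotated = ((value << sh) | (value >> (32 - sh))) & 0xFFFFFFFF
    return rotated & rlwinm_mask(mb, me)


if __name__ == "__main__":
    msr = 0x9032                   # assumed example value with EE and DR set
    r4 = rlwinm(msr, 0, 17, 15)    # rlwinm r4,r7,0,17,15: clears bit 16 (EE)
    r4 = rlwinm(r4, 0, 28, 26)     # rlwinm r4,r4,0,28,26: clears bit 27 (DR)
    print(hex(r4))
```

So the first instruction masks off MSR[EE] (interrupts) and the second masks off MSR[DR] (data translation), which matches the "Turn off DR bit" comment. Once mtmsr has taken effect, gdb can no longer read a kernel virtual address through data translation, which would be consistent with the "Cannot access memory" message at that point.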
>
>
> https://qemu-project.gitlab.io/qemu/system/gdb.html
>
> ====
>
> Advanced debugging options
>
> Changing single-stepping behaviour
>
> The default single stepping behavior is step with the IRQs and timer service
> routines off. It is set this way because when gdb executes a single step it
> expects to advance beyond the current instruction. With the IRQs and timer
> service routines on, a single step might jump into one of the interrupt
> or exception vectors instead of executing the current instruction. This means
> you may hit the same breakpoint a number of times before executing the
> instruction gdb wants to have executed. Because there are rare circumstances
> where you want to single step into an interrupt vector the behavior can be
> controlled from GDB.
>
> ======
>
> Maybe it is due to this?
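
The same QEMU page describes the stepping mode as a small bitmask that can be queried and set from gdb with "maintenance packet qqemu.sstepbits", "maintenance packet qqemu.sstep" and "maintenance packet Qqemu.sstep=0x...". Below is a sketch of how the mask is composed, assuming the bit names and values the documentation shows (ENABLE=1, NOIRQ=2, NOTIMER=4). The helper names are made up, and whether changing the mask makes stepping reproduce the hang has not been tested here.

```python
def parse_sstepbits(reply):
    """Parse a reply like 'ENABLE=1,NOIRQ=2,NOTIMER=4' into {name: bit}."""
    bits = {}
    for field in reply.split(","):
        name, value = field.split("=")
        bits[name] = int(value, 0)
    return bits


def describe_sstep(mask, bits):
    """List the names of the flags that are set in mask."""
    return sorted(name for name, bit in bits.items() if mask & bit)


def sstep_packet(names, bits):
    """Build the packet text that selects exactly the named flags."""
    mask = 0
    for name in names:
        mask |= bits[name]
    return "Qqemu.sstep=0x%x" % mask


if __name__ == "__main__":
    bits = parse_sstepbits("ENABLE=1,NOIRQ=2,NOTIMER=4")
    # 0x7 is the default the documentation reports: IRQs and timers off.
    print(describe_sstep(0x7, bits))
    # Stepping with IRQs and timers left on means setting only ENABLE.
    print(sstep_packet(["ENABLE"], bits))
```

If stepping with only ENABLE set (packet Qqemu.sstep=0x1) showed the same hang as "continue", that would point at interrupt or timer delivery as the difference between the two runs.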
>
> I'll try "thread apply all backtrace full" in the next gdb session, tomorrow.
>
> Thanks for assisting!
>
>
>
>>
>> Regards,
>> BALATON Zoltan
>>