Re: mac99 SMP

qemu-ppc


From:	BALATON Zoltan
Subject:	Re: mac99 SMP
Date:	Thu, 6 Mar 2025 17:22:28 +0100 (CET)

On Thu, 6 Mar 2025, Andrew Randrianasulu wrote:

On Thu, Mar 6, 2025 at 6:57 PM Andrew Randrianasulu
<randrianasulu@gmail.com> wrote:

On Thu, Mar 6, 2025 at 6:41 PM BALATON Zoltan <balaton@eik.bme.hu> wrote:


On Thu, 6 Mar 2025, Andrew Randrianasulu wrote:

On Thu, 6 Mar 2025 at 18:16, BALATON Zoltan <balaton@eik.bme.hu> wrote:

On Thu, 6 Mar 2025, Andrew Randrianasulu wrote:

On Thu, Mar 6, 2025 at 4:12 PM BALATON Zoltan <balaton@eik.bme.hu> wrote:


On Thu, 6 Mar 2025, Andrew Randrianasulu wrote:

On Thu, 6 Mar 2025 at 05:10, BALATON Zoltan <balaton@eik.bme.hu> wrote:

On Thu, 6 Mar 2025, Andrew Randrianasulu wrote:

On Thu, Mar 6, 2025 at 2:02 AM Andrew Randrianasulu
<randrianasulu@gmail.com> wrote:

On Thu, Mar 6, 2025 at 12:21 AM BALATON Zoltan <balaton@eik.bme.hu> wrote:

So is that the ISI that I saw? Line 308 is the end of the DSI handler, but
the name in the log shows the ISI handler. But you had no ISI logs with
-d int, so I don't get it.

What are the registers of that CPU at that point? One of those should
tell from where it got to the ISI handler, but the backtrace does not
show that. (Check the CPU docs for which register holds the address that
caused the exception, I don't remember.)

This was the case when I set the sstep bits to 0x1: it does not bring up
the second CPU, AND does not single step into the irq_save function in
core99_kick_cpu :(



So I assumed the least impactful mode (0x1) is not very useful for
detailed single stepping into this specific function.

But 0x3, 0x5 and the default 0x7 all work, as far as single stepping and
bringing up the secondary CPU are concerned.
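The value sent with Qqemu.sstep is a bitmask, so the modes tried above differ only in which extra bits accompany the enable bit. A minimal sketch, assuming the bit values of QEMU's gdbstub flags SSTEP_ENABLE (0x1), SSTEP_NOIRQ (0x2) and SSTEP_NOTIMER (0x4) (check them against your QEMU version); decode_sstep is a hypothetical helper. Under that assumption, 0x1 alone leaves interrupts and timers active while stepping, which may be why stepping ran away or landed elsewhere.

```python
# Sketch: decode a QEMU gdbstub single-step mask, as sent with
# "maintenance packet Qqemu.sstep=<mask>". Bit values are assumed to
# match QEMU's SSTEP_ENABLE / SSTEP_NOIRQ / SSTEP_NOTIMER flags.
SSTEP_FLAGS = {
    0x1: "ENABLE",   # single stepping on
    0x2: "NOIRQ",    # no interrupts delivered while stepping
    0x4: "NOTIMER",  # no timers run while stepping
}


def decode_sstep(mask: int) -> list[str]:
    """Hypothetical helper: names of the flags set in mask, lowest bit first."""
    return [name for bit, name in sorted(SSTEP_FLAGS.items()) if mask & bit]


if __name__ == "__main__":
    for mask in (0x1, 0x3, 0x5, 0x7):
        print(f"{mask:#x}: {', '.join(decode_sstep(mask))}")
```

Every mode that worked in the report above (0x3, 0x5, 0x7) suppresses at least one source of asynchronous events during a step.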


You were stepping through CPU0, but the interesting part is what CPU1 is
doing, so maybe try to trace that:

thread 2
b *0x100


It advances to 0x400 (second CPU) but no further than that.


0x400 is the ISI vector, so it seems it hits that for some reason, which
is the same as what I saw with -d int,mmu, and it probably shouldn't get
those before it copies the MMU setup from CPU0.
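Telling which exception a CPU has landed in comes down to matching its PC against the vector offsets. A minimal sketch, assuming the architected vector offsets of classic 32-bit PowerPC (Book3S-32) with vectors at offset N from 0x00000000, or from 0xFFF00000 when MSR[IP] is set; implementation-specific vectors are omitted and vector_name is a hypothetical helper.

```python
# Sketch: name the classic 32-bit PowerPC exception vector at an address.
# Only the common architected vectors are listed; CPU-specific ones
# (performance monitor, breakpoint, etc.) are deliberately omitted.
from typing import Optional

PPC32_VECTORS = {
    0x100: "System Reset",
    0x200: "Machine Check",
    0x300: "DSI (data storage)",
    0x400: "ISI (instruction storage)",
    0x500: "External Interrupt",
    0x600: "Alignment",
    0x700: "Program",
    0x800: "Floating-Point Unavailable",
    0x900: "Decrementer",
    0xC00: "System Call",
    0xD00: "Trace",
}

HIGH_VECTOR_BASE = 0xFFF00000  # vector base when MSR[IP] = 1


def vector_name(addr: int) -> Optional[str]:
    """Hypothetical helper: exception name if addr is a vector entry point."""
    if addr >= HIGH_VECTOR_BASE:
        addr -= HIGH_VECTOR_BASE  # strip the high exception prefix
    return PPC32_VECTORS.get(addr)


if __name__ == "__main__":
    for pc in (0x100, 0x400, 0xFFF00900, 0x1234):
        print(f"{pc:#010x}: {vector_name(pc)}")
```

With this table, the breakpoint at 0x100 is the reset entry that CPU1 starts from, and 0x400 is the ISI it then falls into.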

See the gdb log.

Note that I used

maintenance packet Qqemu.sstep=0x1
sending: Qqemu.sstep=0x1
received: "OK"

and pressed Ctrl-C on the first thread when it failed to single step
(single stepping on thread 2/CPU1 was already impossible).

It shows a different state but no obvious function pointers :/


Aw, this one was without the second QEMU in MTTCG mode :/

New gdb log attached


As far as I understand, it shows CPU0 waiting for CPU1 to set its
call_in_map entry (that's OK and expected) while CPU1 is getting ISIs or
some other exceptions (which it likely shouldn't get), but I still don't
see how far CPU1 got in its init code or what triggers these ISIs. When
in the ISI handler, there's a register that holds the faulting address,
i.e. where it jumped there from. I told you to check the PPC manual for
that; I looked it up now and it's SRR0 ("Set to the effective address of
the instruction that the processor would have attempted to execute next
if no exception conditions were present (if the exception occurs on
attempting to fetch a branch target, SRR0 is set to the branch target
address)"). What code does that address belong to? That's what is
causing the ISIs, and we should find out why.


Thanks for looking it up

At the very first moment, when the 0x100 breakpoint is hit and gdb
auto-switches to thread 2, it shows empty r0-r31 and

srr0           0x100               256

At the next "step" (not really a step, because single stepping turns
into runaway execution with the sstep bits set to 0x1):

srr0           0xc000439c          -1073724516

which is the same as pc (the program counter):

pc             0xc000439c          0xc000439c <InstructionAccess_virt>

So is it already in its bad state? I'm not sure how to get any
in-between state.
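In the register dump above, gdb prints srr0 both in hex and, in the second column, as a signed 32-bit number, which is why 0xc000439c also shows up as -1073724516. A minimal sketch of that two's-complement conversion, handy for matching decimal values in a log back to kernel addresses; to_signed32 and to_unsigned32 are hypothetical helper names.

```python
# Sketch: convert between the unsigned hex form and the signed decimal
# form gdb shows for a 32-bit register such as srr0.
def to_signed32(value: int) -> int:
    """Hypothetical helper: read the low 32 bits as two's complement."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def to_unsigned32(value: int) -> int:
    """Hypothetical helper: map a signed 32-bit value to its bit pattern."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    print(to_signed32(0xC000439C))          # -1073724516
    print(hex(to_unsigned32(-1073724516)))  # 0xc000439c
    print(to_signed32(0x100))               # 256
```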


Maybe enable the normal sstep bits (0x7) just after CPU1 hits its
breakpoint, and then single step?


You can step by assembly instruction when there's no source or line
number info with the gdb "stepi" command; with that you should be able
to step through assembly code.


Thanks! I replaced "thread 1" / "step" with "stepi" on thread 2.

It ended up with:

* 2    Thread 1.2 (CPU#1 [running]) 0xc0006d30 in vmap_stack_overflow_virt () at arch/powerpc/kernel/head_book3s_32.S:375
++thread 2
[Switching to thread 2 (Thread 1.2)]
#0  0xc0006d30 in vmap_stack_overflow_virt () at arch/powerpc/kernel/head_book3s_32.S:375
375             b       interrupt_return
++backtrace
#0  0xc0006d30 in vmap_stack_overflow_virt () at arch/powerpc/kernel/head_book3s_32.S:375
#1  0x00000000 in ?? ()


This does not make much sense because
arch/powerpc/kernel/head_book3s_32.S:375 is the end of the FPU unavailable
exception (which probably should not happen) and has nothing to do with
vmap_stack_overflow_virt() (which I can't locate, as it's not present in
the older Linux sources I was looking at). In any case, it looks like the
problem is that unexpected exceptions are happening, which cause CPU1 to
interrupt its code execution, jump to uninitialised or wrong vectors, and
fail to init correctly. I don't know whether this is because on a real
machine this would run from cache and wouldn't cause exceptions, or
because something is not correctly emulated, so I don't know how to fix
it. I remember a similar problem with MorphOS, which worked on a real
machine but caused problems in QEMU; it could be prevented by turning off
the MSR DR and IR bits until the exception vectors were correctly set up.
But in that case OpenBIOS enabled these bits and MorphOS did not disable
them before trying to change the exception vectors, whereas here I think
the second CPU should start after a reset with these bits disabled. So
the question is: what enables them in Linux, and at that point is it
ready to take exceptions?
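Whether translation was on when CPU1 took an exception can be read from the IR and DR bits of the MSR, or of SRR1, which saves those MSR bits on exception entry. A minimal sketch, assuming the 32-bit PowerPC masks Linux uses for MSR_IR (0x20) and MSR_DR (0x10); translation_state and clear_translation are hypothetical helpers, and the example MSR value is made up rather than taken from the logs in this thread.

```python
# Sketch: inspect MMU translation bits in a 32-bit PowerPC MSR (or SRR1)
# value. Masks assumed to match Linux's MSR_IR / MSR_DR definitions.
MSR_IR = 0x20  # Instruction Relocate: instruction fetches are translated
MSR_DR = 0x10  # Data Relocate: data accesses are translated


def translation_state(msr: int) -> dict[str, bool]:
    """Hypothetical helper: which address translations msr enables."""
    return {"IR": bool(msr & MSR_IR), "DR": bool(msr & MSR_DR)}


def clear_translation(msr: int) -> int:
    """Hypothetical helper: msr with IR and DR cleared (real mode)."""
    return msr & ~(MSR_IR | MSR_DR)


if __name__ == "__main__":
    msr = 0x00009032  # made-up value: EE, ME, IR, DR and RI set
    print(translation_state(msr))
    print(hex(clear_translation(msr)))
```

Clearing both bits corresponds to running in real mode, which is the state the MorphOS workaround kept until its exception vectors were in place.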



Well, CPU0 starts OK at least...


Can we just add a disassembly command to the gdb script?


You can try

display x/i $pc


++display x/i $pc
my-qemu-gdb-script.gdb:27: Error in sourced command file:
No symbol "x" in current context.


https://stackoverflow.com/questions/1902901/show-current-assembly-instruction-in-gdb

set disassemble-next-line on
show disassemble-next-line


result attached

And where is that code in the Linux sources? The line you quoted sets it to be Reset, but I don't see where that's defined. This seems to be early after CPU1 has started and tries to jump into kernel code. Without MMU setup, how is that supposed to work? Why does this not cause exceptions on a real machine? Also, CPU0 is handling a decrementer interrupt meanwhile. Is that relevant to the problem, and is there some unwanted interaction here?


Regards,
BALATON Zoltan
