On Thu, 6 Mar 2025, Andrew Randrianasulu wrote:
On Thu, Mar 6, 2025 at 4:12 PM BALATON Zoltan <balaton@eik.bme.hu> wrote:
On Thu, 6 Mar 2025, Andrew Randrianasulu wrote:
On Thu, 6 Mar 2025 at 05:10, BALATON Zoltan <balaton@eik.bme.hu> wrote:
On Thu, 6 Mar 2025, Andrew Randrianasulu wrote:
On Thu, Mar 6, 2025 at 2:02 AM Andrew Randrianasulu <randrianasulu@gmail.com> wrote:
On Thu, Mar 6, 2025 at 12:21 AM BALATON Zoltan <balaton@eik.bme.hu> wrote:
So is that the ISI that I saw? Line 308 is the end of the DSI handler, but the name in the log shows the ISI handler. But you had no ISI logs with -d int, so I don't get it.
What are the registers of that CPU at that point? One of them should tell where it got to the ISI handler from, but the backtrace does not show that. (Check the CPU docs for which register holds the address that caused the exception; I don't remember.)
This was the case when I set the sstep bits to 0x1: it does not bring up the second CPU AND does not single step into the irq_save function in core99_kick_cpu :(
So I assumed the least impactful mode (0x1) is not very useful for detailed single stepping into this specific function.
But 0x3, 0x5 and the default 0x7 all work, as far as single stepping and bringing up the secondary CPU are concerned.
You were stepping through CPU0, but the interesting part is what CPU1 is doing, so maybe try to trace that:
thread 2
b *0x100
It advances to 0x400 (second CPU) but no further than this.
0x400 is the ISI vector, so it seems it hits that for some reason, which is the same as what I saw with -d int,mmu, and it probably shouldn't get those before it copies the MMU setup from CPU0.
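The exception vector offsets on classic (32-bit Book3S) PowerPC are architected, so a stop address or SRR0 value in the low vector area can be mapped to a handler name; 0x100, where the breakpoint above was set, is the System Reset vector. A minimal lookup covering only the common vectors (CPU-specific ones such as AltiVec unavailable are left out):

```python
from typing import Optional

# Architected exception vector offsets for classic (32-bit Book3S) PowerPC.
# Only the common ones are listed; CPU-specific vectors are omitted.
VECTORS = {
    0x100: "System Reset",
    0x200: "Machine Check",
    0x300: "DSI (data storage)",
    0x400: "ISI (instruction storage)",
    0x500: "External Interrupt",
    0x600: "Alignment",
    0x700: "Program",
    0x800: "Floating-Point Unavailable",
    0x900: "Decrementer",
    0xC00: "System Call",
    0xD00: "Trace",
}


def vector_for(addr: int, msr_ip: bool = False) -> Optional[str]:
    """Name the vector whose 0x100-byte slot contains addr, or None.

    With MSR[IP] set the vectors are at 0xFFF00000 + offset instead of 0 + offset.
    """
    base = 0xFFF00000 if msr_ip else 0
    offset = addr - base
    if not 0 <= offset < 0x1000:
        return None
    return VECTORS.get(offset & ~0xFF)


print(vector_for(0x400))
print(vector_for(0x100))
```

Addresses outside the low vector area, such as kernel virtual addresses like 0xc000439c, return None.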
See gdb log.
Note that I used
maintenance packet Qqemu.sstep=0x1
sending: Qqemu.sstep=0x1
received: "OK"
and pressed Ctrl-C on the first thread when it failed to single step (single stepping on thread 2/CPU1 was already impossible).
It shows a different state, but no obvious function pointers :/
Aw, this one was without second qemu in mttcg mode :/
New gdb log attached
As far as I understand, it shows CPU0 waiting for CPU1 to set its call_in_map entry (that's OK and expected) while CPU1 is getting ISIs or some other exceptions (which it likely shouldn't get), but I still don't see how far CPU1 got in its init code and what triggers these ISIs. When in the ISI handler, there's a register that holds the fault address, which is where it jumped there from. I told you to check the PPC manual for that. I looked it up now: it's SRR0 ("Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present (if the exception occurs on attempting to fetch a branch target, SRR0 is set to the branch target address)"). What code does that address belong to? That's what is causing the ISIs, and we should find out why.
Thanks for looking it up.
At the very first moment, when the 0x100 breakpoint is hit and gdb auto-switches to thread 2, it shows empty r0-r31 and
srr0 0x100 256
At the next "step" (not really a step, because single stepping starts runaway execution with the sstep bits set to 0x1):
srr0 0xc000439c -1073724516
same as pc (program counter)
pc 0xc000439c 0xc000439c <InstructionAccess_virt>
So it's already in its bad state? I'm not sure how to get any in-between state.
Maybe enable the normal sstep bits (0x7) just after CPU1 hits its breakpoint and then single step?
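The decimal column gdb prints next to srr0 is just the signed 32-bit view of the same value. A small sketch for converting it back, and for finding the physical address behind a kernel lowmem virtual address, assuming the usual ppc32 PAGE_OFFSET of 0xc0000000 (check CONFIG_PAGE_OFFSET in the kernel config; the helper names are hypothetical):

```python
PAGE_OFFSET = 0xC0000000  # assumed default ppc32 kernel base (CONFIG_PAGE_OFFSET)


def to_u32(value: int) -> int:
    """Convert gdb's signed decimal register display back to an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def kernel_virt_to_phys(vaddr: int) -> int:
    """Linear-map translation for lowmem kernel addresses (sketch: no highmem/ioremap)."""
    if vaddr < PAGE_OFFSET:
        raise ValueError(f"{vaddr:#x} is not a kernel lowmem address")
    return vaddr - PAGE_OFFSET


srr0 = to_u32(-1073724516)
print(hex(srr0))                       # 0xc000439c
print(hex(kernel_virt_to_phys(srr0)))  # 0x439c
```

The physical address is what matters when comparing against fetches made with MSR[IR] off, since no translation applies then.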
When there's no source or line number info, you can step by assembly instruction with the gdb "stepi" command; with that you should be able to step through assembly code.
Thanks! I replaced thread 1 / step with stepi on thread 2, and it ended up with:
* 2 Thread 1.2 (CPU#1 [running]) 0xc0006d30 in vmap_stack_overflow_virt () at arch/powerpc/kernel/head_book3s_32.S:375
++thread 2
[Switching to thread 2 (Thread 1.2)]
#0 0xc0006d30 in vmap_stack_overflow_virt () at arch/powerpc/kernel/head_book3s_32.S:375
375 b interrupt_return
++backtrace
#0 0xc0006d30 in vmap_stack_overflow_virt () at arch/powerpc/kernel/head_book3s_32.S:375
#1 0x00000000 in ?? ()
This does not make much sense, because arch/powerpc/kernel/head_book3s_32.S:375 is the end of the FPU unavailable exception handler (which probably should not happen) and has nothing to do with vmap_stack_overflow_virt() (which I can't locate, as it's not present in the older Linux sources I was looking at). In any case it looks like the problem is that unexpected exceptions are happening that cause CPU1 to interrupt its code execution and jump to uninitialised or wrong vectors, which prevents it from initialising correctly. I don't know if this is because on a real machine this would run from cache and wouldn't cause exceptions, or because something is not correctly emulated, so I don't know how to fix it.
I remember a similar problem with MorphOS, which worked on a real machine but caused problems in QEMU; it could be prevented by turning off the MSR DR and IR bits until the exception vectors were correctly set up. But in that case OpenBIOS enabled these bits and MorphOS did not disable them before trying to change the exception vectors, whereas here I think the second CPU should start after a reset with these bits disabled. So the question is: what enables them in Linux, and at that point is it ready to take exceptions?
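To check at a given stop whether translation is already enabled, the msr value (or srr1, which saves most MSR bits when an exception is taken) shown by gdb's `info registers` can be decoded. A minimal sketch, using the 32-bit Book3S bit masks as defined in Linux's arch/powerpc/include/asm/reg.h (the helper names are hypothetical):

```python
# MSR bit masks for 32-bit Book3S PowerPC (same values as Linux's asm/reg.h).
MSR_BITS = {
    "EE": 0x8000,  # external interrupts enabled
    "PR": 0x4000,  # problem (user) state
    "FP": 0x2000,  # floating point available
    "ME": 0x1000,  # machine check enabled
    "IP": 0x0040,  # exception prefix: vectors at 0xFFF00000
    "IR": 0x0020,  # instruction address translation
    "DR": 0x0010,  # data address translation
    "RI": 0x0002,  # recoverable interrupt
    "LE": 0x0001,  # little-endian mode
}


def decode_msr(msr: int) -> list[str]:
    """Return the names of the listed MSR bits that are set."""
    return [name for name, mask in MSR_BITS.items() if msr & mask]


def translation_on(msr: int) -> bool:
    """True if either instruction or data translation (IR/DR) is enabled."""
    return bool(msr & (MSR_BITS["IR"] | MSR_BITS["DR"]))


print(decode_msr(0x1032))
print(translation_on(0x1032), translation_on(0x1002))
```

For example 0x1032 decodes to ME, IR, DR and RI, which as far as I recall is what 32-bit Linux uses as MSR_KERNEL. Seeing IR/DR set on CPU1 before it has its MMU setup would be worth noting; FP clear at the point of a floating-point instruction is what raises the FPU unavailable exception.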