Re: mac99 SMP

qemu-ppc

From:	Andrew Randrianasulu
Subject:	Re: mac99 SMP
Date:	Wed, 5 Mar 2025 03:15:57 +0300
On Wed, Mar 5, 2025 at 2:55 AM Andrew Randrianasulu
<randrianasulu@gmail.com> wrote:
>
> On Wed, Mar 5, 2025 at 2:31 AM BALATON Zoltan <balaton@eik.bme.hu> wrote:
> >
> > On Wed, 5 Mar 2025, Andrew Randrianasulu wrote:
> > > On Tue, Mar 4, 2025 at 6:56 PM BALATON Zoltan <balaton@eik.bme.hu> wrote:
> > >>
> > >> On Tue, 4 Mar 2025, Andrew Randrianasulu wrote:
> > >>> On Tue, Mar 4, 2025 at 6:09 PM Andrew Randrianasulu
> > >>> <randrianasulu@gmail.com> wrote:
> > >>>>
> > >>>> On Tue, Mar 4, 2025 at 4:17 PM BALATON Zoltan <balaton@eik.bme.hu> 
> > >>>> wrote:
> > >>>>>
> > >>>>> On Tue, 4 Mar 2025, Andrew Randrianasulu wrote:
> > >>>>>> I think this is the maximum. I added a few more printks; you can
> > >>>>>> see them in the earlier log. But sadly it detours into assembly and
> > >>>>>> all this livepatching...
> > >>>>>
> > >>>>> See linux/Documentation/admin-guide/kernel-parameters.txt: loglevel=7
> > >>>>> or ignore_loglevel should give the maximum level, but it seems the
> > >>>>> hang is in the assembly part, where no more detailed logs are printed.
> > >>>>>
> > >>>>>> Can QEMU print traces only when instructed to from the monitor, or
> > >>>>>> even from remote gdb? Otherwise the logs will be huge...
> > >>>>>
> > >>>>> You can enable traces from the QEMU monitor, but not debug options, I think.
> > >>>>>
> > >>>>>> I tried just adding -d mmu and the log was overwhelming...
> > >>>>>
> > >>>>> Yes, this is a lot of logs. Redirect to a file with &> and then cut
> > >>>>> out the interesting part. Especially when it's stuck you would see a
> > >>>>> lot of ISIs repeating, so you can stop at that point; there is no need
> > >>>>> to wait until it finds the CPU stuck, we just need the first few of
> > >>>>> those ISI logs. What I'm wondering is whether, when stepping through,
> > >>>>> these don't happen, or the MMU gets set up before they happen. Or
> > >>>>> maybe it has something to do with the DECR interrupt CPU0 gets but
> > >>>>> gdb handles somehow? It prints a message about that and the ISIs on
> > >>>>> CPU1 seem to start at that point, so maybe that changes something
> > >>>>> somewhere that causes CPU1 to start getting interrupts too when it
> > >>>>> should not yet?
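A minimal bash sketch of the "cut out the interesting part" step. It assumes that each ISI exception in the QEMU -d int output appears on a line containing the string ISI (adjust the pattern if your QEMU version prints it differently); the function name cut_before_isi is hypothetical. It keeps everything up to the N-th ISI line, so the endlessly repeating tail is dropped.

```bash
#!/usr/bin/env bash
# cut_before_isi LOG [N]: print LOG up to and including the N-th line that
# contains "ISI" (default N=5), dropping the repeating tail of the log.
# Assumption: ISI exceptions show up as lines containing the string "ISI".
cut_before_isi() {
  local log=$1 n=${2:-5} last
  # grep -m stops reading after N matches, which matters for ~100 MB logs;
  # sed picks the N-th match and cut keeps only its line number.
  last=$(grep -n -m "$n" 'ISI' "$log" | sed -n "${n}p" | cut -d: -f1)
  if [ -z "$last" ]; then
    cat "$log"            # fewer than N ISI lines: keep the whole log
  else
    head -n "$last" "$log"
  fi
}

# Example usage (commented out; LOG.mmu would come from a run such as
# ./qemu-system-ppc ... -d int,mmu &> LOG.mmu):
# cut_before_isi LOG.mmu 5 > LOG.cut
```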
> > >>>>
> > >>>>
> > >>>> Well, the logs are ~100 MB each. I tried to diff two of them (both
> > >>>> were run with the -S -s parameters, so remote gdb was always
> > >>>> connected; in one case it was just executing without a breakpoint set).
> > >>>>
> > >>>> ./qemu-system-ppc -d int,mmu -M mac99,via=pmu -cpu g4 \
> > >>>>   -cdrom ~/ISO/install-powerpc-universal-2008.0.iso -boot d \
> > >>>>   -g 1024x768x8 -smp 2 \
> > >>>>   -bios ~/K38_sdcard1/Documents/openbios-qemu-smp.elf \
> > >>>>   -display sdl -accel tcg,thread=multi -m 2G \
> > >>>>   -kernel ~/boot/vmlinux-6.12.17 -append "console=ttyPZ0" \
> > >>>>   -S -s > LOG4.mmu 2>&1
> > >>>>
> > >>>> ./qemu-system-ppc -d int,mmu -M mac99,via=pmu -cpu g4 \
> > >>>>   -cdrom ~/ISO/install-powerpc-universal-2008.0.iso -boot d \
> > >>>>   -g 1024x768x8 -smp 2 \
> > >>>>   -bios ~/K38_sdcard1/Documents/openbios-qemu-smp.elf \
> > >>>>   -display sdl -accel tcg,thread=multi -m 2G \
> > >>>>   -kernel ~/boot/vmlinux-6.12.17 -append "console=ttyPZ0" \
> > >>>>   -S -s > LOG3.mmu 2>&1
> > >>>>
> > >>>> LOG3 was with a breakpoint, LOG4 without.
> > >>>>
> > >>>> The diff between them is about 9.5 MB. I see no ISI there, but DECR
> > >>>> appears...
> > >>>>
> > >>>> Sorry, I just do not know how to interpret this. :(
> > >>>
> > >>> They compress extremely well, down to less than 1 MB, so I'm
> > >>> attaching them here...
> > >>
> > >> Better not to send large attachments to lists with a lot of
> > >> subscribers. Upload them somewhere and post only the URL.
> > >
> > > Sorry, will do that next time.
> > >
> > >
> > >> These are just the QEMU debug logs but missing the Linux logs in
> > >> between, so we can't see where the relevant part is. Keep -serial
> > >> stdio and also -append "console=ttyPZ0 debug ignore_loglevel"; then you
> > >> can see which part of the log to cut, which is the interesting part
> > >> around and after smp_kick. You don't seem to get the same issue I saw
> > >> with the Ubuntu image JD used, so maybe this newer Linux version has
> > >> some other issue. I have no idea what, and if it can't be reproduced
> > >> when stepping through with gdb then I don't know how to find the place
> > >> where it stops. You could still try getting a backtrace of the vCPUs
> > >> while it's waiting for the second CPU with no breakpoint set, to see
> > >> what code is executed at that point, then look at the Linux sources to
> > >> see what it tries to do (if it's in assembly you'll have to learn
> > >> enough of that to understand it).
> > >
> > > I tried to set a breakpoint on the assembly label __start_secondary_pmac_0,
> > > and while the breakpoint can be set, it is never hit during normal
> > > execution (with no additional breakpoints).
> >
> > That will be executed by CPU1 so I think you should set the breakpoint on
> > thread 2. Maybe:
> >
> > (gdb) thread 2
> > (gdb) b __start_secondary_pmac_0
> >
> > or something like that. (Not sure if you need __ or _ at the beginning;
> > sometimes these change, so set a breakpoint on both, or on whichever
> > symbol gdb recognises.)
> >
> > > I also ran into problems getting both stdio and the -d output into the
> > > same file:
> > >
> > > ./qemu-system-ppc -M mac99,via=pmu -cpu g4 -smp 2 \
> > >   -bios ~/K38_sdcard1/Documents/openbios-qemu-smp.elf \
> > >   -accel tcg,thread=multi -m 2G -kernel ~/boot/vmlinux-6.12.17 \
> > >   -append "console=ttyPZ0" -vga none -serial stdio \
> > >   -d int,mmu 2>&1 > LOG6.mm
> >
> > The order of 2>&1 and > is significant. The above says: redirect stderr
> > to stdout, then redirect stdout to the file, so the file will only contain
> > stdout and stderr will be printed where stdout was before the redirection.
> > If you want both to go to the file, swap these to first redirect stdout and
> > then send stderr to the same file, or use &> which redirects both.
> >
> > > This command leaves only
> > >
> > > s>> et_property: NULL phandle
> > >
> > >>> =============================================================
> > >>> OpenBIOS 1.1 [Mar 1 2025 05:50]
> > >>> Configuration device id QEMU version 1 machine id 1
> > >>> CPUs: 2
> > >>> Memory: 2048M
> > >>> UUID: 00000000-0000-0000-0000-000000000000
> > >>> CPU type PowerPC,G4
> > >>> CPU type PowerPC,G4
> > > milliseconds isn't unique.
> > > Output device screen not found.
> > > Output device screen not found.
> > >>> [ppc] Kernel already loaded (0x01000000 + 0x01760a6c) (initrd 
> > >>> 0x00000000 + 0x00000000)
> > >>> [ppc] Kernel command line: console=ttyPZ0
> > >>> switching to new context:
> > >
> > > in the log :(
> > >
> > > Trying with -nographic while keeping -serial stdio, QEMU complains:
> > >
> > > ./qemu-system-ppc -M mac99,via=pmu -cpu g4 \
> > >   -cdrom ~/ISO/install-powerpc-universal-2008.0.iso -boot d \
> > >   -g 1024x768x8 -smp 2 \
> > >   -bios ~/K38_sdcard1/Documents/openbios-qemu-smp.elf \
> > >   -display sdl -accel tcg,thread=multi -m 2G \
> > >   -kernel ~/boot/vmlinux-6.12.17 -append "console=ttyPZ0" \
> > >   -nographic -serial stdio -d int,mmu 2>&1 > LOG6.mm
> > > qemu-system-ppc: -accel tcg,thread=multi: warning: Guest not yet converted to MTTCG - you may get unexpected results
> > > qemu-system-ppc: -serial stdio: cannot use stdio by multiple character devices
> > > qemu-system-ppc: -serial stdio: could not connect serial device to character backend 'stdio'
> >
> > OK, if you have -nographic then everything will go to stdio, so you don't
> > need -serial stdio, but you should still try the -append line with debug
> > and ignore_loglevel in case that makes the Linux logs more verbose.
> >
> > > So I am not sure how to mix both the console output (from the kernel) and
> > > the debug traces from the -d switch.
> > >
> > > I made another gdb.txt (without quitting gdb, it appends to the log file).
> >
> > I can't make sense of that output as it has no commands, so I don't know
> > what output is for what command. It also has multiple backtraces and
> > random output, so I don't know which is which.
>
> I tried "set trace-commands on" before turning logging on.
>
> Does it look better now?


Interestingly, I hit "s" a few more times (for the second CPU thread) and it
eventually reached this code:

No locals.
#3  0xc0023180 in start_secondary (unused=<optimized out>) at
arch/powerpc/kernel/smp.c:1639
        cpu = 1
#4  0x00003338 in ?? ()
No symbol table info available.
+s
_set_L2CR () at arch/powerpc/kernel/l2cr_6xx.S:91
91              li      r3,-1
+s
92              blr
+s
95              mflr    r9
+s
100             sync
+s
104             mfmsr   r7              /* Save MSR in r7 */
+s
105             rlwinm  r4,r7,0,17,15
+s
106             rlwinm  r4,r4,0,28,26   /* Turn off DR bit */
+s
107             sync
+s
108             mtmsr   r4
+s
109             isync
+s
Cannot access memory at address 0xc001f228
+s
Cannot access memory at address 0xc001f228
+s
Cannot access memory at address 0xc001f228
+s
Cannot access memory at address 0xc001f228
+s
Cannot access memory at address 0xc001f228
+s
Cannot access memory at address 0xc001f228
+s
Cannot access memory at address 0xc001f228
+s
Cannot access memory at address 0xc001f228
+s
Cannot access memory at address 0xc001f228
+s
Cannot access memory at address 0xc001f228
+s
Cannot access memory at address 0xc001f228
+bt full
#0  _set_L2CR () at arch/powerpc/kernel/l2cr_6xx.S:109
No locals.
#1  0xc0067b88 in core99_init_caches (cpu=1) at
arch/powerpc/platforms/powermac/smp.c:676
        core99_l2_cache = <error reading variable core99_l2_cache
(Cannot access memory at address 0xc16fac94)>
        core99_l3_cache = <error reading variable core99_l3_cache
(Cannot access memory at address 0xc16fac98)>
Backtrace stopped: Cannot access memory at address 0xf1075f94
+s
Cannot access memory at address 0xc001f228
+s
Cannot access memory at address 0xc001f228
+c
Continuing.
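For reference, the two rlwinm instructions at lines 105 and 106 of l2cr_6xx.S only clear MSR bits before the mtmsr at line 108. A minimal bash sketch of rlwinm (rotate the word left by SH, then AND with a mask running from bit MB to bit ME, in IBM bit numbering where bit 0 is the most significant bit) shows which bits go away, assuming the standard 32-bit PowerPC MSR layout: mask 17..15 clears bit 16 (0x8000, MSR[EE], external interrupts) and mask 28..26 clears bit 27 (0x10, MSR[DR], data relocation). The function names and the MSR value 0x9032 are hypothetical, chosen for illustration, not read from this trace.

```bash
#!/usr/bin/env bash
# ppc_mask MB ME: print the 32-bit rlwinm mask with bits MB..ME set.
# IBM bit numbering: bit 0 is the most significant bit. When MB > ME the
# mask wraps around (bits MB..31 and 0..ME are set).
ppc_mask() {
  local mb=$1 me=$2 mask=0 bit
  if (( mb <= me )); then
    for (( bit = mb; bit <= me; bit++ )); do (( mask |= 1 << (31 - bit) )); done
  else
    for (( bit = mb; bit <= 31; bit++ )); do (( mask |= 1 << (31 - bit) )); done
    for (( bit = 0; bit <= me; bit++ )); do (( mask |= 1 << (31 - bit) )); done
  fi
  printf '0x%08X\n' "$mask"
}

# rlwinm VALUE SH MB ME: rotate the 32-bit VALUE left by SH, then AND it
# with the MB..ME mask, like "rlwinm rA,rS,SH,MB,ME".
rlwinm() {
  local value=$(( $1 & 0xFFFFFFFF )) sh=$(( $2 % 32 )) mask rotated
  mask=$(ppc_mask "$3" "$4")
  rotated=$(( ((value << sh) | (value >> ((32 - sh) % 32))) & 0xFFFFFFFF ))
  printf '0x%08X\n' $(( rotated & mask ))
}

msr=0x9032                     # hypothetical MSR with EE (0x8000) and DR (0x10) set
r4=$(rlwinm "$msr" 0 17 15)    # like line 105: clears bit 16 (MSR[EE])
r4=$(rlwinm "$r4" 0 28 26)     # like line 106: clears bit 27 (MSR[DR])
echo "$r4"
```

With the hypothetical input the script prints 0x00001022, the same value with EE and DR cleared. It only shows what these two instructions compute; it does not by itself explain why CPU1 stays there.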


======

I hit "c" because I had no idea how long it would be stuck like this...

But eventually the kernel came up with no timestamp delay and both CPUs...

Does this mean isync is actually at fault?

>
> > We only really need one backtrace from CPU1 when it's stuck, but before the
> > kernel says it's stuck and is still waiting for it, as we want to find out
> > what it's doing at that point and where it's stuck.
> >
> > > It shows CPU1 coming out of halt, but I have no idea where exactly it
> > > starts to behave differently from a normal run.
> > >
> > > Maybe the reset vector is not set?
> >
> > I think the reset address is correct: in the kick function, setting
> > excp_prefix to 0 and then calling cpu_reset sets the reset address to
> > 0x100, which is where Linux expects it. Also, it works for MacOS and Linux
> > when stepping through, so I think the correct code is started but does not
> > finish when not slowed down by stepping, or something like that, but I
> > don't know why or where it stops. If we find where it's stuck, we may be
> > able to find out why.
> >
> > Regards,
> > BALATON Zoltan
gdb-trace.txt
Description: Text document