From: Greg Bellows
Subject: Re: [Qemu-devel] State of ARM FIQ in Qemu
Date: Thu, 13 Nov 2014 09:09:33 -0600
On Wednesday, 12 November 2014, 10:00:03, Greg Bellows wrote:
> On 12 November 2014 07:56, Tim Sander <address@hidden> wrote:
> > Hi Greg
> >
> > > > Bad mode in data abort handler detected
> > > > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> > > > Modules linked in: firq(O) ipv6
> > > > CPU: 0 PID: 103 Comm: systemd-udevd Tainted: G O 3.14.0 #1
> > > > task: bf2b9300 ti: bf362000 task.ti: bf362000
> > > > PC is at 0xffff1240
> > > > LR is at handle_fasteoi_irq+0x9c/0x13c
> > > > pc : [<ffff1240>] lr : [<8005cda0>] psr: 600f01d1
> > > > sp : bf363e70 ip : 07a7e79d fp : 00000000
> > > > r10: 76f92008 r9 : 80590080 r8 : 76e8e4d0
> > > > r7 : f8200100 r6 : bf363fb0 r5 : bf008414 r4 : bf0083c0
> > > > r3 : 80230d04 r2 : 0000002f r1 : 00000000 r0 : bf0083c0
> > > > Flags: nZCv IRQs off FIQs off Mode FIQ_32 ISA ARM Segment user
> > >
> > > It looks like we are in FIQ mode and interrupts have been masked.
> >
> > Indeed.
> >
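As a cross-check of the "Flags:" line, the PSR value can be decoded by hand. The following is only a minimal sketch: decode_psr is a made-up helper name, and the bit positions are the standard ARMv7 CPSR layout (N/Z/C/V in bits 31..28, I in bit 7, F in bit 6, T in bit 5, mode in bits 4..0). The first value is the psr from the oops; the other two appear in the stack dump in the slot where a saved PSR would be expected, which is an assumption on my part.

```python
# Decode the condition flags, mask bits and mode of a 32-bit ARM PSR value.
MODES = {
    0x10: "USR", 0x11: "FIQ", 0x12: "IRQ", 0x13: "SVC",
    0x16: "MON", 0x17: "ABT", 0x1A: "HYP", 0x1B: "UND", 0x1F: "SYS",
}


def decode_psr(psr):
    """Return the interesting PSR fields as a dict (hypothetical helper)."""
    # Upper case means the flag is set, lower case means it is clear,
    # matching the "nZCv" notation used in the kernel oops.
    flags = "".join(
        name.upper() if psr & (1 << bit) else name
        for name, bit in (("n", 31), ("z", 30), ("c", 29), ("v", 28))
    )
    return {
        "flags": flags,
        "irqs_masked": bool(psr & (1 << 7)),  # I bit
        "fiqs_masked": bool(psr & (1 << 6)),  # F bit
        "thumb": bool(psr & (1 << 5)),        # T bit
        "mode": MODES.get(psr & 0x1F, "unknown"),
    }


for value in (0x600F01D1, 0x200F0113, 0x600F0010):
    print(hex(value), decode_psr(value))
```

For 0x600f01d1 this gives mode FIQ with both IRQs and FIQs masked, which agrees with the oops output.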
> > > > Control: 10c53c7d Table: 60004059 DAC: 00000015
> > > > Process systemd-udevd (pid: 103, stack limit = 0xbf362240)
> > > > Stack: (0xbf363e70 to 0xbf364000)
> > > > 3e60: bf0083c0 00000000 0000002f 80230d04
> > > > 3e80: bf0083c0 bf008414 bf363fb0 f8200100 76e8e4d0 80590080 76f92008 00000000
> > > > 3ea0: 07a7e79d bf363e70 8005cda0 ffff1240 600f01d1 ffffffff 8005cd04 0000002f
> > > > 3ec0: 0000002f 800598bc 8058cc70 8000ed00 f820010c 8059684c bf363ef8 80008528
> > > > 3ee0: 80023730 80023744 200f0113 ffffffff bf363f2c 80012180 00000000 805baa00
> > > > 3f00: 00000000 00000100 00000002 00000022 00000000 bf362000 76e8e4d0 80590080
> > > > 3f20: 76f92008 00000000 0000000a bf363f40 80023730 80023744 200f0113 ffffffff
> > > > 3f40: bf007a14 8059ac00 00000000 0000000a ffff8dd7 00400140 bf0079c0 8058cc70
> > > > 3f60: 00000022 00000000 f8200100 76e8e4d0 76f9201c 76f92008 00000000 80023af0
> > > > 3f80: 8058cc70 8000ed04 f820010c 8059684c bf363fb0 80008528 00000000 76dd3b44
> > > > 3fa0: 600f0010 ffffffff 0000000c 8001233c 00000000 00000000 76f93428 76f93428
> > > > 3fc0: 76f93438 00000000 76f93448 0000000c 76e8e4d0 76f9201c 76f92008 00000000
> > > > 3fe0: 00000000 7ec115c0 76f60914 76dd3b44 600f0010 ffffffff 9fffd821 9fffdc21
> > > > [<8005cda0>] (handle_fasteoi_irq) from [<80230d04>] (gic_eoi_irq+0x0/0x4c)
> >
> > > It certainly looks like we are going down the standard IRQ path as you
> > > suggested. I'm not a Linux driver guy, but do you see any kind of
> > > activity (breakpoints, printfs, ...) through your FIQ handler?
> >
> > I am reaching 0xffff1224, which I believe is the FIQ vector address on the
> > vexpress?
>
> Hmmm.... not sure. As you mentioned previously (and as seen in the above
> register dump), I would expect offset 0x1240 (pc=0xffff1240) for an FIQ.
> I'm not sure what is at offset 0x1224, but on my Linux kernel it appears
> that offset 0x1220 is vector_addrexcptn (not pabort), that happens to
> occupy the HYP trap vector.
Zounds! You're right, I think this was a typo in my debug script, which I
didn't notice. But I do reach 0x1240, just not 0x1244, which means it aborts
on the first FIQ instruction. Here is the "-d int" output directly after the
FIQ hits:
Taking exception 3 [Prefetch Abort]
...with IFSR 0x5 IFAR 0x800c8dcc //kmem_cache_alloc
Taking exception 3 [Prefetch Abort]
...with IFSR 0x5 IFAR 0x8001be00 //v7_pabort
Taking exception 3 [Prefetch Abort]
and then it continues to fail on v7_pabort repeatedly. This shows that there
is something fishy going on: it is failing on the presumed handler for the
prefetch abort itself? But since I see prefetch aborts being resolved earlier,
I can conclude that it works up to the point where the CPU is in FIQ mode.
FIQ is special in that statically mapped memory is needed to avoid a page
lookup, as this fails under Linux in FIQ mode. But 0x800c8dcc
(kmem_cache_alloc) is not called in the FIQ handler, which obviously can't use
any Linux infrastructure. And as I do not reach the breakpoint at 0xffff1244,
these misses happen on the execution of the first instruction of the FIQ
handler.
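To make the fault status values in the "-d int" output easier to read, here is a small decoding sketch. It assumes the ARMv7 short-descriptor format (bit 9 clear, i.e. a kernel without LPAE), which is my assumption and not something the logs state. decode_fsr is a hypothetical helper and the table is deliberately incomplete.

```python
# Fault status encodings FS[4:0] for the ARMv7 short-descriptor format.
# Only the common MMU faults are listed; everything else is reported raw.
FAULTS = {
    0b00001: "alignment fault",
    0b00101: "translation fault, first level (section)",
    0b00111: "translation fault, second level (page)",
    0b01001: "domain fault, first level (section)",
    0b01011: "domain fault, second level (page)",
    0b01101: "permission fault, first level (section)",
    0b01111: "permission fault, second level (page)",
}


def decode_fsr(fsr):
    """Decode an IFSR/DFSR value (hypothetical helper, short format only)."""
    if fsr & (1 << 9):
        raise ValueError("long-descriptor (LPAE) format is not handled here")
    # FS[4] lives in bit 10, FS[3:0] in bits 3..0.
    fs = (((fsr >> 10) & 1) << 4) | (fsr & 0xF)
    return {
        "status": FAULTS.get(fs, "other (FS=0b{:05b})".format(fs)),
        "domain": (fsr >> 4) & 0xF,
        "write": bool(fsr & (1 << 11)),  # WnR, only meaningful for the DFSR
    }


print(decode_fsr(0x5))    # IFSR value seen in the prefetch aborts
print(decode_fsr(0x805))  # DFSR value seen in the data aborts
```

Under that assumption both 0x5 and 0x805 decode to a first-level translation fault (0x805 additionally flags a write), meaning no valid first-level page table entry was found for the faulting address.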
> > > > [<80230d04>] (gic_eoi_irq) from [<f8200100>] (0xf8200100)
> > > > Code: ee02af10 f57ff06f e59d8000 e59d9004 (e599b00c)
> > > > ---[ end trace 3dc3571209a017e1 ]---
> > > > Kernel panic - not syncing: Fatal exception in interrupt
> > >
> > > It is hard to determine entirely what is happening here based on this
> > > info. I do have code of my own that routes KGDB interrupts as FIQs and
> > > with the workaround I see the FIQs handled as expected. Some things we
> > > can try to get more info in hopes of pinpointing where to look:
> > > 1. At the top of hw/intc/arm_gic.c there is the following commented
> > > out line:
> > > //#define DEBUG_GIC
> > >
> > > Uncomment the line, rebuild and rerun. This will give us some trace
> > > on what is going through the GIC code.
> >
> > I have commented out some debug lines but I see:
> > Breakpoint 1, gic_update_with_grouping (s=0x5555564dba80) at hw/intc/arm_gic.c:120
> > 120 DPRINTF("Raised pending FIQ %d (cpu %d)\n", best_irq, cpu);
> >
> > With the expected IRQ number 49 (32+17).
> >
> > > 2. Run qemu with the "-d int" option which will print a message on
> > > each interrupt. We should see an FIQ at some point if they are
> > > occurring. The only issue is that there will be numerous IRQs, so
> > > you'll have to parse through them to find an "exception 6 [FIQ]".
> >
> > Here is the relevant output when the FIQ hits:
> > Taking exception 2 [SVC]
> > Taking exception 2 [SVC]
> > pml: pml_timer_tick: raise_irq
> > arm_gic: Raised pending FIQ 49 (cpu 0)
> > Taking exception 6 [FIQ]
>
> This looks to me like the GIC has caught the interrupt and communicated it
> to the CPU causing it to take the FIQ exception.
>
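Since the interesting entry is buried among many other exceptions, a short filter can pull out each "exception 6 [FIQ]" line with some context. This is only a sketch: around_fiq is a hypothetical helper, the window sizes are arbitrary, and the sample lines are copied from the output quoted in this thread. For a real run the lines would come from wherever the "-d int" output was saved.

```python
import re
from collections import deque

# Matches lines such as "Taking exception 6 [FIQ]".
EXC = re.compile(r"Taking exception (\d+) \[(.+?)\]")


def around_fiq(lines, before=2, after=6):
    """Yield a list of log lines around each FIQ exception entry.

    Note: a second FIQ inside the 'after' window is reported as part of
    the first window rather than starting a new one.
    """
    history = deque(maxlen=before)
    it = iter(lines)
    for line in it:
        line = line.rstrip("\n")
        match = EXC.search(line)
        if match and match.group(1) == "6":
            window = list(history) + [line]
            for _ in range(after):
                nxt = next(it, None)
                if nxt is None:
                    break
                window.append(nxt.rstrip("\n"))
            yield window
            history.clear()
        else:
            history.append(line)


sample = [
    "Taking exception 2 [SVC]",
    "Taking exception 2 [SVC]",
    "pml: pml_timer_tick: raise_irq",
    "arm_gic: Raised pending FIQ 49 (cpu 0)",
    "Taking exception 6 [FIQ]",
    "pml: pml_write: update control flags: 1",
]
for window in around_fiq(sample):
    print("\n".join(window))
```

The same generator works on an open file object, so a saved log can be passed in directly instead of the sample list.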
> > pml: pml_write: update control flags: 1
> > pml: pml_update: start timer
> > pml: pml_update: lower irq
> > pml: pml_read: read magic
> > pml: pml_write: update control flags: 3
> > pml: pml_update: start timer
>
> Is pml your test driver? It looks like it initiates the interrupt and
> possibly performs some handling following it?
Yes, it's just a simple set of registers to control an interrupt. I added
debug output to this driver to see if and when the FIQ is accessing the
registers, but I see no accesses from FIQ mode.
> > Taking exception 3 [Prefetch Abort]
> > ...with IFSR 0x5 IFAR 0x80221d70
> > Taking exception 4 [Data Abort]
> > ...with DFSR 0x805 DFAR 0x805c604c
> > Taking exception 4 [Data Abort]
> > ...with DFSR 0x805 DFAR 0x805c604c
> > Taking exception 4 [Data Abort]
> >
> > So the FIQ is hitting but unfortunately I have no idea where the data
> > aborts are coming from.
>
> The data aborts are likely a side effect of the prefetch abort taken before
> them; it is the interesting one.
Still, as above, the address is odd. In FIQ mode it should not jump to this
address at all!? This is definitely Linux memory space and I am not calling
anything Linux related from the FIQ.
> > I have shifted all other IRQs besides 49 to group 1 so that only IRQ 49 is
> > a FIQ.
> > Might it be that I am seeing some secure violations...
> > The IFAR address is in __idr_pre_get, which lives in the Linux kernel in
> > lib/idr.c and seems to implement integer ID management.
> >
> > > 3. If you set a breakpoint in your driver, is it possible to see that
> > > FIQs are on from the kernel debugger. Clearly you have to try this
> > > from a path where interrupts are masked. I see the following on my
> > > system mentioned above:
> > > ...
> > > Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
> > > ...
> >
> > So you mean by debugging via the qemu debug port? I have not enabled
> > kgdb.
> > As stated above, I was not able to catch the FIQ there. But it might
> > be that I get
> >
> > I have debugged qemu to see if the IRQ is routed correctly. The deepest
> > call I could find is this: bt
> > #0 tcg_handle_interrupt (cpu=0x555556450790, mask=16)
> >    at /home/sander/speedy/soc/qemu/translate-all.c:1503
> > #1 0x0000555555755323 in cpu_interrupt (cpu=0x555556450790, mask=16)
> >    at /home/sander/speedy/soc/qemu/include/qom/cpu.h:556
> > #2 0x00005555557561b7 in arm_cpu_set_irq (opaque=0x555556450790, irq=1, level=1)
> >    at /home/sander/speedy/soc/qemu/target-arm/cpu.c:261
> > #3 0x00005555558193ec in qemu_set_irq (irq=0x55555642c840, level=1)
> >    at hw/core/irq.c:43
> > #4 0x0000555555879073 in gic_update_with_grouping (s=0x5555564dba80)
> >    at hw/intc/arm_gic.c:132
> > #5 0x000055555587936d in gic_update (s=0x5555564dba80)
> >    at hw/intc/arm_gic.c:180
> > #6 0x00005555558798a7 in gic_set_irq (opaque=0x5555564dba80, irq=49, level=1)
> >    at hw/intc/arm_gic.c:264
> > #7 0x00005555558193ec in qemu_set_irq (irq=0x555556432b00, level=1)
> >    at hw/core/irq.c:43
> > #8 0x0000555555661d4d in a9mp_priv_set_irq (opaque=0x5555564d7260, irq=17, level=1)
> >    at /home/sander/speedy/soc/qemu/hw/cpu/a9mpcore.c:17
> > #9 0x00005555558193ec in qemu_set_irq (irq=0x5555564f3c00, level=1)
> >    at hw/core/irq.c:43
> > #10 0x00005555558f6fed in qemu_irq_raise (irq=0x5555564f3c00)
> >    at /home/sander/speedy/soc/qemu/include/hw/irq.h:16
> > #11 0x00005555558f7363 in pml_timer_tick (opaque=0x555556595020)
> >    at hw/timer/pml.c:95
> > #12 0x000055555599be6e in aio_bh_poll (ctx=0x5555563fdad0) at async.c:82
> > #13 0x00005555559b2d9f in aio_dispatch (ctx=0x5555563fdad0)
> >    at aio-posix.c:137
> > #14 0x000055555599c2cb in aio_ctx_dispatch (source=0x5555563fdad0, callback=0x0, user_data=0x0)
> >    at async.c:221
> > #15 0x00007ffff7901e04 in g_main_context_dispatch ()
> >    from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> > #16 0x00005555559b0a79 in glib_pollfds_poll () at main-loop.c:200
> > #17 0x00005555559b0b7a in os_host_main_loop_wait (timeout=0)
> >    at main-loop.c:245
> > #18 0x00005555559b0c52 in main_loop_wait (nonblocking=1)
> >    at main-loop.c:494
> > #19 0x0000555555791d8b in main_loop () at vl.c:1872
> > #20 0x00005555557998d5 in main (argc=22, argv=0x7fffffffda38, envp=0x7fffffffdaf0)
> >    at vl.c:4348
> >
> > I am not sure if arm_cpu_set_irq(opaque=0x555556450790, irq=1, level=1)
> > represents a FIQ and if mask 16 is the correct mask for the FIQ request.
>
> Yeah this routine handles both IRQs and FIQs. I don't see anything above
> that stands out as suspicious. It may be interesting to try the same test
> driver on an A15 emulation if it is not too much trouble. This would rule
> out the A9 workaround not being sufficient for being GICv2.
Given that the addresses at which the faults appear are bogus and not
accessed by the FIQ handler at all, and since I have seen that starting up a
different CPU is just a matter of a command line option, I started up my
modified vexpress board (pml hw added) with a Cortex-A15 CPU. Unfortunately
the results are pretty similar:
pml: pml_timer_tick: raise_irq
arm_gic: Raised pending FIQ 49 (cpu 0)
Taking exception 6 [FIQ]
pml: pml_write: update control flags: 1
pml: pml_update: start timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: start timer
Taking exception 4 [Data Abort]
...with DFSR 0x5 DFAR 0xbf3d2334 //address not in Kernel space?
Taking exception 3 [Prefetch Abort]
...with IFSR 0x5 IFAR 0x800120e0 //__dabt_svc
Taking exception 3 [Prefetch Abort]
...with IFSR 0x5 IFAR 0x80012240 //__pabt_svc
Taking exception 3 [Prefetch Abort]
...with IFSR 0x5 IFAR 0x80012240 //__pabt_svc
Taking exception 3 [Prefetch Abort]
> > Row #6 shows clearly that IRQ 49, configured to Group 0, is triggered. All
> > other interrupts are configured to Group 1 from my Linux kernel. The call
> > to #4 gic_update_with_grouping shows that grouping within the GIC is
> > enabled and that the IRQ is triggered as a FIQ within qemu. All of this
> > looks good as far as I understand. So I am pretty confident that qemu is
> > working correctly (minus the Prefetch and Data Aborts).
>
> I agree that QEMU appears to be handling the FIQ properly and it appears
> that the CPU is trying to dispatch it. I understand that the Linux FIQ
> handling is a little trickier than IRQs, so I suspect that either something
> in the Linux kernel handling or your driver is going awry during handling
> or as a result of the FIQ.
Yes, FIQs are tricky as you need to avoid page lookup failures. These are
undesirable in a FIQ anyway. So all the memory I access is statically mapped
so that it's always available in the page table.
Best regards
Tim