From: Alex Züpke
Subject: Re: [Qemu-devel] QEMU ARM SMP: IPI delivery delayed until next main loop event // how to improve IPI latency?
Date: Tue, 16 Jun 2015 13:11:42 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

Hi Peter,

On 16.06.2015 at 12:59, Peter Maydell wrote:
> On 16 June 2015 at 11:33, Peter Maydell <address@hidden> wrote:
>> Pressing a key does not unwedge the test case for me.
> 
> Looking at the logs, this seems to be expected given what
> the guest code does with CPU #1: (the below is edited logs,
> created with a hacky patch I have that annotates the debug
> logs with CPU numbers):
> 
> CPU #1: Trace 0x7f2d67afa000 [80000100] _start
>  # we start
> CPU #1: Trace 0x7f2d67afc060 [8000041c] main_cpu1
>  # we correctly figured out we're CPU 1
> CPU #1: Trace 0x7f2d67afc220 [80000448] main_cpu1
>  # we took the branch to 80000448
> CPU #1: Trace 0x7f2d67afc220 [80000448] main_cpu1
>  # 80000448 is a branch-to-self, so here we stay
> 
> CPU #1 never bothered to enable its GICC cpu interface,
> so it will never receive interrupts and will never get
> out of this tight loop.

Yes. CPU#1 is stuck in the initial spinlock, which lacks WFE.
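(For context, the per-CPU GIC enable that CPU#1 never reaches would look
roughly like this; a minimal sketch assuming a GICv1/v2 CPU interface, with an
illustrative base address rather than the one from my actual code:)

#include <stdint.h>

/* assumption: CPU interface base as on vexpress-a9; board-specific in reality */
#define GICC_BASE  0x1e000100u
#define GICC_CTLR  (*(volatile uint32_t *)(GICC_BASE + 0x00))
#define GICC_PMR   (*(volatile uint32_t *)(GICC_BASE + 0x04))

static void gic_cpu_enable(void)
{
    GICC_PMR  = 0xff;  /* let all interrupt priorities through */
    GICC_CTLR = 0x1;   /* enable forwarding of interrupts to this CPU */
}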

> We get here because CPU #1 has got through main_cpu1
> to the point of testing your 'release' variable before
> CPU #0 has got through main_cpu0 far enough to set it
> to 1, so it still has the zero in it that it has on
> system startup. If scheduling happened to mean that
> CPU #0 ran further through main_cpu0 before CPU #1
> ran, we wouldn't end up in this situation -- you have a
> race condition, as I suggested.
> 
> The log shows we're sat with CPU#0 fruitlessly looping
> on a variable in memory, and CPU#1 in this endless loop.

I know that the startup is racy because I removed too much code from the
original project.
But the startup is not my problem; it's the later parts.

I added the WFE to the initial lock. Here are two new tests, both now 3178
bytes in size:
http://www.cs.hs-rm.de/~zuepke/qemu/ipi.elf
http://www.cs.hs-rm.de/~zuepke/qemu/ipi_yield.elf
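(The initial lock now waits roughly like this; a sketch with made-up names and
ARMv7 inline assembly, not the exact code in the binaries above:)

#include <stdint.h>

extern volatile uint32_t release;        /* CPU#0 sets this to 1 when done */

static void cpu1_wait_for_release(void)  /* CPU#1 side */
{
    while (release == 0)
        __asm__ volatile ("wfe" ::: "memory");  /* sleep until event/IRQ */
    __asm__ volatile ("dmb" ::: "memory");
}

static void cpu0_signal_release(void)    /* CPU#0 side */
{
    __asm__ volatile ("dsb" ::: "memory");
    release = 1;
    __asm__ volatile ("dsb" ::: "memory");
    __asm__ volatile ("sev" ::: "memory");      /* wake CPUs parked in WFE */
}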

Both start on my machine. The IPI ping-pong begins after the first timer
interrupt, at the one-second mark.
The problem is that IPIs are delivered only once per second, when the timer
interrupt kicks QEMU's main loop.
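(By "IPI ping-pong" I mean each CPU raising a software-generated interrupt at
the other from its handler, roughly like this; again only a sketch, with an
assumed GICv1/v2 distributor base:)

#include <stdint.h>

/* assumption: distributor base as on vexpress-a9; board-specific in reality */
#define GICD_BASE  0x1e001000u
#define GICD_SGIR  (*(volatile uint32_t *)(GICD_BASE + 0xf00))

static void send_ipi(unsigned target_cpu, unsigned sgi_id)
{
    /* target list filter 0b00 = use CPU target list in bits [23:16] */
    GICD_SGIR = (1u << (16 + target_cpu)) | (sgi_id & 0xfu);
}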


> PS: QEMU doesn't care, but your binary seems to be entirely
> devoid of barrier instructions, which is likely to cause
> you problems on real hardware.
> 
> thanks
> -- PMM

Yes, I trimmed my code down to the bare minimum needed to handle IPIs,
targeting QEMU only. It lacks barriers and cache handling, and has bogus baud
rate settings.


Something else: existing ARM CPUs do not use hyper-threading; they have real
physical cores.
In contrast, QEMU behaves like an extremely coarse-grained hyper-threading
architecture, so legacy code that was written with physical cores in mind will
trigger timing bugs in its synchronization primitives, especially code
originally written for ARM11 MPCore like mine, which lacks WFE/SEV.
If we consider QEMU a platform for running legacy code, doesn't it make sense
to address these issues?


Best regards
Alex



