Hey, I'm running a bare-metal image on QEMU 6.1 and I've encountered the following scenario:
After receiving a Data Abort and mapping in the correct page, I try to invalidate the corresponding TLB entry using the following assembly sequence:
tlbi vaae1is, x0
Unfortunately, this does not seem to have any immediate effect: upon returning to the source of the exception I immediately hit the same Data Abort. This cycle of receiving a Data Abort and then updating the mapping repeats hundreds of times, until the TLB finally picks up the correct mapping.
As part of my testing I also tried replacing the Inner Shareable tlbi shown above with the base version, which only invalidates the current PE's TLB entry (tlbi vaae1, x0). This seemed to fix the issue, which made me suspect that something is up with QEMU itself: the Inner Shareable version of the instruction is supposed to invalidate the current PE's TLB entry as well as the others', so if the non-shareable version works, the Inner Shareable one should work as well.
After digging a bit through the code I saw that the non-shareable version calls 'tlb_flush_page_bits_by_mmuidx' which eventually calls 'tlb_flush_range_by_mmuidx_async_0' synchronously, while the inner-shareable version calls 'tlb_flush_page_bits_by_mmuidx_all_cpus_synced' which also eventually calls 'tlb_flush_range_by_mmuidx_async_0', but asynchronously this time.
Moving on to the implementation of the DSB instruction, I saw that it is translated into an 'INDEX_op_mb' operation. Looking at the interpreter's handling of that operation, it simply performs a memory barrier; it does not (at least explicitly) handle any of the async tasks in the work queue. So from my (admittedly basic) understanding of the code, it looks like QEMU's implementation of the DSB instruction does not wait until the TLB flush has finished, as required.
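To make the suspicion concrete, here is a minimal toy model in C of the behaviour described above. It is not QEMU code and does not call any QEMU function: every name in it (ToyCpu, toy_flush_sync, toy_flush_async, toy_barrier_only, toy_barrier_and_drain, toy_demo) is hypothetical. It only assumes what is stated above: one flush path takes effect before returning, the other merely queues work, and the barrier does not drain that queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy vCPU. NOT QEMU code; all names here are hypothetical. */
typedef struct {
    bool stale_tlb_entry;  /* true while the old translation is still cached */
    int  queued_flushes;   /* flush requests waiting in the work queue */
} ToyCpu;

/* Local-style flush: the entry is gone before the call returns. */
static void toy_flush_sync(ToyCpu *cpu)
{
    cpu->stale_tlb_entry = false;
}

/* Broadcast-style flush: the request is only queued, nothing is flushed yet. */
static void toy_flush_async(ToyCpu *cpu)
{
    cpu->queued_flushes++;
}

/* Runs every queued flush; stands in for whatever eventually drains the queue. */
static void toy_drain_queue(ToyCpu *cpu)
{
    while (cpu->queued_flushes > 0) {
        cpu->queued_flushes--;
        cpu->stale_tlb_entry = false;
    }
}

/* Barrier as suspected in the post: orders memory, ignores the work queue. */
static void toy_barrier_only(ToyCpu *cpu)
{
    (void)cpu;
}

/* Barrier as the guest expects it: pending flushes complete first. */
static void toy_barrier_and_drain(ToyCpu *cpu)
{
    toy_drain_queue(cpu);
}

/* The retried access faults again for as long as the stale entry is cached. */
static bool toy_access_faults(const ToyCpu *cpu)
{
    return cpu->stale_tlb_entry;
}

/* Walks through the three cases discussed in the post. */
void toy_demo(void)
{
    ToyCpu local = { true, 0 };
    toy_flush_sync(&local);            /* like "tlbi vaae1" in this model */
    toy_barrier_only(&local);
    assert(!toy_access_faults(&local));

    ToyCpu broadcast = { true, 0 };
    toy_flush_async(&broadcast);       /* like "tlbi vaae1is" in this model */
    toy_barrier_only(&broadcast);
    assert(toy_access_faults(&broadcast));   /* same abort is hit again */

    toy_barrier_and_drain(&broadcast); /* what a waiting barrier would do */
    assert(!toy_access_faults(&broadcast));
}
```

Calling toy_demo() from a main function runs all three cases. In this model the repeated aborts stop only once something drains the queue, which would match the observed behaviour if the real work queue is only serviced at some later point; whether that is what actually happens in QEMU is exactly what I am unsure about.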
If anyone can point me in the right direction it would be greatly appreciated.
Thanks, Idan Horowitz.