
[Qemu-discuss] Fwd: Possible Qemu inconsistency when emulating TLB for r


From: Mattia Maldini
Subject: [Qemu-discuss] Fwd: Possible Qemu inconsistency when emulating TLB for rpi3 machine
Date: Mon, 4 Feb 2019 19:18:36 +0100

---------- Forwarded message ---------
From: Mattia Maldini <address@hidden>
Date: Mon, 4 Feb 2019 at 18:40
Subject: Re: [Qemu-discuss] Possible Qemu inconsistency when emulating TLB
for rpi3 machine
To: Peter Maydell <address@hidden>


> Again, if you're relying on entries being present or not in the
> TLB your code isn't correct. You need to make sure you get the
> right answer whether the CPU has cached a TLB entry or whether
> it does a complete page table walk to get the answer.
I don't rely on the entries being cached; I configured the system registers
to perform a table walk on a TLB miss, so if an entry is discarded the MMU
simply loads another one. The problem seems to be the opposite: some
inconsistency between the cached entry and the page table.

> We don't store ASIDs in our TLB, we only keep a note of the current
> ASID. If the current ASID changes, we flush the entire TLB. This
> produces architecturally correct behaviour (you never get a TLB hit
> for the wrong ASID because there's never an entry in the TLB for
> the wrong ASID), but it's not exactly what the h/w does.
> If your guest code is correctly holding up its end of the architectural
> contract it should not care.
Thank you, this is an answer I was looking for. I suspected something
like this.
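
To make the difference concrete, here is a toy Python model of the two
strategies described above. It is only a sketch, not QEMU code: every class
and function name is hypothetical, and global entries and permissions are
left out. One TLB keeps just the current ASID and flushes on every change;
the other tags each entry with the ASID it was loaded under.

```python
class FlushOnAsidChangeTlb:
    """Toy TLB with no per-entry ASID; it flushes when the ASID changes."""

    def __init__(self):
        self.current_asid = 0
        self.entries = {}  # virtual page number -> physical page number

    def set_asid(self, asid):
        if asid != self.current_asid:
            self.entries.clear()
            self.current_asid = asid

    def translate(self, vpn, page_table):
        if vpn in self.entries:
            return self.entries[vpn], "hit"
        ppn = page_table[vpn]  # stands in for a full page table walk
        self.entries[vpn] = ppn
        return ppn, "miss"


class AsidTaggedTlb:
    """Toy TLB that tags each entry with the ASID it was loaded under."""

    def __init__(self):
        self.current_asid = 0
        self.entries = {}  # (asid, virtual page number) -> physical page number

    def set_asid(self, asid):
        self.current_asid = asid

    def translate(self, vpn, page_table):
        key = (self.current_asid, vpn)
        if key in self.entries:
            return self.entries[key], "hit"
        ppn = page_table[vpn]  # stands in for a full page table walk
        self.entries[key] = ppn
        return ppn, "miss"


def run_sequence(tlb):
    """Kernel access, process access, then kernel access again."""
    kernel_table = {0x80: 0x80}   # identity mapping, as in the thread
    process_table = {0x80: 0x80}
    results = []
    tlb.set_asid(0)
    results.append(tlb.translate(0x80, kernel_table))
    tlb.set_asid(1)
    results.append(tlb.translate(0x80, process_table))
    tlb.set_asid(0)
    results.append(tlb.translate(0x80, kernel_table))
    return results
```

Both models return the same translations for this sequence; only the
hit/miss pattern differs (the tagged TLB hits on the third access, the
flushing one never does), and that pattern is exactly what guest code must
not depend on.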

I have now checked again and found a possible solution. My situation was
as follows:
 1. I prepare two page tables, one for EL1 and one for EL0
 2. I initially use the EL1 table
 3. I load the EL0 table and run a process at EL0
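
Loading the EL0 table in step 3 means writing TTBR0 with the table base
address and, when ASIDs are in use, the ASID in the two most significant
bytes. A minimal Python sketch of how that value is put together follows.
The helper names are hypothetical, and it assumes a 16-bit ASID, a table
aligned to 4 KiB and all other low bits left at zero; on real hardware the
ASID width and which TTBR supplies the ASID are selected in TCR_EL1.

```python
ASID_SHIFT = 48        # the ASID occupies the two most significant bytes
ASID_MASK = 0xFFFF     # assumption: 16-bit ASIDs are enabled
TABLE_ALIGN = 0x1000   # assumption: translation table aligned to 4 KiB


def make_ttbr0(table_base, asid):
    """Combine a table base address and an ASID into one 64-bit value."""
    if not 0 <= asid <= ASID_MASK:
        raise ValueError("ASID does not fit in 16 bits")
    if table_base % TABLE_ALIGN or not 0 <= table_base < (1 << ASID_SHIFT):
        raise ValueError("table base must be 4 KiB aligned and below 2**48")
    return (asid << ASID_SHIFT) | table_base


def asid_of(ttbr0):
    """Extract the ASID field from a value built by make_ttbr0."""
    return (ttbr0 >> ASID_SHIFT) & ASID_MASK
```

For example, make_ttbr0(0x80000, 1) gives 0x0001000000080000. The code that
actually writes the register (and the barrier after it) is not shown here.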

The problem was that after loading the new table with a different ASID into
TTBR0, all the entries in the TLB were invalidated (either discarded by Qemu
or ignored because of the different ASID). Before I could launch the process
at EL0, a new page was loaded from the EL0 table *but* while still at EL1
-> permission abort.
This happened because both my kernel and process code are in the same
memory range. I solved it (now working both on Qemu and Raspberry) by
changing the address of the loading function so that it is translated
through TTBR1 instead.
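
One reading consistent with this, sketched in Python below: in the AArch64
EL1&0 translation regime a page that is writable from EL0 is treated as
privileged-execute-never, so an EL1 instruction fetch through the EL0 table
faults. The descriptor values used in my tables are not shown in this
message, so the AP values here are assumptions, the function names are
hypothetical, and SCTLR_EL1.WXN is ignored.

```python
# AP[2:1] encodings for stage 1 of the EL1&0 translation regime
AP_EL1_RW_EL0_NONE = 0b00
AP_EL1_RW_EL0_RW = 0b01
AP_EL1_RO_EL0_NONE = 0b10
AP_EL1_RO_EL0_RO = 0b11


def el0_can_write(ap):
    """Only AP[2:1] == 0b01 grants write access from EL0."""
    return ap == AP_EL1_RW_EL0_RW


def el1_can_execute(ap, pxn):
    """EL1 instruction fetch check.

    A page writable from EL0 behaves as if PXN were set, whatever the
    descriptor's PXN bit says.
    """
    return not pxn and not el0_can_write(ap)
```

With these assumptions, el1_can_execute(AP_EL1_RW_EL0_NONE, False) is True
for a kernel-table page, while el1_can_execute(AP_EL1_RW_EL0_RW, False) is
False for the same address seen through the EL0 table.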

Now, however, I'm questioning why it *did* work on Raspberry Pi... the same
rules should apply, maybe I'm somehow disabling the TLB?
Oh well, questions for another day.

Many thanks for the assistance,
Mattia Maldini

On Mon, 4 Feb 2019 at 17:40, Peter Maydell <address@hidden> wrote:

> On Mon, 4 Feb 2019 at 16:10, Mattia Maldini <address@hidden>
> wrote:
> > Everything was running smoothly until I started using the MMU: I find
> > myself in a situation where the same code yields different results on
> > Qemu and on RPi3, and I believe only the real hardware is behaving
> > correctly. I set up a very simplified example to showcase my problem.
> >
> > Basically I have two page tables (identical mapping, just to test the
> > virtual memory): one for the "kernel" running at EL1 and one for a "user
> > process" running at EL0.
> > When the process is running TTBR0 must point to its page table because of
> > permissions: page entries in the EL0 table must include EL0 read-write
> > permissions, while page entries in the EL1 table must forbid them. Each
> > actor (kernel & process) MUST then run on its own set of page tables.
> >
> > All the kernel does is set up the MMU and launch the process at EL0;
> > while doing so, it also sets TTBR0 to the correct page table for the
> > process. Note that at this point page entries are configured as global
> > and I'm not using any ASID.
> >
> > This situation works on Qemu but not on real hardware, and I think the
> > emulator is wrong. Both kernel and process have their code in the same
> > block of memory, so they share MMU entries. When the kernel starts
> > executing its code, pages are loaded and saved in the TLB; when the
> > process starts it should ask for the same entries and receive the kernel
> > pages saved in the TLB, failing because they are configured to prevent
> > EL0 access. This is what happens on the real RPi3.
>
> If your guest code is relying on entries staying in the TLB then it
> is not correct -- there is no architectural guarantee that an entry
> is ever kept in the TLB. An implementation is free to throw out a
> TLB entry any time it likes (it's just a cache, effectively).
> It's hard to say what's actually going on here without a test case,
> but QEMU only promises to run architecturally correct code the
> way the architecture says it should run. It doesn't guarantee to
> run incorrect code in the same way the hardware happens to run it.
> QEMU aims to be an architecturally valid implementation, not
> an implementation that matches the real hardware CPU. (That is, we are
> free to behave differently for things which the architecture manual
> defines as IMPLEMENTATION DEFINED or UNPREDICTABLE.)
>
> > To solve my problem I started using an ASID for the process: I put some
> > value into the two most significant bytes of the TTBR0 register and set
> > all page entries as Non-Global. This way, when the process asks for a
> > table entry, the ones from the kernel found in the TLB cache are
> > discarded because of the difference in ASID.
> > This solution works as expected on the Raspberry Pi 3 but results in an
> > abort exception (permission fault) on Qemu.
>
> Again, if you're relying on entries being present or not in the
> TLB your code isn't correct. You need to make sure you get the
> right answer whether the CPU has cached a TLB entry or whether
> it does a complete page table walk to get the answer.
>
> > Furthermore, it seems that Qemu aborts every time it looks for a
> > different ASID in the TLB (this is a conjecture, I'm not a Qemu
> > developer myself). I have studied the ARMv8 MMU thoroughly and I think
> > my configuration is correct.
>
> We don't abort (unless the guest page table entry indicates that
> we should abort). We will take the slow path for a TLB miss
> (and then do a page table walk); ASID changes will mean we
> get TLB misses (see below).
>
> > Despite being confused about the situation, the fact holds that the real
> > hardware behaves as expected while Qemu does not. What I want to ask at
> > this point is:
> >     1. Does Qemu emulate the ASID-TLB interaction for AArch64? Is this
> > behaviour expected? Do you see an obvious mistake on my side?
> >     2. Can this be a bug in the rpi3 machine Qemu implementation? How
> > could I investigate further?
>
> It's certainly possible that there's a QEMU bug, but the TLB/MMU
> code has been around a while. A QEMU bug is more likely if you're
> doing something odd that Linux guests don't do with the MMU.
>
> We don't store ASIDs in our TLB, we only keep a note of the current
> ASID. If the current ASID changes, we flush the entire TLB. This
> produces architecturally correct behaviour (you never get a TLB hit
> for the wrong ASID because there's never an entry in the TLB for
> the wrong ASID), but it's not exactly what the h/w does.
> If your guest code is correctly holding up its end of the architectural
> contract it should not care.
>
> Investigating this sort of bug is a bit painful because you need
> to be familiar with all of (a) the guest code (b) the Arm architecture
> (c) QEMU's internals.
>
> thanks
> -- PMM
>

