From: | Richard Henderson |
Subject: | Re: An issue with x86 tcg and MMIO |
Date: | Wed, 1 Feb 2023 11:50:50 -1000 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 |
On 2/1/23 05:24, Jørgen Hansen wrote:
> Hello Richard,
> We are using x86 qemu to test some CXL stuff, and in that process we are running into an issue with tcg. In qemu, CXL memory is mapped as an MMIO region, so when using CXL memory as part of the system memory, application code and data can be spread out across a combination of DRAM and CXL memory (we are using the Linux tiered memory numa balancing, that will migrate individual pages between DRAM and CXL memory based on access patterns).
> When we are running memory intensive applications, we hit the following assert in translator_access:
>     /* We cannot handle MMIO as second page. */
>     assert(tb->page_addr[1] != -1);
> introduced in your commit 50627f1b. This is using existing applications and standard Linux.
> We discussed this with Alistair Francis and he mentioned that it looks like a load across a page boundary is happening, and it so happens that the first page is DRAM and second page MMIO.
> We tried - as a workaround - to return NULL instead of the assert to trigger the slow path processing, and that allows the system to make forward progress, but we aren't familiar with tcg, and as such aren't sure if that is a correct fix. So we'd like to get your input on this - and understand whether the above usage isn't supported at all or if there are other possible workarounds.
Well, this may answer my question in https://lore.kernel.org/qemu-devel/1d6b1894-9c45-2d70-abde-9c10c1b3b93f@linaro.org/ as to how this could occur.
Until relatively recently, TCG would refuse to execute out of MMIO *at all*. This was relaxed to support Arm m-profile, which needs to execute a few instructions out of MMIO during the boot process, before jumping into flash.
This works by reading one instruction, translating it, executing it, and immediately discarding the translation. It could be possible to adjust the translator to allow the second half of an instruction to be in MMIO, such that we execute and discard, however...
What is it about CXL that requires modeling with MMIO? If it is intended to be used interchangeably with RAM by the guest, then you really won't like the performance you will see with TCG executing out of these regions.
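To make the cost concrete, here is a toy C model of the scheme described above (translate one instruction, execute it, discard the translation) next to ordinary cached translation for RAM. It is only a sketch: none of the `Toy*` / `toy_*` names are QEMU APIs, the fixed 4-byte instruction length and the 16-slot direct-mapped cache are assumptions chosen for illustration, and it counts translations rather than executing anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names are QEMU APIs. */
enum { TOY_INSN_LEN = 4, TOY_CACHE_SLOTS = 16 };

typedef struct {
    uint64_t pc;      /* guest address where the block starts */
    int num_insns;    /* guest instructions covered by the block */
    bool valid;
} ToyBlock;

typedef struct {
    uint64_t mmio_start, mmio_end;     /* [start, end) is treated as MMIO */
    ToyBlock cache[TOY_CACHE_SLOTS];   /* direct-mapped translation cache */
    long translations;                 /* total translations performed */
} ToyCpu;

static bool toy_is_mmio(const ToyCpu *cpu, uint64_t addr)
{
    return addr >= cpu->mmio_start && addr < cpu->mmio_end;
}

/* Get a block starting at pc. RAM blocks are cached; MMIO blocks hold a
 * single instruction and live only in *scratch, so they are never reused. */
static ToyBlock *toy_get_block(ToyCpu *cpu, uint64_t pc, int want,
                               ToyBlock *scratch)
{
    if (toy_is_mmio(cpu, pc)) {
        *scratch = (ToyBlock){ .pc = pc, .num_insns = 1, .valid = true };
        cpu->translations++;
        return scratch;
    }
    ToyBlock *slot = &cpu->cache[(pc / TOY_INSN_LEN) % TOY_CACHE_SLOTS];
    if (slot->valid && slot->pc == pc) {
        return slot;                    /* cache hit: no new translation */
    }
    int n = want;
    if (pc < cpu->mmio_start &&
        pc + (uint64_t)n * TOY_INSN_LEN > cpu->mmio_start) {
        /* A cached block must not run on into the MMIO range. */
        n = (int)((cpu->mmio_start - pc) / TOY_INSN_LEN);
    }
    *slot = (ToyBlock){ .pc = pc, .num_insns = n, .valid = true };
    cpu->translations++;
    return slot;
}

/* Run a loop of loop_insns instructions starting at pc, `iterations` times.
 * Returns how many translations were needed. */
static long toy_run(ToyCpu *cpu, uint64_t pc, int loop_insns, int iterations)
{
    /* Assumption of this toy: everything is instruction-aligned. */
    assert(cpu->mmio_start % TOY_INSN_LEN == 0 && pc % TOY_INSN_LEN == 0);
    long before = cpu->translations;
    for (int i = 0; i < iterations; i++) {
        uint64_t cur = pc;
        int remaining = loop_insns;
        while (remaining > 0) {
            ToyBlock scratch;
            ToyBlock *tb = toy_get_block(cpu, cur, remaining, &scratch);
            int n = tb->num_insns < remaining ? tb->num_insns : remaining;
            cur += (uint64_t)n * TOY_INSN_LEN;
            remaining -= n;
        }
    }
    return cpu->translations - before;
}
```

Under these assumptions, an 8-instruction loop run 100 times costs a single translation when it sits in RAM, and 800 translations when it sits in the MMIO range, because nothing translated from MMIO is ever reused.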
Could memory across the CXL link be modeled as a ROM device, similar to flash? This does not have the same restrictions as MMIO.
r~