[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC] CXL: TCG/KVM instruction alignment issue discussion default
From: |
Jonathan Cameron |
Subject: |
Re: [RFC] CXL: TCG/KVM instruction alignment issue discussion default |
Date: |
Tue, 28 Feb 2023 10:49:16 +0000 |
On Mon, 27 Feb 2023 11:06:47 +0000
Jørgen Hansen <Jorgen.Hansen@wdc.com> wrote:
> On 2/18/23 11:22, Gregory Price wrote:
> > Breaking this off into a separate thread for archival sake.
> >
> > There's a bug with handling execution of instructions held in CXL
> > memory - specifically when an instruction crosses a page boundary.
> >
> > The result of this is that type-3 devices cannot use KVM at all at the
> > moment, and require the attached patch to run in TCG-only mode.
> >
> >
> > CXL memory devices are presently emulated as MMIO, and MMIO has no
> > coherency guarantees, so TCG doesn't cache the results of translating
> > an instruction, meaning execution is incredibly slow (orders of
> > magnitude slower than KVM).
> >
> >
> > Request for comments:
> >
> >
> > First there's the stability issue:
> >
> > 0) TCG cannot handle instructions across a page boundary spanning ram and
> > MMIO. See attached patch for hotfix. This basically solves the page
> > boundary issue by reverting the entire block to MMIO-mode if the
> > problem is detected.
> >
> > 1) KVM needs to be investigated. It's likely the same/similar issue,
> > but it's not confirmed.
>
> I ran into an issue with KVM as well. However, it wasn't a page boundary
> spanning issue, since I could hit it when using pure CXL backed memory
> for a given application. It turned out that (at least) certain AVX
> instructions didn't handle execution from MMIO when using qemu. This
> generated an illegal instruction exception for the application. At that
> point, I switched to tcg, so I didn't investigate if running a non-AVX
> system would work with KVM.
Short term I'm wondering if we should attempt to error out on KVM
unless some override parameter is used alongside the main cxl=on
>
> > Second there's the performance issue:
> >
> > 0) Do we actually care about performance? How likely are users to
> > attempt to run software out of CXL memory?
> >
> > 1) If we do care, is there a potential for converting CXL away from the
> > MMIO design? The issue is coherency for shared memory. Emulating
> > coherency is a) hard, and b) a ton of work for little gain.
> >
> > Presently marking CXL memory as MMIO basically enforces coherency by
> > preventing caching, though it's unclear how this is enforced
> > by KVM (or if it is, i have to imagine it is).
>
> Having the option of doing device specific processing of accesses to a
> CXL type 3 device (that the MMIO based access allows) is useful for
> experimentation with device functionality, so I would be sad to see that
> option go away. Emulating cache line access to a type 3 device would be
> interesting, and could potentially be implemented in a way that would
> allow caching of device memory in a shadow page in RAM, but that it a
> rather large project.
Absolutely agree. Can sketch a solution that is entirely in QEMU and
works with KVM on a white board, but it doesn't feel like a small job
to actually implement it and I'm sure there are nasty corners
(persistency is going to be tricky)
If anyone sees this as a 'fun challenge' and wants to take it on though
that would be great!
Jonathan
>
> > It might be nice to solve this for non-shared memory regions, but
> > testing functionality >>> performance at this point so it might not
> > worth the investment.
>
> Thanks,
> Jorgen