qemu-devel
From: Zhao Liu
Subject: Re: [PATCH RESEND 04/18] i386/cpu: Fix number of addressable IDs in CPUID.04H
Date: Wed, 22 Feb 2023 14:37:32 +0800

Hi Xiaoyao,

Thanks, I've spent some time thinking about it here.

On Mon, Feb 20, 2023 at 02:59:20PM +0800, Xiaoyao Li wrote:
> Date: Mon, 20 Feb 2023 14:59:20 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH RESEND 04/18] i386/cpu: Fix number of addressable IDs
>  in CPUID.04H
> 
> On 2/13/2023 5:36 PM, Zhao Liu wrote:
> > For i-cache and d-cache, the maximum IDs for CPUs sharing cache (
> > CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14]) are
> > both 0, and this means i-cache and d-cache are shared in the SMT level.
> > This is correct if there's single thread per core, but is wrong for the
> > hyper threading case (one core contains multiple threads) since the
> > i-cache and d-cache are shared in the core level other than SMT level.
> > 
> > Therefore, in order to be compatible with both multi-threaded and
> > single-threaded situations, we should set i-cache and d-cache be shared
> > at the core level by default.
> 
> It's true for VM only when the exactly HW topology is configured to VM.
> i.e., two virtual LPs of one virtual CORE are pinned to two physical LPs
> that of one physical CORE.

Yes, in this case the host and guest have the same topology, so their
topologies match.

> Otherwise it's incorrect for VM.

My understanding here is that what we do in QEMU is create a
self-consistent CPU topology and cache topology for the guest.

If the VM topology is self-consistent and emulated to be almost
identical to a real machine, then the emulation in QEMU is correct,
right? ;-)

> 
> for example. given a VM of 4 threads and 2 cores. If not pinning the 4
> threads to physical 4 LPs of 2 CORES. It's likely each vcpu running on a LP
> of different physical cores.

Thanks for bringing this up, this is worth discussing.

I looked into it and found that the specific scheduling policy for the
vCPUs actually depends on the host configuration. For example, (IIUC) if
the host enables core scheduling, then it will schedule the vCPUs on SMT
threads of the same core.
Also, to explore the original purpose of the "per thread" i/d cache
topology, I have retraced its history.

The related commit is from 2009: 400281a ("set CPUID bits to present
cores and threads topology"). That commit added the multithreading cache
topology to CPUID.04H. In particular, it set the L2 cache level to
per-core, but it did not change the level of L1 (i/d cache), so L1 still
remained per-thread.

I think this is the problem: L1 should also be per-core in the
multithreading case. (So the fix in this patch is worthwhile?)

Another thing we can refer to is AMD's i/d cache topology, which is
per-core rather than per-thread (in a different CPUID leaf than
Intel's): in encode_cache_cpuid8000001d() (target/i386/cpu.c), the i/d
caches and L2 are all encoded at the core level in EAX. Presumably they
chose per-core there to emulate the L1 topology of real machines as
well.

So, I guess your example is "unintentionally" benefiting from the
"per thread" level of the i/d cache.

What do you think?

> So no vcpu shares L1i/L1d cache at core level.

Yes. Host scheduling is not guaranteed, and workload-balancing policies
in various scenarios, as well as some security mitigations, may break
the delicate mapping we have carefully set up.

Perhaps another way is to also add a new option, "x-l1-cache-topo" (like
[1] did for L2), that can adjust the i/d cache level from core to smt to
benefit cases like this.

[1]: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg03201.html

Thanks,
Zhao



