From: Xiaoyao Li
Subject: Re: [PATCH RESEND 04/18] i386/cpu: Fix number of addressable IDs in CPUID.04H
Date: Thu, 23 Feb 2023 11:52:57 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0 Thunderbird/102.8.0

On 2/22/2023 2:37 PM, Zhao Liu wrote:
Hi Xiaoyao,

Thanks, I've spent some time thinking about it here.

On Mon, Feb 20, 2023 at 02:59:20PM +0800, Xiaoyao Li wrote:
Date: Mon, 20 Feb 2023 14:59:20 +0800
From: Xiaoyao Li <xiaoyao.li@intel.com>
Subject: Re: [PATCH RESEND 04/18] i386/cpu: Fix number of addressable IDs
  in CPUID.04H

On 2/13/2023 5:36 PM, Zhao Liu wrote:
For the i-cache and d-cache, the maximum IDs for CPUs sharing the cache
(CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14]) are
both 0, which means the i-cache and d-cache are shared at the SMT level.
This is correct if there is a single thread per core, but is wrong in the
hyper-threading case (one core contains multiple threads), since the
i-cache and d-cache are shared at the core level rather than the SMT level.

Therefore, in order to be compatible with both multi-threaded and
single-threaded situations, we should set the i-cache and d-cache to be
shared at the core level by default.
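
To make the field concrete, here is a minimal standalone sketch (not
QEMU's actual code; the helper names are made up) of one way the
CPUID.04H:EAX[25:14] value, the "maximum number of addressable IDs for
logical processors sharing this cache" (the sharing count, rounded up
to a power of two, minus one), changes when L1 is reported per thread
versus per core:

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Round up to a power of two, as the SDM describes for this field. */
  static uint32_t pow2ceil_u32(uint32_t n)
  {
      uint32_t r = 1;
      while (r < n) {
          r <<= 1;
      }
      return r;
  }

  /* EAX[25:14] = (number of LPs sharing the cache, rounded up) - 1 */
  static uint32_t max_sharing_ids(uint32_t lps_sharing_cache)
  {
      return pow2ceil_u32(lps_sharing_cache) - 1;
  }

  int main(void)
  {
      uint32_t threads_per_core = 2;   /* SMT enabled */

      /* Per-thread L1: only one LP shares it, so the field is 0. */
      printf("L1 per thread: EAX[25:14] = %" PRIu32 "\n",
             max_sharing_ids(1));

      /* Per-core L1: all SMT siblings share it, so the field is 1. */
      printf("L1 per core:   EAX[25:14] = %" PRIu32 "\n",
             max_sharing_ids(threads_per_core));
      return 0;
  }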

It's true for a VM only when the exact HW topology is configured for the
VM, i.e., two virtual LPs of one virtual CORE are pinned to two physical
LPs of one physical CORE.

Yes, in this case, host and guest have the same topology, so their
topologies can match.

Otherwise it's incorrect for the VM.

My understanding here is that what we do in QEMU is to create a
self-consistent CPU topology and cache topology for the guest.

If the VM topology is self-consistent and emulated to be almost
identical to the real machine, then the emulation in QEMU is correct,
right? ;-)

A real machine reports via CPUID that two threads in the same CORE share the L1 cache because that is a fact: it is exactly how the hardware resources are laid out. However, for a VM, when you report the same thing (two threads share the L1 cache), is it true for the vCPUs?

The target is to emulate things correctly, not to emulate them identically to a real machine. In fact, for these shared resources, it is mostly infeasible to emulate them correctly without pinning vCPUs to physical LPs.


For example, given a VM with 4 threads and 2 cores: if the 4 threads are
not pinned to 4 physical LPs of 2 CORES, it's likely that each vCPU runs
on an LP of a different physical core.

Thanks for bringing this up, this is worth discussing.

I looked into it and found that the specific scheduling policy for the
vCPUs actually depends on the host settings. For example, (IIUC) if the
host enables core scheduling, then the host will schedule the vCPUs on
SMT threads of the same core.

Also, to explore the original purpose of the "per thread" i/d cache
topology, I have retraced its history.

The related commit is from '09: 400281a (set CPUID bits to present cores
and threads topology). That commit added the multithreading cache
topology to CPUID.04H. In particular, it set the L2 cache level to per
core, but it did not change the level of L1 (i/d cache), so L1 still
remained per thread.

I think this is where the problem lies: L1 should also be per core in the
multithreading case. (So the fix in this patch is worthwhile?)

Another thing we can refer to is that AMD's i/d cache topology is per
core rather than per thread (a different CPUID leaf than Intel's): in
encode_cache_cpuid8000001d() (target/i386/cpu.c), the i/d caches and L2
are encoded as core level in EAX. Presumably they set it to per core to
match the L1 topology of the real machine as well.
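
As a rough illustration only (simplified, not the exact code in
encode_cache_cpuid8000001d()), the idea of that AMD leaf is that
EAX[25:14] carries the number of logical processors sharing the cache
minus one, which for a core-scoped cache (L1i/L1d/L2) is the threads
per core:

  #include <stdint.h>

  /* Simplified sketch: for a cache shared at the core level, the
   * number of logical processors sharing it is the threads per core,
   * and CPUID.8000001DH:EAX[25:14] holds that count minus one. */
  static uint32_t encode_core_level_sharing_eax(uint32_t threads_per_core)
  {
      return (threads_per_core - 1) << 14;   /* placed in EAX[25:14] */
  }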

So, I guess this example is "unintentionally" benefiting from the
"per thread" level of i/d cache.

What do you think?

So no vCPUs share the L1i/L1d cache at the core level.

Yes. Host scheduling is not guaranteed, and workload balancing policies
in various scenarios, as well as some security mitigations, may break the
delicate arrangement we have carefully set up.

Perhaps another way is to also add a new property, "x-l1-cache-topo"
(like [1] did for L2), that can adjust the i/d cache level from core to
SMT to benefit cases like this (a rough sketch follows the link below).

[1]: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg03201.html
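
Just to make the idea concrete (purely hypothetical: "x-l1-cache-topo"
does not exist yet, and the names below are made up), such a property
would only select the topology level at which the L1 caches are
reported, mirroring what [1] does for L2:

  /* Hypothetical sketch, not an existing QEMU interface. */
  typedef enum {
      L1_CACHE_TOPO_SMT,   /* report L1 as per thread */
      L1_CACHE_TOPO_CORE,  /* report L1 as per core */
  } L1CacheTopoLevel;

  /* Number of logical processors reported as sharing L1. */
  static unsigned l1_sharing_lps(L1CacheTopoLevel level,
                                 unsigned threads_per_core)
  {
      return level == L1_CACHE_TOPO_CORE ? threads_per_core : 1;
  }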

Thanks,
Zhao




