[Bug 1856335] Re: Cache Layout wrong on many Zen Arch CPUs
From: Heiko Sieger
Subject: [Bug 1856335] Re: Cache Layout wrong on many Zen Arch CPUs
Date: Sun, 10 May 2020 20:01:51 -0000
I upgraded to QEMU emulator version 5.0.50.
Using q35-5.1 (the latest) and the following libvirt configuration:
<memory unit="KiB">50331648</memory>
<currentMemory unit="KiB">50331648</currentMemory>
<memoryBacking>
<hugepages/>
</memoryBacking>
<vcpu placement="static">24</vcpu>
<cputune>
<vcpupin vcpu="0" cpuset="0"/>
<vcpupin vcpu="1" cpuset="12"/>
<vcpupin vcpu="2" cpuset="1"/>
<vcpupin vcpu="3" cpuset="13"/>
<vcpupin vcpu="4" cpuset="2"/>
<vcpupin vcpu="5" cpuset="14"/>
<vcpupin vcpu="6" cpuset="3"/>
<vcpupin vcpu="7" cpuset="15"/>
<vcpupin vcpu="8" cpuset="4"/>
<vcpupin vcpu="9" cpuset="16"/>
<vcpupin vcpu="10" cpuset="5"/>
<vcpupin vcpu="11" cpuset="17"/>
<vcpupin vcpu="12" cpuset="6"/>
<vcpupin vcpu="13" cpuset="18"/>
<vcpupin vcpu="14" cpuset="7"/>
<vcpupin vcpu="15" cpuset="19"/>
<vcpupin vcpu="16" cpuset="8"/>
<vcpupin vcpu="17" cpuset="20"/>
<vcpupin vcpu="18" cpuset="9"/>
<vcpupin vcpu="19" cpuset="21"/>
<vcpupin vcpu="20" cpuset="10"/>
<vcpupin vcpu="21" cpuset="22"/>
<vcpupin vcpu="22" cpuset="11"/>
<vcpupin vcpu="23" cpuset="23"/>
</cputune>
<os>
<type arch="x86_64" machine="pc-q35-5.1">hvm</type>
<loader readonly="yes"
type="pflash">/usr/share/OVMF/x64/OVMF_CODE.fd</loader>
<nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
<boot dev="hd"/>
<bootmenu enable="no"/>
</os>
<features>
<acpi/>
<apic/>
<hyperv>
<relaxed state="on"/>
<vapic state="on"/>
<spinlocks state="on" retries="8191"/>
<vpindex state="on"/>
<synic state="on"/>
<stimer state="on"/>
<vendor_id state="on" value="AuthenticAMD"/>
<frequencies state="on"/>
</hyperv>
<kvm>
<hidden state="on"/>
</kvm>
<vmport state="off"/>
<ioapic driver="kvm"/>
</features>
<cpu mode="host-passthrough" check="none">
<topology sockets="1" cores="12" threads="2"/>
<cache mode="passthrough"/>
<feature policy="require" name="invtsc"/>
<feature policy="require" name="hypervisor"/>
<feature policy="require" name="topoext"/>
<numa>
<cell id="0" cpus="0-2,12-14" memory="12582912" unit="KiB"/>
<cell id="1" cpus="3-5,15-17" memory="12582912" unit="KiB"/>
<cell id="2" cpus="6-8,18-20" memory="12582912" unit="KiB"/>
<cell id="3" cpus="9-11,21-23" memory="12582912" unit="KiB"/>
</numa>
</cpu>
...
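The `<vcpupin>` block above interleaves each guest core with its SMT sibling. As a minimal sketch (assuming, as on this host, that the SMT sibling of host CPU N is N+12; the function name is mine), the same map can be generated as:

```python
# Reproduce the vcpupin mapping above: guest vCPUs 2k and 2k+1 are pinned to
# host core k and its SMT sibling k+12 (the sibling offset on this 12-core host).
def pin_map(cores=12, sibling_offset=12):
    pins = {}
    for k in range(cores):
        pins[2 * k] = k                       # first thread of guest core k
        pins[2 * k + 1] = k + sibling_offset  # second thread -> SMT sibling
    return pins

pins = pin_map()
print(pins[2], pins[3])  # 1 13, matching vcpupin vcpu="2" and vcpu="3" above
```

This keeps both threads of each guest core on one physical host core, which is what lets the guest's `threads="2"` topology line up with the host's SMT pairs.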
/var/log/libvirt/qemu/win10.log:
-machine
pc-q35-5.1,accel=kvm,usb=off,vmport=off,dump-guest-core=off,kernel_irqchip=on,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format
\
-cpu
host,invtsc=on,hypervisor=on,topoext=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vpindex,hv-synic,hv-stimer,hv-vendor-id=AuthenticAMD,hv-frequencies,hv-crash,kvm=off,host-cache-info=on,l3-cache=off
\
-m 49152 \
-overcommit mem-lock=off \
-smp 24,sockets=1,cores=12,threads=2 \
-mem-prealloc \
-mem-path /dev/hugepages/libvirt/qemu/3-win10 \
-numa node,nodeid=0,cpus=0-2,cpus=12-14,mem=12288 \
-numa node,nodeid=1,cpus=3-5,cpus=15-17,mem=12288 \
-numa node,nodeid=2,cpus=6-8,cpus=18-20,mem=12288 \
-numa node,nodeid=3,cpus=9-11,cpus=21-23,mem=12288 \
...
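The four `-numa node` arguments mirror the `<numa>` cells from the XML. A quick sanity check (plain Python, helper names are mine) that the cells partition all 24 vCPUs into four 3-core (6-thread) groups and that their memory adds up to the 48 GiB domain total:

```python
# Check the NUMA layout above: four cells of 3 guest cores (6 logical CPUs)
# and 12582912 KiB each must partition 24 vCPUs and sum to 50331648 KiB.
def expand(spec):  # "0-2,12-14" -> [0, 1, 2, 12, 13, 14]
    cpus = []
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus

cells = ["0-2,12-14", "3-5,15-17", "6-8,18-20", "9-11,21-23"]
groups = [expand(c) for c in cells]
assert sorted(sum(groups, [])) == list(range(24))
assert 4 * 12582912 == 50331648  # KiB; 48 GiB total
print([len(g) for g in groups])  # [6, 6, 6, 6]
```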
For some reason I always get l3-cache=off.
CoreInfo.exe in Windows 10 then produces the following report
(shortened):
Logical to Physical Processor Map:
**---------------------- Physical Processor 0 (Hyperthreaded)
--*--------------------- Physical Processor 1
---*-------------------- Physical Processor 2
----**------------------ Physical Processor 3 (Hyperthreaded)
------**---------------- Physical Processor 4 (Hyperthreaded)
--------*--------------- Physical Processor 5
---------*-------------- Physical Processor 6
----------**------------ Physical Processor 7 (Hyperthreaded)
------------**---------- Physical Processor 8 (Hyperthreaded)
--------------*--------- Physical Processor 9
---------------*-------- Physical Processor 10
----------------**------ Physical Processor 11 (Hyperthreaded)
------------------**---- Physical Processor 12 (Hyperthreaded)
--------------------*--- Physical Processor 13
---------------------*-- Physical Processor 14
----------------------** Physical Processor 15 (Hyperthreaded)
Logical Processor to Socket Map:
************************ Socket 0
Logical Processor to NUMA Node Map:
***---------***--------- NUMA Node 0
---***---------***------ NUMA Node 1
------***---------***--- NUMA Node 2
---------***---------*** NUMA Node 3
Approximate Cross-NUMA Node Access Cost (relative to fastest):
00 01 02 03
00: 1.4 1.2 1.1 1.2
01: 1.1 1.1 1.3 1.1
02: 1.0 1.1 1.0 1.2
03: 1.1 1.2 1.2 1.2
Logical Processor to Cache Map:
**---------------------- Data Cache 0, Level 1, 32 KB, Assoc 8,
LineSize 64
**---------------------- Instruction Cache 0, Level 1, 32 KB, Assoc 8,
LineSize 64
**---------------------- Unified Cache 0, Level 2, 512 KB, Assoc 8,
LineSize 64
***--------------------- Unified Cache 1, Level 3, 16 MB, Assoc 16,
LineSize 64
--*--------------------- Data Cache 1, Level 1, 32 KB, Assoc 8,
LineSize 64
--*--------------------- Instruction Cache 1, Level 1, 32 KB, Assoc 8,
LineSize 64
--*--------------------- Unified Cache 2, Level 2, 512 KB, Assoc 8,
LineSize 64
---*-------------------- Data Cache 2, Level 1, 32 KB, Assoc 8,
LineSize 64
---*-------------------- Instruction Cache 2, Level 1, 32 KB, Assoc 8,
LineSize 64
---*-------------------- Unified Cache 3, Level 2, 512 KB, Assoc 8,
LineSize 64
---***------------------ Unified Cache 4, Level 3, 16 MB, Assoc 16,
LineSize 64
----**------------------ Data Cache 3, Level 1, 32 KB, Assoc 8,
LineSize 64
----**------------------ Instruction Cache 3, Level 1, 32 KB, Assoc 8,
LineSize 64
----**------------------ Unified Cache 5, Level 2, 512 KB, Assoc 8,
LineSize 64
------**---------------- Data Cache 4, Level 1, 32 KB, Assoc 8,
LineSize 64
------**---------------- Instruction Cache 4, Level 1, 32 KB, Assoc 8,
LineSize 64
------**---------------- Unified Cache 6, Level 2, 512 KB, Assoc 8,
LineSize 64
------**---------------- Unified Cache 7, Level 3, 16 MB, Assoc 16,
LineSize 64
--------*--------------- Data Cache 5, Level 1, 32 KB, Assoc 8,
LineSize 64
--------*--------------- Instruction Cache 5, Level 1, 32 KB, Assoc 8,
LineSize 64
--------*--------------- Unified Cache 8, Level 2, 512 KB, Assoc 8,
LineSize 64
--------*--------------- Unified Cache 9, Level 3, 16 MB, Assoc 16,
LineSize 64
---------*-------------- Data Cache 6, Level 1, 32 KB, Assoc 8,
LineSize 64
---------*-------------- Instruction Cache 6, Level 1, 32 KB, Assoc 8,
LineSize 64
---------*-------------- Unified Cache 10, Level 2, 512 KB, Assoc 8,
LineSize 64
---------***------------ Unified Cache 11, Level 3, 16 MB, Assoc 16,
LineSize 64
----------**------------ Data Cache 7, Level 1, 32 KB, Assoc 8,
LineSize 64
----------**------------ Instruction Cache 7, Level 1, 32 KB, Assoc 8,
LineSize 64
----------**------------ Unified Cache 12, Level 2, 512 KB, Assoc 8,
LineSize 64
------------**---------- Data Cache 8, Level 1, 32 KB, Assoc 8,
LineSize 64
------------**---------- Instruction Cache 8, Level 1, 32 KB, Assoc 8,
LineSize 64
------------**---------- Unified Cache 13, Level 2, 512 KB, Assoc 8,
LineSize 64
------------***--------- Unified Cache 14, Level 3, 16 MB, Assoc 16,
LineSize 64
--------------*--------- Data Cache 9, Level 1, 32 KB, Assoc 8,
LineSize 64
--------------*--------- Instruction Cache 9, Level 1, 32 KB, Assoc 8,
LineSize 64
--------------*--------- Unified Cache 15, Level 2, 512 KB, Assoc 8,
LineSize 64
---------------*-------- Data Cache 10, Level 1, 32 KB, Assoc 8,
LineSize 64
---------------*-------- Instruction Cache 10, Level 1, 32 KB, Assoc 8,
LineSize 64
---------------*-------- Unified Cache 16, Level 2, 512 KB, Assoc 8,
LineSize 64
---------------*-------- Unified Cache 17, Level 3, 16 MB, Assoc 16,
LineSize 64
----------------**------ Data Cache 11, Level 1, 32 KB, Assoc 8,
LineSize 64
----------------**------ Instruction Cache 11, Level 1, 32 KB, Assoc 8,
LineSize 64
----------------**------ Unified Cache 18, Level 2, 512 KB, Assoc 8,
LineSize 64
----------------**------ Unified Cache 19, Level 3, 16 MB, Assoc 16,
LineSize 64
------------------**---- Data Cache 12, Level 1, 32 KB, Assoc 8,
LineSize 64
------------------**---- Instruction Cache 12, Level 1, 32 KB, Assoc 8,
LineSize 64
------------------**---- Unified Cache 20, Level 2, 512 KB, Assoc 8,
LineSize 64
------------------***--- Unified Cache 21, Level 3, 16 MB, Assoc 16,
LineSize 64
--------------------*--- Data Cache 13, Level 1, 32 KB, Assoc 8,
LineSize 64
--------------------*--- Instruction Cache 13, Level 1, 32 KB, Assoc 8,
LineSize 64
--------------------*--- Unified Cache 22, Level 2, 512 KB, Assoc 8,
LineSize 64
---------------------*-- Data Cache 14, Level 1, 32 KB, Assoc 8,
LineSize 64
---------------------*-- Instruction Cache 14, Level 1, 32 KB, Assoc 8,
LineSize 64
---------------------*-- Unified Cache 23, Level 2, 512 KB, Assoc 8,
LineSize 64
---------------------*** Unified Cache 24, Level 3, 16 MB, Assoc 16,
LineSize 64
----------------------** Data Cache 15, Level 1, 32 KB, Assoc 8,
LineSize 64
----------------------** Instruction Cache 15, Level 1, 32 KB, Assoc 8,
LineSize 64
----------------------** Unified Cache 25, Level 2, 512 KB, Assoc 8,
LineSize 64
Logical Processor to Group Map:
************************ Group 0
The above result is even further away from the actual L3 cache configuration.
So numatune doesn't produce the expected outcome.
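For comparison, on a Linux host (or inside a Linux guest) the actual L3 grouping can be read straight from sysfs via the kernel's cacheinfo interface; a sketch (`l3_groups` is my name, the sysfs path is the standard one):

```python
from pathlib import Path

# Each logical CPU's cache/index3/shared_cpu_list names the CPUs that share
# its L3 slice; collecting the distinct lists gives the real CCX grouping.
def l3_groups(sysfs="/sys/devices/system/cpu"):
    lists = set()
    for f in Path(sysfs).glob("cpu[0-9]*/cache/index3/shared_cpu_list"):
        lists.add(f.read_text().strip())
    return sorted(lists)

# With the NUMA cells above one would hope for groups like "0-2,12-14",
# rather than the mixed 1/2/3-core L3s that coreinfo reports in the guest.
print(l3_groups())
```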
https://bugs.launchpad.net/bugs/1856335
Title:
Cache Layout wrong on many Zen Arch CPUs
Status in QEMU:
New
Bug description:
AMD CPUs have L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems
to always map the cache as if it were a 4-core-per-CCX CPU, which is
incorrect and costs upwards of 30% performance (more realistically 10%)
in L3-cache-layout-aware applications.
Example on a 4-CCX CPU (1950X w/ 8 cores and no SMT):
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>EPYC-IBPB</model>
<vendor>AMD</vendor>
<topology sockets='1' cores='8' threads='1'/>
In Windows, coreinfo reports correctly:
****---- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
----**** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64
On a 3-CCX CPU (3960X w/ 6 cores and no SMT):
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>EPYC-IBPB</model>
<vendor>AMD</vendor>
<topology sockets='1' cores='6' threads='1'/>
In Windows, coreinfo reports incorrectly:
****-- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
----** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64
Validated against 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm.
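The wrong 4+2 split coreinfo shows is exactly what a hard-coded 4-cores-per-L3 assumption yields. A small illustration (the function name is mine):

```python
# Group cores into L3 domains of a fixed size, the way the report says
# TOPOEXT currently does, versus the native per-CCX size of the part.
def fixed_l3_split(cores, cores_per_l3):
    return [list(range(i, min(i + cores_per_l3, cores)))
            for i in range(0, cores, cores_per_l3)]

print(fixed_l3_split(6, 4))  # [[0, 1, 2, 3], [4, 5]] -> the 4+2 split coreinfo shows
print(fixed_l3_split(6, 3))  # [[0, 1, 2], [3, 4, 5]] -> the native 3-core CCX layout
```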
With newer QEMU there is a fix (that does behave correctly) using the dies
parameter:
<qemu:arg value='cores=3,threads=1,dies=2,sockets=1'/>
The problem is that the dies are exposed differently than how AMD does it
natively: they are exposed to Windows as sockets, which means that if you
are not a business user, you can't ever have a machine with more than two
CCXs (6 cores), as consumer versions of Windows only support two sockets.
(Should this be reported as a separate bug?)