qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Bug 1856335] Re: Cache Layout wrong on many Zen Arch CPUs


From: Heiko Sieger
Subject: [Bug 1856335] Re: Cache Layout wrong on many Zen Arch CPUs
Date: Wed, 20 May 2020 23:28:43 -0000

This is the CPU cache layout as shown by lscpu -a -e

CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ
  0    0      0    0 0:0:0:0          yes 3800.0000 2200.0000
  1    0      0    1 1:1:1:0          yes 3800.0000 2200.0000
  2    0      0    2 2:2:2:0          yes 3800.0000 2200.0000
  3    0      0    3 3:3:3:1          yes 3800.0000 2200.0000
  4    0      0    4 4:4:4:1          yes 3800.0000 2200.0000
  5    0      0    5 5:5:5:1          yes 3800.0000 2200.0000
  6    0      0    6 6:6:6:2          yes 3800.0000 2200.0000
  7    0      0    7 7:7:7:2          yes 3800.0000 2200.0000
  8    0      0    8 8:8:8:2          yes 3800.0000 2200.0000
  9    0      0    9 9:9:9:3          yes 3800.0000 2200.0000
 10    0      0   10 10:10:10:3       yes 3800.0000 2200.0000
 11    0      0   11 11:11:11:3       yes 3800.0000 2200.0000
 12    0      0    0 0:0:0:0          yes 3800.0000 2200.0000
 13    0      0    1 1:1:1:0          yes 3800.0000 2200.0000
 14    0      0    2 2:2:2:0          yes 3800.0000 2200.0000
 15    0      0    3 3:3:3:1          yes 3800.0000 2200.0000
 16    0      0    4 4:4:4:1          yes 3800.0000 2200.0000
 17    0      0    5 5:5:5:1          yes 3800.0000 2200.0000
 18    0      0    6 6:6:6:2          yes 3800.0000 2200.0000
 19    0      0    7 7:7:7:2          yes 3800.0000 2200.0000
 20    0      0    8 8:8:8:2          yes 3800.0000 2200.0000
 21    0      0    9 9:9:9:3          yes 3800.0000 2200.0000
 22    0      0   10 10:10:10:3       yes 3800.0000 2200.0000
 23    0      0   11 11:11:11:3       yes 3800.0000 2200.0000

I was trying to allocate cache using the cachetune feature in libvirt,
but it turns out to be either misleading or much too complicated to be
usable. Here is what I tried:

  <vcpu placement="static">24</vcpu>
  <cputune>
    <vcpupin vcpu="0" cpuset="0"/>
    <vcpupin vcpu="1" cpuset="12"/>
    <vcpupin vcpu="2" cpuset="1"/>
    <vcpupin vcpu="3" cpuset="13"/>
    <vcpupin vcpu="4" cpuset="2"/>
    <vcpupin vcpu="5" cpuset="14"/>
    <vcpupin vcpu="6" cpuset="3"/>
    <vcpupin vcpu="7" cpuset="15"/>
    <vcpupin vcpu="8" cpuset="4"/>
    <vcpupin vcpu="9" cpuset="16"/>
    <vcpupin vcpu="10" cpuset="5"/>
    <vcpupin vcpu="11" cpuset="17"/>
    <vcpupin vcpu="12" cpuset="6"/>
    <vcpupin vcpu="13" cpuset="18"/>
    <vcpupin vcpu="14" cpuset="7"/>
    <vcpupin vcpu="15" cpuset="19"/>
    <vcpupin vcpu="16" cpuset="8"/>
    <vcpupin vcpu="17" cpuset="20"/>
    <vcpupin vcpu="18" cpuset="9"/>
    <vcpupin vcpu="19" cpuset="21"/>
    <vcpupin vcpu="20" cpuset="10"/>
    <vcpupin vcpu="21" cpuset="22"/>
    <vcpupin vcpu="22" cpuset="11"/>
    <vcpupin vcpu="23" cpuset="23"/>
    <cachetune vcpus="0-2,12-14">
      <cache id="0" level="3" type="both" size="16" unit="MiB"/>
      <monitor level="3" vcpus="0-2,12-14"/>
    </cachetune>
    <cachetune vcpus="3-5,15-17">
      <cache id="1" level="3" type="both" size="16" unit="MiB"/>
      <monitor level="3" vcpus="3-5,15-17"/>
    </cachetune>
    <cachetune vcpus="6-8,18-20">
      <cache id="2" level="3" type="both" size="16" unit="MiB"/>
      <monitor level="3" vcpus="6-8,18-20"/>
    </cachetune>
    <cachetune vcpus="9-11,21-23">
      <cache id="3" level="3" type="both" size="16" unit="MiB"/>
      <monitor level="3" vcpus="9-11,21-23"/>
    </cachetune>
  </cputune>

Unfortunately it gives the following error when I try to start the VM:

Error starting domain: internal error: Missing or inconsistent resctrl
info for memory bandwidth allocation

I have resctrl mounted like this:

mount -t resctrl resctrl /sys/fs/resctrl

This error leads to the following description on how to allocate memory
bandwith: https://software.intel.com/content/www/us/en/develop/articles
/use-intel-resource-director-technology-to-allocate-memory-
bandwidth.html

I think this is over the top and perhaps I'm trying the wrong approach.
All I can say is that every suggestion I've seen and tried so far has
led me to one conclusion: QEMU does NOT support the L3 cache layout of
the new ZEN 2 arch CPUs such as the Ryzen 9 3900X.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1856335

Title:
  Cache Layout wrong on many Zen Arch CPUs

Status in QEMU:
  New

Bug description:
  AMD CPUs have L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems
  to always map Cache ass if it was an 4-Core per CCX CPU, which is
  incorrect, and costs upwards 30% performance (more realistically 10%)
  in L3 Cache Layout aware applications.

  Example on a 4-CCX CPU (1950X /w 8 Cores and no SMT):

    <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='8' threads='1'/>

  In windows, coreinfo reports correctly:

  ****----  Unified Cache 1, Level 3,    8 MB, Assoc  16, LineSize  64
  ----****  Unified Cache 6, Level 3,    8 MB, Assoc  16, LineSize  64

  On a 3-CCX CPU (3960X /w 6 cores and no SMT):

   <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='6' threads='1'/>

  in windows, coreinfo reports incorrectly:

  ****--  Unified Cache  1, Level 3,    8 MB, Assoc  16, LineSize  64
  ----**  Unified Cache  6, Level 3,    8 MB, Assoc  16, LineSize  64

  Validated against 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm.

  With newer Qemu there is a fix (that does behave correctly) in using the dies 
parameter:
   <qemu:arg value='cores=3,threads=1,dies=2,sockets=1'/>

  The problem is that the dies are exposed differently than how AMD does
  it natively, they are exposed to Windows as sockets, which means, that
  if you are nto a business user, you can't ever have a machine with
  more than two CCX (6 cores) as consumer versions of Windows only
  supports two sockets. (Should this be reported as a separate bug?)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1856335/+subscriptions



reply via email to

[Prev in Thread] Current Thread [Next in Thread]