Re: [Qemu-discuss] Puzzling performance comparison with KVM and Hyper-V


From: Blair Bethwaite
Subject: Re: [Qemu-discuss] Puzzling performance comparison with KVM and Hyper-V
Date: Wed, 22 Jul 2015 11:00:04 +1000

Interesting numbers, Tim.

Further to what others have said re. pinning, try looking at simple
lscpu or numactl output. Here's the relevant output from a 2-socket Ivy
Bridge box (I imagine it will look much the same for you, and it shows
why your first pinning attempt was off) - see below.

Re. Hyper-V, firstly it's interesting that it appears to be pinning
automatically, or at least doing a much better scheduling job than
Linux/KVM. Do you see the NUMA topology of the host in your Hyper-V
guests?
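
(On the KVM side, a guest only sees NUMA nodes if you describe them
explicitly in the domain definition. A rough libvirt sketch - the cell
sizes and CPU ranges below are purely illustrative, adjust to your
32-vCPU guest:

<cpu mode='host-passthrough'>
  <topology sockets='2' cores='8' threads='2'/>
  <numa>
    <cell id='0' cpus='0-15' memory='33554432' unit='KiB'/>
    <cell id='1' cpus='16-31' memory='33554432' unit='KiB'/>
  </numa>
</cpu>

With something like that in place, numactl -H inside the guest should
report two nodes rather than one, which makes the comparison with your
Hyper-V guests more direct.)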


~$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               2200.190
BogoMIPS:              4401.41
Virtualisation:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39

~# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
node 0 size: 80578 MB
node 0 free: 29053 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
node 1 size: 80637 MB
node 1 free: 48872 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
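
Note node 0 is the even-numbered CPUs and node 1 the odd ones, so a
naive 0-7 pinning straddles both sockets. If your box enumerates the
same way, keeping a small guest's vCPUs on node 0 would look roughly
like this (guest name and vCPU count purely illustrative):

~# for v in 0 1 2 3; do virsh vcpupin myguest $v $((v*2)); done

i.e. vCPU 0 -> CPU 0, vCPU 1 -> CPU 2, and so on; you'd want the
guest's memory bound to node 0 as well (numatune in the domain XML, or
numactl when starting qemu by hand). lscpu -e shows the CPU, core,
socket and node mapping in a single table if you prefer that view.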

A four-socket Ivy Bridge box for comparison:
~# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             4
NUMA node(s):          4
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               2599.969
BogoMIPS:              5201.96
Virtualisation:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60
NUMA node1 CPU(s):     1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61
NUMA node2 CPU(s):     2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62
NUMA node3 CPU(s):     3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63

~# numactl -H
available: 4 nodes (0-3)
node 0 cpus: 0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60
node 0 size: 64450 MB
node 0 free: 51557 MB
node 1 cpus: 1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61
node 1 size: 64509 MB
node 1 free: 5413 MB
node 2 cpus: 2 6 10 14 18 22 26 30 34 38 42 46 50 54 58 62
node 2 size: 64509 MB
node 2 free: 51842 MB
node 3 cpus: 3 7 11 15 19 23 27 31 35 39 43 47 51 55 59 63
node 3 size: 64509 MB
node 3 free: 46402 MB
node distances:
node   0   1   2   3
  0:  10  20  30  20
  1:  20  10  20  30
  2:  30  20  10  20
  3:  20  30  20  10

On 22 July 2015 at 04:40, Tim Bell <address@hidden> wrote:
>> -----Original Message-----
>> From: Stephan von Krawczynski [mailto:address@hidden
>> Sent: 21 July 2015 18:05
>> To: Carlos Torres <address@hidden>
>> Cc: Tim Bell <address@hidden>; address@hidden
>> Subject: Re: [Qemu-discuss] Puzzling performance comparison with KVM
>> and Hyper-V
>>
>> On Tue, 21 Jul 2015 15:12:26 +0000
>> Carlos Torres <address@hidden> wrote:
>>
>> >
>> > ________________________________________
>> > From: Tim Bell <address@hidden>
>> > Sent: Tuesday, July 21, 2015 9:53 AM
>> > To: Stephan von Krawczynski
>> > Cc: Carlos Torres; address@hidden
>> > Subject: Re: [Qemu-discuss] Puzzling performance comparison with KVM
>> > and Hyper-V
>> >
>> > On Tue, 21 Jul 2015, Stephan von Krawczynski wrote:
>> >
>> > > On Tue, 21 Jul 2015 16:16:22 +0200
>> > > Tim Bell <address@hidden> wrote:
>> > >
>> > >>
>> > >>
>> > >> On Tue, 21 Jul 2015, Carlos Torres wrote:
>> > >>
>> > >>> On Jul 21, 2015 5:45 AM, Tim Bell <address@hidden> wrote:
>> >
>> > >
>> > > Have you worked out the pinning? The CPU numbers are _not_ in line
>> > > with the core/SMT distribution over the physical dies.
>> > >
>> >
>> > I think we did it correctly, but please give me some pointers to check...
>> > The vCPU-to-CPU mapping we used was as below:
>> >
>> > vCPU:CPU
>> > 0: 0
>> > 1: 1
>> > 2: 2
>> > 3: 3
>> > 4: 4
>> > 5: 5
>> > 6: 6
>> > 7: 7
>> > 8: 16
>> > 9: 17
>> > 10: 18
>> > 11: 19
>> > 12: 20
>> > 13: 21
>> > 14: 22
>> > 15: 23
>> > 16: 8
>> > 17: 9
>> > 18: 10
>> > 19: 11
>> > 20: 12
>> > 21: 13
>> > 22: 14
>> > 23: 15
>> > 24: 24
>> > 25: 25
>> > 26: 26
>> > 27: 27
>> > 28: 28
>> > 29: 29
>> > 30: 30
>> > 31: 31
>> >
>> > >
>> > > --
>> > > Regards,
>> > > Stephan
>> > >
>> >
>> > Hi Tim,
>> >
>> > 'cat /proc/cpuinfo' should give you this information; look at the
>> > processor, physical id and core id values.
>> >
>> > For example, here's an excerpt from my laptop, which has 4 physical
>> > cores with 2 threads per core (8 CPUs total).
>> >
>> > processor       : 0                         <=== CPU # given by the OS
>> > vendor_id       : GenuineIntel
>> > .....
>> > physical id     : 0                         <==== Physical Processor Socket ID
>> > siblings        : 8
>> > core id         : 0                           <==== Physical CPU Core ID
>> > cpu cores       : 4
>> > ....
>> >
>> > processor       : 1
>> > vendor_id       : GenuineIntel
>> > .....
>> > physical id     : 0                        <=== Same Processor socket (my laptop has only 1)
>> > siblings        : 8
>> > core id         : 1                          <==== Distinct CPU Physical core
>> > cpu cores       : 4
>> >
>> > ....
>> >
>> > processor       : 4
>> > vendor_id       : GenuineIntel
>> > .....
>> > physical id     : 0                      <=== Still same Processor socket (expected)
>> > siblings        : 8
>> > core id         : 0                        <=== Note this is the same physical core as processor 0 above
>> > cpu cores       : 4
>> >
>> >
>> > -- Carlos Torres
>>
>> Carlos is right. You have to check /proc/cpuinfo closely; look at this
>> example from a box with 2 physical processors, each with 8 cores and
>> 2 threads per core (32 logical CPUs overall):
>>
>> processor       : 0
>> physical id     : 0
>> siblings        : 16
>> core id         : 0
>> cpu cores       : 8
>>
>> processor       : 1
>> physical id     : 1
>> siblings        : 16
>> core id         : 0
>> cpu cores       : 8
>>
>> processor       : 2
>> physical id     : 0
>> siblings        : 16
>> core id         : 1
>> cpu cores       : 8
>>
>> processor       : 3
>> physical id     : 1
>> siblings        : 16
>> core id         : 1
>> cpu cores       : 8
>>
>> ...
>>
>> As you can see, processors 0 and 2 are on physical id 0, but processors
>> 1 and 3 are on physical id _1_. This means that CPUs 0 and 1 are on
>> completely different dies inside the box, which has a heavy impact on
>> NUMA setup and access.
>> There is no general hint anyone can give you; you have to check your own
>> setup to find out which logical processor is located where.
>> You have to arrange the pinning so that the processors of one virtual
>> host are located as close together as possible. Stay on the same die
>> and, if possible, use the SMT sibling as the nearest neighbour, because
>> it shares the same cache.
>>
>
> Thanks. We'll try setting up a mini cloud with OpenStack Kilo, since there
> are lots of NUMA and THP changes there. We can then see what else needs
> further tuning.
>
>> --
>> Regards,
>> Stephan
>
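
To put Carlos' and Stephan's points above into config terms: first check
the layout, e.g.

~$ grep -E '^(processor|physical id|core id)' /proc/cpuinfo | paste - - -

then pin SMT siblings next to each other and keep everything on one
node. A rough sketch of what that could look like in the domain XML on
a box like mine above, assuming the second thread of each core shows up
20 CPUs later (verify against core id before copying anything):

<cputune>
  <vcpupin vcpu='0' cpuset='0'/>
  <vcpupin vcpu='1' cpuset='20'/>  <!-- SMT sibling of CPU 0 -->
  <vcpupin vcpu='2' cpuset='2'/>
  <vcpupin vcpu='3' cpuset='22'/>  <!-- SMT sibling of CPU 2 -->
  <!-- ...and so on for the remaining vCPUs -->
</cputune>
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>

That way adjacent vCPUs share a physical core (and its cache), and the
vCPUs and the guest's memory all stay on node 0.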



-- 
Cheers,
~Blairo


