Poor CPU performance (sysbench) on Xeon (Celeron 15 times faster)

rrrmiz

Sep 16, 2024
Please forgive me, I am a bit new to PVE and I am not really a hardware guy, but...

Out of curiosity I ran sysbench cpu run on my new Xeon machine and compared it to my little Celeron box. Both are running recent versions of PVE.

I ran the tests on the host OS with no other load on the systems. The Celeron performed about 15 times faster than the Xeon machine.

Is this to be expected? Have I done something seriously wrong during the install? What can I check?

CPU(s) 12 x Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz (1 Socket)
root@xeonhost:~# sysbench cpu run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
events per second: 104.21

General statistics:
total time: 10.0040s
total number of events: 1044

Latency (ms):
min: 9.05
avg: 9.57
max: 12.58
95th percentile: 10.27
sum: 9995.52

Threads fairness:
events (avg/stddev): 1044.0000/0.00
execution time (avg/stddev): 9.9955/0.00

4 x Intel(R) Celeron(R) J4125 CPU @ 2.00GHz (1 Socket)
root@celeronhost:~# sysbench cpu run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
events per second: 1589.19

General statistics:
total time: 10.0003s
total number of events: 15895

Latency (ms):
min: 0.62
avg: 0.63
max: 1.21
95th percentile: 0.67
sum: 9996.29

Threads fairness:
events (avg/stddev): 15895.0000/0.00
execution time (avg/stddev): 9.9963/0.00


Thanks for any input!
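
For comparing runs like the two above by number rather than by eye, here is a minimal shell sketch. The helper names events_per_second and speed_ratio are made up for illustration and are not part of sysbench. The options in the usage comment only spell out the settings visible in the output above (1 thread, prime limit 10000, 10 seconds), so that both hosts run the same test.

```shell
# Print the "events per second" figure from sysbench output read on stdin.
events_per_second() {
    awk -F: '/events per second/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Print how many times faster result A is than result B, to two decimals.
speed_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Usage on a live host:
#   sysbench cpu --threads=1 --cpu-max-prime=10000 --time=10 run | events_per_second
# Ratio of the two figures posted above (Celeron vs. Xeon):
#   speed_ratio 1589.19 104.21
```

With the two posted figures this works out to roughly 15x, which matches the thread title.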
 
It's expected: in single-threaded operation, a CPU can be slower than one that is four years newer.
A Xeon can mainly handle more tasks at the same time, but is not necessarily faster per task.

Edit: older CPUs are also slowed down by the mitigations enabled in Linux/Proxmox.
You can set mitigations=off as a kernel boot option.
https://forum.proxmox.com/threads/ubuntu-20-vm-much-slower-than-in-esxi.149444/post-677546
The slow Xeon server is new, the Celeron is a couple of years old. My 10 year old Xeon server (Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz) does 935 events per second, compared to 104 on the new one.

I was not running this test in a VM/CT, I was running it in the host without other load. The benchmark is a single thread CPU test. I still don't understand it...
 
The D-1528 is an old chip though. Released in 2016. Broadwell architecture (Broadwell-DE). It also doesn't really have 12 cores, it has 6 plus SMT.

Your results could be because of mitigations. Or it could be the particular tests you are running are bad for that chip. The Passmark site shows that it should be faster than a Celeron, even on single-thread workloads.

For comparison, I have a D-1541 box, also a Supermicro, mitigations enabled and SMT disabled. I can only access a VM right now as I'm remote. It has 4 cores allocated and sysbench 1.0.20 gives 832 events/sec. So not sure why yours is so slow.

OTOH, the Celeron 5105 I have on my desk here gives 1769 events/sec. I'm thinking the architecture is simply sub-optimal for the primes test.
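
The "6 cores plus SMT" point can be checked on the host itself. A minimal sketch, assuming the util-linux lscpu with its parseable -p output; count_cores is a hypothetical helper name, not an existing tool.

```shell
# Count logical CPUs and physical cores from `lscpu -p=CPU,CORE,SOCKET`
# output read on stdin. Each data line looks like "cpu,core,socket".
count_cores() {
    awk -F, '
        /^#/ { next }                      # skip the header comments
        NF >= 3 {
            logical++
            key = $3 "," $2                # socket,core identifies a physical core
            if (!(key in seen)) { seen[key] = 1; physical++ }
        }
        END { printf "logical=%d physical=%d\n", logical, physical }
    '
}

# Usage:
#   lscpu -p=CPU,CORE,SOCKET | count_cores
```

On a 6-core CPU with SMT enabled this should report 12 logical CPUs and 6 physical cores.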
 
OK, it seems I got myself a "new" old processor from Supermicro.

I tried removing the mitigations with the process outlined by @_gabriel:
vi /etc/default/grub
Edit the line to read GRUB_CMDLINE_LINUX_DEFAULT="quiet mitigations=off"
update-grub

This seems to make no difference. I got 95 events/sec after that. I don't know how to verify that "mitigations=off" worked...

I will see what disabling SMT does, but I don't think it can be much.

I have a second identical server. Maybe I can install a different OS to see if it makes any difference? Debian? Suggestions?
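
One way to check whether mitigations=off actually took effect is sketched below. The kernel only picks up a changed command line after a reboot. It then shows the active command line in /proc/cmdline and the per-vulnerability status under /sys/devices/system/cpu/vulnerabilities/. The helper names has_mitigations_off and show_vulnerabilities are made up for illustration. As far as I know, Proxmox installs that boot through systemd-boot instead of GRUB (typically ZFS root on UEFI) read the command line from /etc/kernel/cmdline and apply it with proxmox-boot-tool refresh, in which case editing /etc/default/grub would change nothing.

```shell
# Succeed if the given kernel command line contains the mitigations=off token.
has_mitigations_off() {
    case " $1 " in
        *" mitigations=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print every vulnerability entry with its current status.
# The directory argument is optional and defaults to the real sysfs path.
show_vulnerabilities() {
    dir=${1:-/sys/devices/system/cpu/vulnerabilities}
    for f in "$dir"/*; do
        if [ -r "$f" ]; then
            printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
        fi
    done
}

# Usage after the reboot:
#   has_mitigations_off "$(cat /proc/cmdline)" && echo "mitigations=off is active"
#   show_vulnerabilities
```

With mitigations off, entries such as meltdown should read "Vulnerable" instead of "Mitigation; PTI".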
 
I'm home now and running sysbench on the host itself gives about the same number as in the VM.

Your 1528 has two fewer cores and a slightly lower clock speed (1.9 GHz vs 2.1) than my 1541 but is the same generation and same architecture. The single-thread performance should be pretty close, I would think. I've got a half-dozen VMs running but they are idle right now. Something fishy is going on.

My lscpu:

Code:
root@vm-host:~# lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   8
  On-line CPU(s) list:    0-7
Vendor ID:                GenuineIntel
  BIOS Vendor ID:         Intel
  Model name:             Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz
    BIOS Model name:      Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz  CPU @ 2.1GHz
    BIOS CPU family:      179
    CPU family:           6
    Model:                86
    Thread(s) per core:   1
    Core(s) per socket:   8
    Socket(s):            1
    Stepping:             3
    CPU(s) scaling MHz:   76%
    CPU max MHz:          2700.0000
    CPU min MHz:          800.0000
    BogoMIPS:             4199.97
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization features: 
  Virtualization:         VT-x
Caches (sum of all):     
  L1d:                    256 KiB (8 instances)
  L1i:                    256 KiB (8 instances)
  L2:                     2 MiB (8 instances)
  L3:                     12 MiB (1 instance)
NUMA:                     
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-7
Vulnerabilities:         
  Gather data sampling:   Not affected
  Itlb multihit:          KVM: Mitigation: Split huge pages
  L1tf:                   Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
  Mds:                    Mitigation; Clear CPU buffers; SMT disabled
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Mitigation; Clear CPU buffers; SMT disabled
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Retpolines; IBPB conditional; IBRS_FW; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Mitigation; Clear CPU buffers; SMT disabled
 
Thanks, this is how mine looks with mitigations=off.
Code:
cat lscpu-mitigations-off-smt-on.txt
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        46 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               12
On-line CPU(s) list:                  0-11
Vendor ID:                            GenuineIntel
BIOS Vendor ID:                       Intel
Model name:                           Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz
BIOS Model name:                      Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz  CPU @ 1.9GHz
BIOS CPU family:                      179
CPU family:                           6
Model:                                86
Thread(s) per core:                   2
Core(s) per socket:                   6
Socket(s):                            1
Stepping:                             3
CPU(s) scaling MHz:                   49%
CPU max MHz:                          2500.0000
CPU min MHz:                          800.0000
BogoMIPS:                             3799.98
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 intel_ppin ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization:                       VT-x
L1d cache:                            192 KiB (6 instances)
L1i cache:                            192 KiB (6 instances)
L2 cache:                             1.5 MiB (6 instances)
L3 cache:                             9 MiB (1 instance)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-11
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          KVM: Mitigation: VMX disabled
Vulnerability L1tf:                   Mitigation; PTE Inversion; VMX vulnerable
Vulnerability Mds:                    Vulnerable; SMT vulnerable
Vulnerability Meltdown:               Vulnerable
Vulnerability Mmio stale data:        Vulnerable
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Vulnerable
Vulnerability Spectre v1:             Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:             Vulnerable; IBPB: disabled; STIBP: disabled; PBRSB-eIBRS: Not affected; BHI: Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Vulnerable

And here is the diff with mitigations on:
Code:
diff lscpu-mitigations-on-smt-on.txt lscpu-mitigations-off-smt-on.txt  | grep "<"
< CPU(s) scaling MHz:                   79%
< BogoMIPS:                             3800.19
< Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi md_clear flush_l1d
< Vulnerability L1tf:                   Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
< Vulnerability Mds:                    Mitigation; Clear CPU buffers; SMT vulnerable
< Vulnerability Meltdown:               Mitigation; PTI
< Vulnerability Mmio stale data:        Mitigation; Clear CPU buffers; SMT vulnerable
< Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
< Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
< Vulnerability Spectre v2:             Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
< Vulnerability Tsx async abort:        Mitigation; Clear CPU buffers; SMT vulnerable

So turning off mitigations seems to work.

Turning mitigations on or off made no significant difference in sysbench, but with SMT turned off there is a definite jump in performance.

Should the difference be this big? Is SMT so important just because of how sysbench generates its load? In which cases is it a good idea to use SMT?

Code:
grep "events per second" sysbench-*
sysbench-mitigations-off-smt-on.txt:    events per second:    95.44
sysbench-mitigations-on-smt-off.txt:    events per second:   773.70
sysbench-mitigations-on-smt-on.txt:     events per second:    90.71
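
To repeat the SMT on/off comparison without going through the BIOS, reasonably recent kernels expose SMT control in sysfs. A sketch, assuming the running kernel provides /sys/devices/system/cpu/smt; smt_status is a hypothetical helper name. A change made this way only lasts until the next boot.

```shell
# Print the SMT control setting and whether SMT is currently active.
# The directory argument is optional and defaults to the real sysfs path.
smt_status() {
    base=${1:-/sys/devices/system/cpu/smt}
    printf 'control=%s active=%s\n' "$(cat "$base/control")" "$(cat "$base/active")"
}

# Usage (as root):
#   smt_status
#   echo off > /sys/devices/system/cpu/smt/control   # take sibling threads offline
#   sysbench cpu run
#   echo on > /sys/devices/system/cpu/smt/control    # bring them back
#   sysbench cpu run
```

Running both benchmarks back to back in one boot session also rules out differences caused by other BIOS settings changing between tests.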

Thank you all for the help - Misi
 
Yeah, 774 events/sec would be in the ballpark of my expectations.

It does appear to be some weird interaction between sysbench and SMT. Not sure why that would show up in a single-threaded test. Maybe it has something to do with the kernel scheduler? Just speculating here.

Glad you got to the bottom of it though!
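
One way to probe the scheduler guess above would be to pin the benchmark to a single logical CPU and see whether the number changes with SMT on. This is only a sketch and has not been tried on the D-1528. thread_siblings is a hypothetical helper name, and taskset comes from util-linux.

```shell
# Print the logical CPUs that share a physical core with the given CPU.
# The second argument is optional and defaults to the real sysfs path.
thread_siblings() {
    cpu=$1
    base=${2:-/sys/devices/system/cpu}
    cat "$base/cpu$cpu/topology/thread_siblings_list"
}

# Usage:
#   thread_siblings 0                  # which logical CPUs share core 0
#   taskset -c 0 sysbench cpu run      # run pinned to logical CPU 0
#   sysbench cpu run                   # unpinned run for comparison
```

If the pinned and unpinned runs give about the same result, thread migration is probably not the explanation.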
 