**Kano** · 11 July 2012, 12:27 PM

http://ark.intel.com/de/products/64596/Intel-Xeon-Processor-E5-2690-(20M-Cache-2_90-GHz-8_00-GTs-Intel-QPI)

You lose 900 MHz of max turbo boost with your stupid settings. Enable ALL powersave features, including ACPI C6 and turbo boost, then test again.

**finance_coder** · 11 July 2012, 12:44 PM

Originally posted by Kano:

http://ark.intel.com/de/products/645...GTs-Intel-QPI)

You lose 900 MHz of max turbo boost with your stupid settings. Enable ALL powersave features, including ACPI C6 and turbo boost, then test again.

No, for ultra low latency applications, it is absolutely critical to disable all powersaving features in modern CPUs. This is standard practice in industries such as high frequency trading and some other high performance computing situations. There is a measurable latency hit for a CPU to transition from a low power/slow state to a higher power/fast state. The latency to transition from one state to another is lessened in SNB compared to Westmere, but it's still there.

In other words, with this benchmark, enabling any kind of power saving features (on either CPU) makes things worse.

To be fair, turbo boost is debatable. In this particular benchmark, it improves things slightly; but overall, SNB still falls well behind Westmere.
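For readers who want to try the "no deep sleep states" setup without touching BIOS settings, here is a minimal sketch of one way to request it at run time on Linux. It uses the kernel's PM QoS device node `/dev/cpu_dma_latency`; the function name `hold_cpu_latency` and the `path` parameter are my own choices for illustration, and this is not necessarily how the machines in this thread were configured. It only limits idle (C-state) wake-up latency and does not change turbo boost or frequency scaling.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Ask the kernel to keep CPU wake-up latency at or below max_latency_us
 * by writing a 32-bit value to the PM QoS device node.
 *
 * `path` is normally "/dev/cpu_dma_latency" (usually needs root).
 * The request stays active only while the returned descriptor is open,
 * so the caller keeps it open for the lifetime of the latency-critical
 * work and closes it afterwards.
 *
 * Returns the open descriptor on success, -1 on failure.
 */
int hold_cpu_latency(const char *path, int32_t max_latency_us)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, &max_latency_us, sizeof(max_latency_us));
    if (n != (ssize_t)sizeof(max_latency_us)) {
        close(fd);
        return -1;
    }
    return fd; /* closing this descriptor drops the request */
}
```

A caller would do something like `int fd = hold_cpu_latency("/dev/cpu_dma_latency", 0);` before the timed section and `close(fd)` after it. A value of 0 asks for the shallowest idle state, which is the closest run-time equivalent to disabling C-states.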

**Kano** · 11 July 2012, 01:28 PM

I have an i7-880, an i7-2600, and an i7-3770S, but no wheezy on the i7-880 yet. The gcc version matters for speed; gcc-4.7 is better in most benchmarks, as you can see here:

Debian: Squeeze vs. Wheezy On Linux And kFreeBSD - Phoronix

http://www.phoronix.com/scan.php?page=article&item=debian_squeeze_wheezy_2012&num=1

You can ignore the filesystem benchmarks, as those are ext3 vs ext4. You could also try different compiler flags and -march=native. I doubt that you lose that much latency to power management; maybe use a low-latency kernel.

**finance_coder** · 11 July 2012, 01:38 PM

If you're interested, I encourage you to compile and run the sample program to which I linked (instructions for building are in the top comment).

FWIW, I have tested this program with several compilers. I haven't tried gcc 4.7 yet, but I have tried 4.6.3 on Gentoo, having re-emerged the whole system with -march=corei7-avx and built my demo program the same way. I also tried Intel's compiler (I didn't rebuild the whole Gentoo system with icc, but I did build my program with it). The different compilers have so far made very little difference.

My sample program doesn't actually do much that is interesting; it's more or less a system call benchmark. So my suspicion is that these kernel functions I'm using are implemented sub-optimally for SNB, or this is simply a corner-case where SNB is slower than previous-gen CPUs.
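To make "more or less a system call benchmark" concrete, here is a minimal sketch of the kind of loop in question: two threads hand a turn back and forth through one mutex and one condition variable, each optionally pinned to a CPU. This is not the source of the linked program; the names `run_cv_pingpong`, `pin_to_cpu`, and the struct layout are hypothetical and only illustrate the pattern.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <time.h>

/* Shared state: a turn flag guarded by one mutex and one condvar. */
struct pingpong {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             turn;    /* id (0 or 1) of the thread allowed to run */
    long            n_iter;  /* hand-offs performed by each thread */
    int             cpu[2];  /* CPU each thread tries to pin itself to */
};

struct worker_arg {
    struct pingpong *pp;
    int              id;
};

/* Best-effort pinning of the calling thread; failures are ignored. */
static void pin_to_cpu(int cpu)
{
    cpu_set_t set;
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    (void)pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    struct pingpong *pp = arg->pp;

    pin_to_cpu(pp->cpu[arg->id]);
    for (long i = 0; i < pp->n_iter; i++) {
        pthread_mutex_lock(&pp->lock);
        while (pp->turn != arg->id)          /* sleep until it is our turn */
            pthread_cond_wait(&pp->cond, &pp->lock);
        pp->turn = 1 - arg->id;              /* pass the turn to the peer */
        pthread_cond_signal(&pp->cond);
        pthread_mutex_unlock(&pp->lock);
    }
    return NULL;
}

/* Runs the ping-pong and returns the elapsed wall time in microseconds. */
long long run_cv_pingpong(int cpu1, int cpu2, long n_iter)
{
    struct pingpong pp = { .turn = 0, .n_iter = n_iter, .cpu = { cpu1, cpu2 } };
    struct worker_arg args[2] = { { &pp, 0 }, { &pp, 1 } };
    pthread_t tid[2];
    struct timespec t0, t1;

    pthread_mutex_init(&pp.lock, NULL);
    pthread_cond_init(&pp.cond, NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    pthread_cond_destroy(&pp.cond);
    pthread_mutex_destroy(&pp.lock);
    return (t1.tv_sec - t0.tv_sec) * 1000000LL +
           (t1.tv_nsec - t0.tv_nsec) / 1000;
}
```

Built with `-pthread`, calling `run_cv_pingpong` once with two different CPU numbers and once with the same CPU number twice gives a rough feel for how much of the time goes into cross-core wake-ups rather than user-space work. Almost every iteration blocks in the kernel, which is why compiler choice matters so little here.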

I've been playing with this for a while, so I'm kind of hoping to get the attention of someone with deeper knowledge of kernel-CPU internals than me.

**Kano** · 11 July 2012, 04:35 PM

Basically, your benchmark is the worst multicore example there could be. Let's talk about cv mode: when you specify different cores, htop never shows more than 55% load on each core. With lc mode you see 100% load, but in both cases your code is written to run on 1 core! The speed difference is extreme. I did not wait for cv to finish with 2 different cores; that just takes too long. Used an i7-3770S, Turbo fixed at 39.

Code:

./snb_slow_demo -c 0 -C 1 -t lc -n 500000000
RUNTIME PARAMS:
    n_iter ..... 500000000
    cpu1 ....... 0
    cpu2 ....... 1
    testname ... lc
runtime, microseconds ... 77268376
runtime, seconds ........ 77.268376

./snb_slow_demo -c 0 -C 0 -t lc -n 500000000
RUNTIME PARAMS:
    n_iter ..... 500000000
    cpu1 ....... 0
    cpu2 ....... 0
    testname ... lc
runtime, microseconds ... 18515720
runtime, seconds ........ 18.515720

./snb_slow_demo -c 0 -C 0 -t cv -n 50000000
RUNTIME PARAMS:
    n_iter ..... 50000000
    cpu1 ....... 0
    cpu2 ....... 0
    testname ... cv
runtime, microseconds ... 74678823
runtime, seconds ........ 74.678823

**liam** · 11 July 2012, 04:50 PM

Originally posted by finance_coder:

No, for ultra low latency applications, it is absolutely critical to disable all powersaving features in modern CPUs. This is standard practice in industries such as high frequency trading and some other high performance computing situations. There is a measurable latency hit for a CPU to transition from a low power/slow state to a higher power/fast state. The latency to transition from one state to another is lessened in SNB compared to Westmere, but it's still there.

In other words, with this benchmark, enabling any kind of power saving features (on either CPU) makes things worse.

To be fair, turbo boost is debatable. In this particular benchmark, it improves things slightly; but overall, SNB still falls well behind Westmere.

If latency is important, and you are working for a large finance house, why aren't you using the Red Hat messaging kernel that's designed for low latency?

sandy bridge performance hit w/pthread condition variables, contended mutexes?
