Intel Core i7 On Linux

• #31

    Originally posted by bnolsen View Post
    The Linux kernel is aware of the difference between real and virtual cores and tries to schedule with those in mind. That was added back when the P4 went hyper-threaded.

    The Core i7 is more superscalar than the P4 and should be able to scale much better.

So the Linux kernel, compiled with the correct options, can schedule correctly? Nice, so long as the added complexity in the kernel doesn't cause any slowdowns...

IIRC, Hyper-Threading on the P4 was estimated to cause a 30% slowdown under full load. I will confess to never having verified that figure, but in my experience the big gain from Hyper-Threading lay in interactivity, not in "throughput", which is what most benchmarks test.

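For reference, the real-versus-virtual core distinction the scheduler uses is also visible from user space through sysfs. A minimal sketch, assuming a Linux kernel with SMT support and the standard /sys/devices/system/cpu layout (illustrative only, not from the thread):

    /* List which logical CPUs share a physical core, using the same
       topology information the SMT-aware scheduler works with. */
    #include <stdio.h>

    int main(void)
    {
        char path[128], line[64];
        for (int cpu = 0; cpu < 1024; cpu++) {
            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
                     cpu);
            FILE *f = fopen(path, "r");
            if (!f)
                break;                    /* no such CPU: stop */
            if (fgets(line, sizeof line, f))
                printf("cpu%d siblings: %s", cpu, line);
            fclose(f);
        }
        return 0;
    }

On a Hyper-Threaded chip each pair of logical CPUs reports the same sibling list; on a plain multi-core chip every CPU lists only itself.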


• #32

    Originally posted by downhillgames View Post
    So, I installed Wallbuntu 9.04 Beta 1 on some spare space I had on this HDD here.

    The benchmark results may be found here: http://global.phoronix-test-suite.co...6937-12014-280

Here is my test result. Are most of the tests in here multi-threaded? Maybe that's why the Core i7 beats the pants off my 8400 @ 4GHz.

http://global.phoronix-test-suite.co...512-2646-27047



• #33

Here is a benchmark of an X2 5200: http://global.phoronix-test-suite.co...76-16696-29149

I also agree; a summary or overall view of all submitted benchmarks would be nice.



• #34

    Originally posted by RobbieAB View Post
    So the Linux kernel, compiled with the correct options, can schedule correctly? Nice, so long as the added complexity in the kernel doesn't cause any slowdowns...

    IIRC, Hyper-Threading on the P4 was estimated to cause a 30% slowdown under full load. I will confess to never having verified that figure, but in my experience the big gain from Hyper-Threading lay in interactivity, not in "throughput", which is what most benchmarks test.

The Core i7 is definitely NOT a P4. You could say this is Hyper-Threading 2.0; it seems Intel got it right this time.

The virtual cores don't double your performance, but they now almost never subtract from it either. I suspect that has to do with how many more execution units the Core i7 has compared with any other x86_64-compatible CPU available today.

The Atom also hyper-threads decently.

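One way to check the "almost never subtracts" claim is to pin the same compute-bound worker to two SMT siblings, then to two separate physical cores, and compare the wall-clock times. A hypothetical harness, assuming Linux and glibc's sched_setaffinity (the CPU numbers are machine-specific; the sibling lists shown earlier tell you which pairs share a core):

    /* Pin this process to one logical CPU and run a stand-in workload.
       Launch two instances on sibling CPUs, then on separate cores. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <logical-cpu>\n", argv[0]);
            return 1;
        }
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(atoi(argv[1]), &set);
        if (sched_setaffinity(0, sizeof set, &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        volatile double x = 0.0;              /* stand-in compute loop */
        for (long i = 0; i < 500000000L; i++)
            x += 1e-9;
        printf("cpu %s done (x=%.3f)\n", argv[1], x);
        return 0;
    }

If the sibling pair finishes nearly as fast as the separate-core pair, the extra execution units are absorbing the second thread; if it is much slower, the two threads are fighting over them.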


• #35

    Originally posted by bnolsen View Post
    The Core i7 is definitely NOT a P4. You could say this is Hyper-Threading 2.0; it seems Intel got it right this time.

    The virtual cores don't double your performance, but they now almost never subtract from it either. I suspect that has to do with how many more execution units the Core i7 has compared with any other x86_64-compatible CPU available today.

    The Atom also hyper-threads decently.

I am aware that the i7 is not a P4. However, the basic point still stands: Hyper-Threading adds complexity to the management of multiple processes. 99% of the time it doesn't matter, but I would be very surprised if heavily tuned science apps don't show a slowdown when Hyper-Threading is enabled, simply because most of them are tuned to saturate the CPU with minimal breaks in the pipeline. That is a class of optimisation I have NEVER heard discussed since I left uni, mostly because people outside that specialised field don't bother with it much. These are the kind of apps where a P4 @ 3.6GHz could actually compete with the early C2Ds, simply because they COULD avoid the pipeline breaks that killed the P4 in general usage.

Hyper-Threading is about interactivity, not throughput. It trades extra complexity for lower latency. On desktops it will often manifest as a throughput boost, but that is because desktop software just isn't tuned that heavily and frequently hits pipeline breaks, not because of any kind of Hyper-Threading magic.

The Core i7 is killing pretty much all the competition because it is a good core, designed to do a job, and running a generation ahead of its competitors, who also seem to have decided that pushing the CPU itself is a mug's game: platform performance is where it's at, not raw CPU power.



• #36

Well, the i7 is not for the mass market; the i5 will challenge AMD much more...



• #37

    Originally posted by RobbieAB View Post
    I am aware that the i7 is not a P4. However, the basic point still stands: Hyper-Threading adds complexity to the management of multiple processes. 99% of the time it doesn't matter, but I would be very surprised if heavily tuned science apps don't show a slowdown when Hyper-Threading is enabled, simply because most of them are tuned to saturate the CPU with minimal breaks in the pipeline. That is a class of optimisation I have NEVER heard discussed since I left uni, mostly because people outside that specialised field don't bother with it much. These are the kind of apps where a P4 @ 3.6GHz could actually compete with the early C2Ds, simply because they COULD avoid the pipeline breaks that killed the P4 in general usage.

I will tell you whether this is true later this week. I happen to work on a highly threaded, high-throughput aerial/close-range photogrammetry package that does automatic feature matching, photogrammetric bundle adjustment, radiometric computations, and orthorectification using both line-scanner and frame imagery. A customer is building a dual quad-core i7 machine with 24 GB of RAM that I'll run some tests on.



• #38

    Originally posted by bnolsen View Post
    I will tell you whether this is true later this week. I happen to work on a highly threaded, high-throughput aerial/close-range photogrammetry package that does automatic feature matching, photogrammetric bundle adjustment, radiometric computations, and orthorectification using both line-scanner and frame imagery. A customer is building a dual quad-core i7 machine with 24 GB of RAM that I'll run some tests on.

And I will withhold judgement until I've seen the code.

How much of that task is logic (IF statements) and how much is straight computation? How many instructions can be run without needing the output of the previous one? It sounds like a nice "hard" problem, but the actual structure of the algorithms is crucial for this kind of work.

As I said, the only people I have heard playing these games tend to be doing things like self-consistent field solutions of Schrödinger's equation for many-body systems: the kind of task that uses large clusters and ties them up for weeks at a go. The kind of task that uses Fortran for the critical numeric sections, because it is easier for the compiler to optimise than C (yes, this IS true, just mostly irrelevant for 99% of the world's usage).

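The "how many instructions without the output of the previous one" question can be made concrete with a toy example: one long serial dependency chain versus the same arithmetic split across independent accumulators that a superscalar core can keep in flight at once. Illustrative sketch only, not from the thread:

    /* One serial add chain vs. four independent chains: the same
       arithmetic, very different instruction-level parallelism. */
    #include <stdio.h>

    #define N 100000000L

    int main(void)
    {
        double s = 0.0;                       /* each add waits for the last */
        for (long i = 0; i < N; i++)
            s += 1e-9;

        double a = 0, b = 0, c = 0, d = 0;    /* four chains overlap */
        for (long i = 0; i < N; i += 4) {
            a += 1e-9; b += 1e-9; c += 1e-9; d += 1e-9;
        }
        printf("%f %f\n", s, a + b + c + d);
        return 0;
    }

Timed separately (compiled so the optimiser cannot delete the loops), the second loop typically runs several times faster per element, and it is code shaped like the first loop that leaves execution units idle for a sibling hardware thread to soak up.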


• #39

The code is very branchy and has lots of input I/O.

Here's a screenshot of the program running on Windows across multiple cores: http://pushbroom.org/windows.gif

This is just after the application started, so the "packets" are still somewhat synchronized at this point and fighting over resources.



• #40

    Originally posted by bnolsen View Post
    The code is very branchy and has lots of input I/O.

    Here's a screenshot of the program running on Windows across multiple cores: http://pushbroom.org/windows.gif

    This is just after the application started, so the "packets" are still somewhat synchronized at this point and fighting over resources.

Um... branchy code means LOTS of pipeline breaks, which is exactly the kind of task where hyper-threading will help.

I was thinking of mashing 10,000+ square matrices as the kind of task which avoids pipeline breaks. As I said, very much science-cluster work.

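The contrast between the two workload shapes is easy to sketch: a reduction with an unpredictable branch per element (the pipeline breaks a second hardware thread can hide) versus the same reduction done branch-free, the way tuned numeric kernels avoid stalls. Illustrative sketch only:

    /* Branchy vs. branch-free sum of the "large" elements. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 10000000

    int main(void)
    {
        int *v = malloc(N * sizeof *v);
        for (int i = 0; i < N; i++)
            v[i] = rand() & 0xff;             /* unpredictable values */

        long branchy = 0, branchless = 0;

        for (int i = 0; i < N; i++)           /* mispredicts ~half the time */
            if (v[i] >= 128)
                branchy += v[i];

        for (int i = 0; i < N; i++)           /* mask instead of branch */
            branchless += v[i] & -(v[i] >= 128);

        printf("%ld %ld\n", branchy, branchless);
        free(v);
        return 0;
    }

The first loop is the "branchy code with lots of I/O" shape where Hyper-Threading earns its keep; the second is the matrix-mashing shape that keeps the pipeline full on its own.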
