Multi-Core Scaling Performance Of AMD's Bulldozer

nepwk replied

26 October 2011, 07:08 AM
Good article. You did a great job of showing the difference between 8 semi-real cores and Hyper-Threading.
AnonymousCoward replied

26 October 2011, 06:57 AM
Originally posted by elanthis

You don't generally _want_ data to be shared between threads. That would just mean your threading architecture is all wrong and is being hamstrung by data dependencies/locking.

Shared caches save money. They don't improve speed. (generally speaking, of course)

So what we really want is tests with both, to see which is faster, and how the Bulldozers with fewer cores behave.

Both AMD's 6-core and Intel's 2600 (Edit: the 2630QM has Hyper-Threading, so it can probably serve as a substitute) are really missing here for the full picture.

Last edited by AnonymousCoward; 26 October 2011, 07:07 AM.
ifkopifko replied

26 October 2011, 05:20 AM
Yes, it may be a better solution to have a larger shared cache than a smaller dedicated cache per core (because a "larger" dedicated cache per core is more expensive), but I thought we were talking about Bulldozer as it is, and which cores are better left enabled. Hope that clears it up.

Last edited by ifkopifko; 26 October 2011, 05:25 AM.
RealNC replied

26 October 2011, 05:12 AM
Originally posted by ifkopifko

2RealNC> That is not the case with BD modules. Whether one or two cores per module are enabled, all of them share the whole L3 cache. If only one core per module is active, it gets the module's full 2MB L2 to itself; otherwise it would have to share that L2 with the other core.

I'm afraid I didn't understand the above.

As I see it, a larger, shared cache is better than multiple smaller, non-shared ones.
ifkopifko replied

26 October 2011, 04:48 AM
Hello.

Very nice test suite, but I would propose some changes:
1) To judge how efficiently each architecture scales, features like Turbo should be disabled. With Turbo enabled, the clock speed drops as more cores become active, so it is only natural that scaling looks worse at higher thread counts.
2) I would change the graphs so they are easier to interpret at a glance (so that linear scaling actually looks linear). For example, if the y-axis is linear, the x-axis should be linear too, not like now, where the distance from 1 to 2 is the same as from 2 to 4. It just looks weird.
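To make point 2 concrete, here is a minimal Python sketch that turns runtimes into speedup and parallel efficiency (speedup divided by thread count). The runtimes are made-up placeholders, not numbers from the article, and `scaling_table` is just an illustrative name. On a linear thread-count axis, perfect scaling is the straight line speedup = threads, and efficiency stays at 100%, so any drop-off is easy to spot.

```python
def scaling_table(runtimes):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run.

    runtimes: dict mapping thread count -> runtime in seconds.
    """
    base = runtimes[1]
    rows = []
    for n in sorted(runtimes):
        speedup = base / runtimes[n]        # how many times faster than 1 thread
        rows.append((n, speedup, speedup / n))  # efficiency: 1.0 means perfect scaling
    return rows


if __name__ == "__main__":
    # Hypothetical runtimes showing perfect scaling, for illustration only.
    runtimes = {1: 80.0, 2: 40.0, 4: 20.0, 8: 10.0}
    for n, s, e in scaling_table(runtimes):
        print(f"{n} threads: speedup {s:.2f}x, efficiency {e:.0%}")
```

Plotting efficiency directly sidesteps the axis problem, since a flat line at 100% is the ideal no matter how the x-axis is spaced.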

2RealNC> That is not the case with BD modules. Whether one or two cores per module are enabled, all of them share the whole L3 cache. If only one core per module is active, it gets the module's full 2MB L2 to itself; otherwise it would have to share that L2 with the other core.
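A rough sketch of the per-core arithmetic, using the FX-8150's layout (four modules, 2MB of L2 per module shared by its two cores, 8MB of L3 shared by all modules). It assumes every active core is equally busy and that a shared cache splits evenly among the cores using it, which real workloads won't do exactly; `cache_per_core` is a hypothetical helper for illustration.

```python
# FX-8150 layout: 4 modules, 2 MB of L2 per module (shared by the module's
# two cores) and 8 MB of L3 shared by all modules.
MODULES = 4
L2_PER_MODULE_MB = 2.0
L3_TOTAL_MB = 8.0


def cache_per_core(active_cores, cores_per_module):
    """Return (L2 MB, L3 MB) available to each active core.

    Idealized: every active core is equally busy and each shared cache is
    split evenly among the cores that use it.
    """
    if cores_per_module not in (1, 2):
        raise ValueError("a Bulldozer module has one or two active cores")
    if active_cores < 1 or active_cores % cores_per_module:
        raise ValueError("active cores must fill whole modules")
    if active_cores // cores_per_module > MODULES:
        raise ValueError("an FX-8150 only has 4 modules")
    l2 = L2_PER_MODULE_MB / cores_per_module  # module L2 split between its cores
    l3 = L3_TOTAL_MB / active_cores           # L3 split among all active cores
    return l2, l3


if __name__ == "__main__":
    for layout in (1, 2):
        l2, l3 = cache_per_core(4, layout)
        print(f"4 cores, {layout} per module: {l2} MB L2 + {l3} MB L3 each")
```

With four active cores, the L3 share is the same either way; only the L2 share changes, which is the point about leaving one core per module enabled.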
RealNC replied

26 October 2011, 04:35 AM
Originally posted by elanthis

You don't generally _want_ data to be shared between threads. That would just mean your threading architecture is all wrong and is being hamstrung by data dependencies/locking.

Shared caches save money. They don't improve speed. (generally speaking, of course)

I don't view it that way. If you're gonna have, say, 8MB of cache on 4 cores, it's better to make it shared rather than 2MB per core. That way, on loads that use fewer cores, each core gets more cache (on a two-thread load you have 4MB per core).
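The arithmetic behind that example, as a minimal Python sketch: one 8MB pool split among busy threads versus a fixed 2MB private slice per core. It assumes one thread per core and an even split, and it ignores contention and the extra latency a larger shared cache usually has; `cache_per_thread` is a hypothetical name.

```python
TOTAL_CACHE_MB = 8.0  # the 8MB / 4-core example from the post
CORES = 4


def cache_per_thread(active_threads, shared):
    """Idealized cache available to each busy thread (one thread per core)."""
    if not 1 <= active_threads <= CORES:
        raise ValueError("expected 1 to 4 active threads")
    if shared:
        # One pool: the active threads split all of it between them.
        return TOTAL_CACHE_MB / active_threads
    # Private slices: each core keeps its fixed share; idle cores' slices go unused.
    return TOTAL_CACHE_MB / CORES


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, cache_per_thread(n, shared=True), cache_per_thread(n, shared=False))
```

At full load both layouts give 2MB per thread; the shared pool only pulls ahead when some cores are idle.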

But of course that view comes from someone who doesn't know the details behind CPU cache memory :-P
elanthis replied

26 October 2011, 02:59 AM
Originally posted by smitty3268

rather than separately 1 per module.

That might allow them to share cached data more efficiently between threads?

You don't generally _want_ data to be shared between threads. That would just mean your threading architecture is all wrong and is being hamstrung by data dependencies/locking.

Shared caches save money. They don't improve speed. (generally speaking, of course)
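A small Python sketch of the design difference being described: one version funnels every update through a single shared total behind a lock, the other gives each worker a private partial sum and combines the results once at the end. Because of CPython's GIL this shows the structure only, not a speedup; in C or C++ the private-partials pattern is what avoids lock contention and cache lines bouncing between cores. Function names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def sum_with_shared_lock(data, workers=4):
    """Every addition touches one shared total, so each one takes the lock."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        for x in chunk:
            with lock:  # threads serialize here on every single item
                total += x

    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, chunks))  # list() waits for all work and surfaces errors
    return total


def sum_with_private_partials(data, workers=4):
    """Each worker sums its own slice; results are combined once at the end."""
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))  # no shared state while working
    return sum(partials)
```

Both return the same result; the second only synchronizes once per worker instead of once per item.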
smitty3268 replied

26 October 2011, 02:19 AM
I think it might also have been interesting to enable the cores together in modules, rather than separately, 1 per module.

That might allow them to share cached data more efficiently between threads?

And I think at lower core counts it could enable more aggressive turbo frequencies.

But this was an interesting test as well.
phoronix started a topic Multi-Core Scaling Performance Of AMD's Bulldozer

26 October 2011, 01:00 AM

Phoronix: Multi-Core Scaling Performance Of AMD's Bulldozer

There has been a lot of discussion in the past two weeks concerning AMD's new FX-Series processors and the Bulldozer architecture, in particular the way Bulldozer is built from "modules": each module has two x86 engines that share much of the rest of the processing pipeline with their sibling engine, so the eight-core AMD FX-8150 has only four modules. This article looks at how well Bulldozer's multi-core performance scales when toggling these modules. The multi-core scaling is compared against AMD's Shanghai and Intel's Gulftown and Sandy Bridge processors.

Multi-Core Scaling Performance Of AMD's Bulldozer Review - Phoronix

http://www.phoronix.com/vr.php?view=16589
