Multi-Core Scaling Performance Of AMD's Bulldozer

  • smitty3268
    replied
    Originally posted by rohcQaH View Post
    In any case, a direct comparison of "4 threads across 4 modules" against "4 threads crammed into 2 modules" might be interesting to see how much Bulldozer's modules actually lose over discrete cores by sharing certain parts of the CPU pipeline.
    Yes, that's all I was getting at. Maybe scaling would be worse, or maybe better, but it would have been an interesting test to see exactly what happens. It might even tell us something about the BD architecture.

  • wizard69
    replied
    This isn't really that bad.

    Some of the initial performance reports were very negative, but this looks very good to me for a generation-one processor. There is good reason now to save a few bucks going AMD, as the performance penalty isn't overwhelming.

    It will be especially interesting to see more testing with different configurations of the processors. We simply don't have the experience yet to say anything about cache trade-offs, as the architecture is so new.

    The other thing to realize is that there has likely been little in the way of Bulldozer-specific optimization in these tests. That would mean switches for the compilers. Even though some OS optimization has been done, that doesn't preclude other improvements specific to Bulldozer. In the end AMD could be sitting pretty with a hardware revision and some optimized compiler technologies.
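    As a rough illustration of what such switches look like: recent GCC releases (4.6 and later) accept -march=bdver1 for Bulldozer. A minimal, hedged sketch (the file name is made up for the example, but the predefined macros are GCC's own) that shows at compile time whether that tuning was actually in effect:

    ```c
    /* bd_check.c -- hypothetical example file.
     * Generic build:        gcc -O2 bd_check.c
     * Bulldozer-tuned:      gcc -O2 -march=bdver1 bd_check.c
     * GCC predefines __bdver1__ (plus ISA macros such as __XOP__ and
     * __FMA4__) when targeting Bulldozer, making the switch visible here. */
    #include <stdio.h>

    int main(void)
    {
    #ifdef __bdver1__
        puts("Built with -march=bdver1: Bulldozer scheduling model in effect");
    #else
        puts("Generic x86 build: no Bulldozer-specific tuning");
    #endif
    #ifdef __XOP__
        puts("XOP instructions available to the compiler");
    #endif
    #ifdef __FMA4__
        puts("FMA4 instructions available to the compiler");
    #endif
        return 0;
    }
    ```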

  • rohcQaH
    replied
    In any case, a direct comparison of "4 threads across 4 modules" against "4 threads crammed into 2 modules" might be interesting to see how much Bulldozer's modules actually lose over discrete cores by sharing certain parts of the CPU pipeline. Of course this is only meaningful if they run at fixed frequencies, i.e. with turbo core and any dynamic frequency scaling disabled.
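    A minimal sketch of how such a pinned comparison might be set up on Linux with pthread affinity. The CPU numbering is an assumption: it takes logical CPUs 0-1 to be module 0, 2-3 to be module 1, and so on, which should be verified against /sys/devices/system/cpu/cpu*/topology/ on the actual box; the file name and workload are made up for the example.

    ```c
    /* pin4.c -- hedged sketch of "4 threads across 4 modules" vs.
     * "4 threads packed into 2 modules".
     * ASSUMPTION: logical CPUs 0-1 share module 0, 2-3 share module 1, etc.
     * Build: gcc -O2 -pthread pin4.c */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    #define NTHREADS 4

    /* Stand-in workload; a real test would run the actual benchmark here. */
    static void *worker(void *arg)
    {
        volatile unsigned long sum = 0;
        for (unsigned long i = 0; i < 1000000000UL; i++)
            sum += i;
        return NULL;
    }

    static void run_pinned(const int cpus[NTHREADS], const char *label)
    {
        pthread_t tid[NTHREADS];
        for (int i = 0; i < NTHREADS; i++) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(cpus[i], &set);
            /* Pin via thread attributes so the thread never runs unpinned. */
            pthread_attr_t attr;
            pthread_attr_init(&attr);
            pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
            pthread_create(&tid[i], &attr, worker, NULL);
            pthread_attr_destroy(&attr);
        }
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        printf("%s: done\n", label); /* time each run externally */
    }

    int main(void)
    {
        const int spread[NTHREADS] = {0, 2, 4, 6}; /* one core per module   */
        const int packed[NTHREADS] = {0, 1, 2, 3}; /* two modules saturated */
        run_pinned(spread, "spread over 4 modules");
        run_pinned(packed, "packed into 2 modules");
        return 0;
    }
    ```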

  • nepwk
    replied
    Good article. You did a great job of showing the difference between 8 semi-real cores and Hyper-Threading.

  • AnonymousCoward
    replied
    Originally posted by elanthis View Post
    You don't generally _want_ data to be shared between threads. That would just mean your threading architecture is all wrong and is being hamstrung by data dependencies/locking.

    Shared caches save money. They don't improve speed. (generally speaking, of course)
    So what we really want is tests with both, to see which is faster and how the Bulldozers with fewer cores behave.

    And both AMD's 6-core and Intel's 2600 (Edit: the 2630qm has hyperthreading, so it probably works as a substitute) are really missing here for the full picture.
    Last edited by AnonymousCoward; 10-26-2011, 07:07 AM.

  • ifkopifko
    replied
    Yes, it may be a better solution to have a larger shared cache than a smaller dedicated cache per core (because having a "larger" dedicated cache per core is more expensive), but I thought we were talking about Bulldozer in the state it is in, and which cores are better left enabled. Hope I cleared it up.
    Last edited by ifkopifko; 10-26-2011, 05:25 AM.

  • RealNC
    replied
    Originally posted by ifkopifko View Post
    2RealNC> That is not the case with BD modules. Whether one core per module is active or two, all of them share the whole L3 cache. If only one core per module is activated, it gets the full 2MB of L2, which it would otherwise have to share with the other core.
    I'm afraid I didn't understand the above.

    In my thinking, it seems better to have a larger, shared cache rather than multiple smaller, non-shared ones.

  • ifkopifko
    replied
    Hello.

    Very nice test suite, but I would propose some changes:
    1) To judge the efficiency of scaling per architecture, features like Turbo should be disabled (see the sketch after this list). When Turbo is enabled, it is only natural that scaling with more threads looks worse.
    2) I would change the graphs so that they are easier to interpret at a glance (so that linear scaling actually looks linear). The x-axis should be linear if the y-axis is linear, not like you have it now, where the distance from 1 to 2 is the same as from 2 to 4. It just looks weird.
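    On point 1, a hedged sketch of one way to disable boosting on Linux before a run: newer kernels with acpi-cpufreq expose a global boost toggle, while AMD boxes of this era expose a per-CPU "cpb" (Core Performance Boost) file instead, so the sketch tries both. A BIOS option achieves the same, and pinning the performance governor then holds the frequency fixed.

    ```c
    /* no_boost.c -- hedged sketch: turn off frequency boosting before a
     * scaling run.  Tries the global cpufreq knob first, then the per-CPU
     * AMD "cpb" knob used on older kernels.  Must run as root. */
    #include <stdio.h>

    static int write_zero(const char *path)
    {
        FILE *f = fopen(path, "w");
        if (!f)
            return -1;
        fputs("0", f);
        fclose(f);
        return 0;
    }

    int main(void)
    {
        if (write_zero("/sys/devices/system/cpu/cpufreq/boost") == 0) {
            puts("boost disabled via global cpufreq knob");
            return 0;
        }
        /* Fallback: per-CPU Core Performance Boost toggle (AMD) */
        if (write_zero("/sys/devices/system/cpu/cpu0/cpufreq/cpb") == 0) {
            puts("boost disabled via cpu0 cpb knob (repeat for each cpu)");
            return 0;
        }
        perror("could not disable boost");
        return 1;
    }
    ```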


    2RealNC> That is not the case with BD modules. Whether one core per module is active or two, all of them share the whole L3 cache. If only one core per module is activated, it gets the full 2MB of L2, which it would otherwise have to share with the other core.
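    For anyone who wants to verify that sharing pattern on their own box rather than take it on faith, the kernel publishes which logical CPUs share each cache level under standard sysfs paths. A minimal sketch; on Bulldozer it should show the 2MB L2 shared by the two cores of a module and the L3 shared by all cores, if the claim above is right:

    ```c
    /* cache_topo.c -- print which CPUs share each cache level of cpu0,
     * straight from standard Linux sysfs. */
    #include <stdio.h>
    #include <string.h>

    static int read_line(const char *path, char *buf, size_t len)
    {
        FILE *f = fopen(path, "r");
        if (!f)
            return -1;
        if (!fgets(buf, len, f)) { fclose(f); return -1; }
        buf[strcspn(buf, "\n")] = '\0';
        fclose(f);
        return 0;
    }

    int main(void)
    {
        for (int idx = 0; ; idx++) {
            char base[96], path[160], level[16], size[32], shared[128];
            snprintf(base, sizeof(base),
                     "/sys/devices/system/cpu/cpu0/cache/index%d", idx);

            snprintf(path, sizeof(path), "%s/level", base);
            if (read_line(path, level, sizeof(level)) != 0)
                break;  /* no more cache levels */

            snprintf(path, sizeof(path), "%s/size", base);
            read_line(path, size, sizeof(size));

            snprintf(path, sizeof(path), "%s/shared_cpu_list", base);
            read_line(path, shared, sizeof(shared));

            printf("L%s  %-8s shared by CPUs %s\n", level, size, shared);
        }
        return 0;
    }
    ```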

  • RealNC
    replied
    Originally posted by elanthis View Post
    You don't generally _want_ data to be shared between threads. That would just mean your threading architecture is all wrong and is being hamstrung by data dependencies/locking.

    Shared caches save money. They don't improve speed. (generally speaking, of course)
    I don't view it that way. If you're gonna have, say, 8MB of cache on 4 cores, it's better to make it shared than to give each core 2MB. That way, on loads that involve fewer cores, the effective cache per core increases (on a two-thread load you have 4MB per core).

    But of course that view comes from someone who doesn't know the details behind CPU cache memory :-P

  • elanthis
    replied
    Originally posted by smitty3268 View Post
    rather than separately 1 per module.

    That might allow them to share cached data more efficiently between threads?
    You don't generally _want_ data to be shared between threads. That would just mean your threading architecture is all wrong and is being hamstrung by data dependencies/locking.

    Shared caches save money. They don't improve speed. (generally speaking, of course)
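    A toy sketch of the kind of damage being described here: two threads that merely touch counters sitting in the same cache line will ping-pong that line between cores, while padding the counters apart removes the dependency. The 64-byte line size is an assumption (it happens to hold for Bulldozer and most x86 parts), and the file name is made up.

    ```c
    /* false_share.c -- toy sketch of cache-line ping-pong between threads.
     * Build: gcc -O2 -pthread false_share.c
     * The "hot" pair shares one (assumed 64-byte) line; the "cold" pair is
     * padded onto separate lines.  Time the two runs externally. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 200000000UL

    struct { volatile unsigned long a, b; } hot;           /* same line   */
    struct { volatile unsigned long a; char pad[64];
             volatile unsigned long b; } cold;             /* split lines */

    static void *bump(void *p)
    {
        volatile unsigned long *ctr = p;
        for (unsigned long i = 0; i < ITERS; i++)
            (*ctr)++;
        return NULL;
    }

    static void run(volatile unsigned long *x, volatile unsigned long *y,
                    const char *label)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump, (void *)x);
        pthread_create(&t2, NULL, bump, (void *)y);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%s finished\n", label);
    }

    int main(void)
    {
        run(&hot.a, &hot.b, "shared cache line (expect slow)");
        run(&cold.a, &cold.b, "padded, private lines (expect fast)");
        return 0;
    }
    ```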
