AMD Releases FX-Series Bulldozer Desktop CPUs

  • sunweb
    replied
    Compiler will help with multithreaded performance

  • Hephasteus
    replied
    Originally posted by smitty3268 View Post
    power requirements,
    Bingo was his name-o.

  • PsynoKhi0
    replied
    Originally posted by Kivada View Post
    I'd love to support VIA, but I run Linux, and they seem too clueless to realize their own position; they should have merged with Nvidia long ago for the benefit of both companies. It's one of the few times you'll ever hear me support a merger, because I liked what was demoed with the VIA Nano CPU on the Nvidia ION chipset, and Nvidia would have gained access to their x86 license and stable of ARM devs. Though I still don't like that Nvidia won't release docs to the Nouveau team, I'll take OSS drivers over blobs any day of the week.
    I'd be happy with VIA not doing a 180 on us after promising full OSS IGP drivers.
    Their dual-core Nano is actually pretty good, and the Chrome IGPs are more than enough for entry-level parts. The openchrome drivers offer better video playback than VIA's own "compile the six-months-late stuff yourself" code; the only things the latter has going for it are 3D hardware acceleration and better management of external displays, and even then, setting it up is a major PITA.
    Oh and they've had on-die AES support since at least the C7 days (I should know, I own an HP2133), although people seem to think Intel got there first.

    Back on topic: again, I think people should stop crying murder and give BD a chance. It's off to a disappointing (to put it mildly) start, since initial tests have made it look like far from a clear-cut upgrade over previous AMD CPUs. I still think the basis of the architecture is sound and it's gonna turn out to be a really fine option soon enough.

    That, and the people who put together the press kit for the launch should be shot. Not fired, shot. Right between the eyes. Though I could settle for "neutered".

  • curaga
    replied
    Coming from a noob: shouldn't it be a positive effect when the whole binary fits in the cache?

  • smitty3268
    replied
    Originally posted by Kivada View Post
    Maybe it's because you forget that loading from the CPU cache is faster than loading from system RAM? If I remember right (it's been a few years), all other things being equal, doubling the amount of CPU cache should improve performance by 3-5% across the board, so those of you claiming that it might perform better without the L3 cache are basing your arguments on what, exactly?
    Sigh.

    Cache is only faster if it's actually being used. Adding more cache doesn't automatically add 3-5% across the board; that's an estimate, because historically most apps benefited from extra cache by about that much. As caches grow larger, the number of apps that can take advantage of the additional capacity but couldn't before keeps shrinking. And note that the rule of thumb is about doubling the amount of cache at the same level, which adds no latency that wasn't already there. In this case you are adding an entire additional level of cache, increasing latency, which means some of the speedup is then offset by that slowdown.

    Some apps will get no gain at all while others will see massive gains. It all depends on their memory usage patterns. If they are randomly accessing memory, no amount of cache will help because nothing is cached the first (and last) time you access it. On the other hand, if you have an app that really needs to frequently access < 8MB of data but > 1MB, this extra cache is going to be a huge help. Lots of server apps fall under that category. Not many desktop apps. L3 cache is also used in modern designs to directly share data across multiple cores - which again is more relevant to server workloads than a desktop one.
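    To make that concrete, here's a minimal pointer-chasing sketch (mine, not from any review in this thread) of how working-set size shows up as latency; the buffer sizes are illustrative assumptions, not Bulldozer specifics, and it assumes Linux/GCC for clock_gettime:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Time pointer-chasing through 'bytes' of memory in a random cycle and
       return the average nanoseconds per access.  Once the buffer stops
       fitting in a given cache level, every access pays the next level's
       (or DRAM's) latency. */
    static double ns_per_access(size_t bytes, long iters)
    {
        size_t n = bytes / sizeof(size_t);
        size_t *buf = malloc(n * sizeof(size_t));
        if (!buf)
            return -1.0;

        /* Sattolo's algorithm: build a single random cycle so the hardware
           prefetcher can't hide the miss latency. */
        for (size_t i = 0; i < n; i++)
            buf[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t tmp = buf[i];
            buf[i] = buf[j];
            buf[j] = tmp;
        }

        struct timespec t0, t1;
        size_t idx = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long k = 0; k < iters; k++)
            idx = buf[idx];                 /* serialized, dependent loads */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        volatile size_t sink = idx;         /* keep the chase from being optimized out */
        (void)sink;
        free(buf);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        return ns / (double)iters;
    }

    int main(void)
    {
        /* Illustrative working sets: well inside L2, inside an 8MB L3,
           and far beyond any cache. */
        size_t sizes[] = { 256u * 1024, 6u * 1024 * 1024, 64u * 1024 * 1024 };

        for (int i = 0; i < 3; i++)
            printf("%6zu KB working set: %.1f ns per access\n",
                   sizes[i] / 1024, ns_per_access(sizes[i], 20L * 1000 * 1000));
        return 0;
    }

    The middle size is the one an 8MB L3 rescues; the small one never needed it, and the big one blows past it anyway.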

    I'm not saying the extra L3 cache is actually hurting anything in these tests. I'm just saying it might be. At least in some of them - I'm sure it's helping in a few as well.

    I know a lot of games, for example, are extremely dependent on memory access timings. They loved the fast L2 speeds the old Pentium M and later Core processors had, and didn't particularly take advantage of the L3 cache Phenoms added.

    AMD has already said they are planning to release Bulldozer chips with the L3 cache removed or reduced for the consumer market, so they know all of this far better than you or I do. I'm not sure when those will come out, but my guess is that everything outside the FX series will have it removed.

    As before, these weren't designed for the desktop market. CPU performance on the desktop has been "good enough" for several years now, and servers are where the money is these days, so there's no point in designing a CPU specifically for the consumer market when your server CPU will work just fine for the task. Why bother pouring time and money into chasing a stagnant market where the perceived difference in performance for the end user will be essentially identical to a machine from 2006?
    Like I said, this chip was designed for the server market. That comes with both positives and negatives.

    Seriously, put any of your non-tech relatives in front of a Core 2 or Athlon II system and an i5 or i7 system: can they tell the difference, especially if both machines have identical graphics drivers and amounts of RAM? My guess would be no, they can't tell the machines apart; the seconds shaved off by the i7 in their day-to-day tasks would go completely unnoticed. Welcome to 95% of the computing market. More likely they're suffering with a terrible GPU and no SSD, and fixing those would have a more noticeable impact than a faster CPU.
    I never even made this argument, so I'm not sure why you're trying to argue a point that I agree with.
    Last edited by smitty3268; 17 October 2011, 05:31 AM.

  • ChrisXY
    replied
    Originally posted by smitty3268 View Post
    The point is that AMD doesn't feel comfortable shipping CPUs at that speed, whether it's due to manufacturing issues, power requirements, or whatever.
    Not?
    (Embedded video: "See AMD FX Processors with stunning performance go head to head in gaming, multi-tasking, video processing, and image processing. This video shows AMD FX processors against a number of competitive processors. Unlocked. Unrivaled. Unbelievable.")

  • Kivada
    replied
    Originally posted by smitty3268 View Post
    Pretty sure I know more than you do. I've seen the overclocking results, but Intel chips can OC as well. The point is that AMD doesn't feel comfortable shipping CPUs at that speed, whether it's due to manufacturing issues, power requirements, or whatever. I can guarantee you they aren't specifically crippling their hardware just for the fun of it. And what does motherboard cost have to do with this discussion? I've already stated Bulldozer is fairly cheap.

    As far as the cache situation goes - cache isn't bad because there's so much of it (obviously - can't believe I have to make this point). It's bad because it adds to latency when you have to access memory outside the cache. If the latency of the L3 cache is 30ns, that automatically adds 30ns each and every time the application needs to grab data from system memory. L3 cache is great for server applications that tend to access the same code over and over again. Many desktop apps have random access patterns that can't utilize the L3 cache very well, and latency ends up being much more important than capacity. Intel's cache latencies are noticeably lower than Bulldozer's, by the way.
    Maybe it's because you forget that loading from the CPU cache is faster than loading from system RAM? If I remember right (it's been a few years), all other things being equal, doubling the amount of CPU cache should improve performance by 3-5% across the board, so those of you claiming that it might perform better without the L3 cache are basing your arguments on what, exactly?

    As before, these weren't designed for the desktop market. CPU performance on the desktop has been "good enough" for several years now, and servers are where the money is these days, so there's no point in designing a CPU specifically for the consumer market when your server CPU will work just fine for the task. Why bother pouring time and money into chasing a stagnant market where the perceived difference in performance for the end user will be essentially identical to a machine from 2006?

    Seriously, put any of your non-tech relatives in front of a Core 2 or Athlon II system and an i5 or i7 system: can they tell the difference, especially if both machines have identical graphics drivers and amounts of RAM? My guess would be no, they can't tell the machines apart; the seconds shaved off by the i7 in their day-to-day tasks would go completely unnoticed. Welcome to 95% of the computing market. More likely they're suffering with a terrible GPU and no SSD, and fixing those would have a more noticeable impact than a faster CPU.

  • smitty3268
    replied
    Originally posted by Kivada View Post
    Do you even know how CPU cache works? Do you realize that Intel has put as much as 12MB of cache on their Core series CPUs? The chips are fully capable of that 4.5GHz, just the same as Intel's: when turbo detects that you are running a heavy single-threaded app and nothing else, it will clock one module up to that range. You can also just put on high-end cooling and push as much as 4.9GHz on really good air or on a self-contained liquid kit, and you can likely do much better with a DIY liquid kit that is overbuilt to the point where you never go more than 5°C over ambient temperatures.

    Intel mobos are still generally more expensive than AMD boards, and add to that the fact that if your AM3 board has a BIOS update to support these chips, it'll be a drop-in upgrade from anything in the Phenom II line.

    Go google it before repeating canned crap.
    Pretty sure I know more than you do. I've seen the overclocking results, but Intel chips can OC as well. The point is that AMD doesn't feel comfortable shipping CPUs at that speed, whether it's due to manufacturing issues, power requirements, or whatever. I can guarantee you they aren't specifically crippling their hardware just for the fun of it. And what does motherboard cost have to do with this discussion? I've already stated Bulldozer is fairly cheap.

    As far as the cache situation goes - cache isn't bad because there's so much of it (obviously - can't believe I have to make this point). It's bad because it adds to latency when you have to access memory outside the cache. If the latency of the L3 cache is 30ns, that automatically adds 30ns each and every time the application needs to grab data from system memory. L3 cache is great for server applications that tend to access the same code over and over again. Many desktop apps have random access patterns that can't utilize the L3 cache very well, and latency ends up being much more important than capacity. Intel's cache latencies are noticeably lower than Bulldozer's, by the way.
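    A rough back-of-the-envelope sketch of that trade-off (every latency and hit rate below is a made-up illustrative number, not a measured Bulldozer figure): an extra level only pays off when it catches enough accesses to cover the latency it adds to every miss that falls through to DRAM.

    #include <stdio.h>

    /* Average cost of an L2 miss with no L3 (straight to DRAM) versus with
       an L3 in the path.  'l3_hit' is the fraction of L2 misses the L3
       catches; everything else pays L3 latency *plus* DRAM latency. */
    static double avg_after_l2_miss(double l3_lat, double dram_lat, double l3_hit)
    {
        return l3_hit * l3_lat + (1.0 - l3_hit) * (l3_lat + dram_lat);
    }

    int main(void)
    {
        const double l3_lat = 30.0, dram_lat = 70.0;   /* ns, illustrative only */

        printf("no L3:            %.1f ns per L2 miss\n", dram_lat);
        for (double hit = 0.0; hit <= 1.0; hit += 0.25)
            printf("L3 hit rate %.0f%%: %.1f ns per L2 miss\n",
                   hit * 100.0, avg_after_l2_miss(l3_lat, dram_lat, hit));
        return 0;
    }

    With these made-up numbers the L3 only breaks even once it catches roughly 43% of L2 misses; server workloads with hot shared data clear that bar easily, while pointer-chasing desktop apps may not.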

  • Hephasteus
    replied
    Originally posted by Dresdenboy View Post
    Uh, I see. This has already been linked a page back.

    Anyway - recompiled code seems to show a significant advantage for BD. Same in HT4U's c-ray test (German).

    BD has strict rules for instruction grouping, which can lower decode bandwidth to 1-2 instructions/cycle, thus not only limiting the performance of one thread but indirectly reducing decode throughput for the second thread as well.

    So if AMD includes their "Branch Redirect Recovery Cache" (a µop buffer) in Trinity, it might even help legacy code once it gets past the decode-stage bottleneck.
    If 4-core CPUs had come out in 2003, they would have flopped just as badly. The problem is that getting 8 cores out of lock contention isn't twice as hard as with 4; it's orders of magnitude harder. It'll eventually get better, and Bulldozer could probably run 70 percent faster than 4 cores, but most likely only 50 percent faster. Everyone on this thread will be dead before 16-core desktops actually work well in general computing environments.
    Look at GPUs: as their core counts go up, they keep pushing higher and higher resolution screens, because they can't improve performance much at normal resolutions, but they can give the GPU more to do on each frame.

    Computer manufacturers are stuck between two strategies: doing what is feasible and workable and hoping buyers accept it, or working towards nearly unachievable goals and simply lying about progress as they go, with customers buying anyway and giving a "son, I am disappoint" reaction at every stage. So it's either 2560x1600 3D screens that give you headaches, or nihilism and converting your early-adopter groups into wait-and-see groups over time.

    The problem is they went many-core to get around the clock speed problem. Now they need clock speed to get around the many-core problem, because only speed will bring the buses out of lockout faster. So you need what you can't get to fix what you've got but can't get.
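    For a rough sense of why the jump from 4 to 8 cores pays off so much less than the core count suggests, Amdahl's law is the usual back-of-the-envelope tool; the parallel fractions below are assumptions picked for illustration, not measurements of any real workload:

    #include <stdio.h>

    /* Amdahl's law: speedup of a workload where a fraction 'p' parallelizes
       perfectly across 'n' cores and the rest stays serial. */
    static double amdahl(double p, int n)
    {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void)
    {
        /* Illustrative parallel fractions, not measured numbers. */
        const double fractions[] = { 0.50, 0.80, 0.95 };

        printf("parallel part   4 cores   8 cores   gain 4->8\n");
        for (int i = 0; i < 3; i++) {
            double s4 = amdahl(fractions[i], 4);
            double s8 = amdahl(fractions[i], 8);
            printf("    %.0f%%        %5.2fx    %5.2fx    +%.0f%%\n",
                   fractions[i] * 100.0, s4, s8, (s8 / s4 - 1.0) * 100.0);
        }
        return 0;
    }

    Even at 95% parallel, going from 4 to 8 cores only yields about +70%, and at 80% parallel only about +33%, which is roughly why doubling the core count rarely comes close to doubling performance.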

  • Dresdenboy
    replied
    Originally posted by Dresdenboy View Post
    There is one French review of the FX where they used the Phoronix Test Suite (PTS in their table):
    http://www.pcinpact.com/articles/amd-fx-8150/420-5.htm
    Uh, I see. This has already been linked a page back.

    Anyway - recompiled code seems to show a significant advantage for BD. Same in HT4U's c-ray test (German).

    BD has strict rules for instruction grouping, which can lower decode bandwidth to 1-2 instructions/cycle, thus not only limiting the performance of one thread but indirectly reducing decode throughput for the second thread as well.

    So if AMD includes their "Branch Redirect Recovery Cache" (a µop buffer) in Trinity, it might even help legacy code once it gets past the decode-stage bottleneck.
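    Since the recompiled-code point keeps coming up, here is a minimal sketch of what "recompiled for BD" means in practice, assuming GCC 4.6 or later (which added the bdver1 target); the flags and the toy loop are illustrative, not taken from any of the reviews above:

    /* Toy kernel to illustrate recompiling for Bulldozer.  Built generically:
     *     gcc -O3 -o saxpy saxpy.c
     * Built targeting BD (GCC 4.6+), letting the compiler use AVX and,
     * where it is allowed to contract, FMA4/XOP:
     *     gcc -O3 -march=bdver1 -o saxpy saxpy.c
     */
    #include <stdio.h>

    #define N (1 << 20)

    static float x[N], y[N];

    /* Simple SAXPY loop; with -march=bdver1 GCC may vectorize it with AVX
       and fuse the multiply-add, which generic x86-64 code compiled for
       older CPUs can't use. */
    static void saxpy(float a, const float *xs, float *ys, int n)
    {
        for (int i = 0; i < n; i++)
            ys[i] = a * xs[i] + ys[i];
    }

    int main(void)
    {
        for (int i = 0; i < N; i++) {
            x[i] = (float)i;
            y[i] = 1.0f;
        }
        saxpy(2.0f, x, y, N);
        printf("%f\n", y[N - 1]);   /* keep the result live */
        return 0;
    }

    The recompiled-binary results in the French and German links are presumably the same idea applied to whole benchmarks rather than one loop.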
