Announcement

**edwaleni** · 20 July 2017, 12:22 AM

Originally posted by -MacNuke-

They use PowerPC. It's an older ISA. Also the CPU is just a tuned up version of the PowerPC 750 from 1997. Also used by Apple with the name "G3 Processor".

Not to get too picky here, but the last POWER used in a gaming console was its Cell implementation in the Xbox 360 and PS3 Slim. It was also used in several Bandai-developed arcade games.

While IBM has pretty much moved POWER-based Cell development to the back burner, a cluster of PS3s would have made a great cryptocurrency miner in its time, due to its focus on power management. Development effectively ended in 2008, and the last POWER-based Cell product was discontinued in 2011.

It was original thinking in CPU terms, but IBM could not get the Linux community to embrace it. It was very difficult to develop for.

**kylew77** · 20 July 2017, 12:55 AM

Originally posted by eigenlambda

x86 SMT is about the cores being too wide for one thread to use fully. SMT4 or SMT8 is probably about cache behavior; a gamer with SMT4 would be annoyed about threads being resource-starved on the processor.

So the only reason Intel put HT on the P4 and subsequent chips was that the pipeline was too long? So if I'm getting this right, if we were to do SMT4 on x86 we would need much larger caches than we have now, which would drive up prices a lot because it is SRAM? Is thread resource behaviour the same reason some of those Xeons come with around 30 MB of cache, just to make sure the threads aren't starved?

**ravyne** · 20 July 2017, 01:22 AM

SMT2 is about making use of idle execution units when not enough data-independent instructions can be pulled from the instruction window to saturate the entire CPU core - the reason that these "extra" execution resources exist is either that they are necessary but (relatively) infrequently accessed by a single thread, or because duplicating the execution resources is commonly beneficial to single threads (eg ALUs and AGUs); for the most part, engineers aren't adding execution resources for the benefit of SMT performance gains, specifically.

SMT4/8 isn't so much about a lack of data-instruction-level parallelism -- though these POWER cores do tend to be wider -- as I understand it, the missed opportunities there tend to be the (relative) lack of data entirely, due to memory latency (hence also the enormous caches to help compensate), wild access patterns, or being IO bound. This is why POWER and other SMT-heavy CPUs tend to dominate Xeon in applications like databases -- the CPU time needed is peanuts against the time spent waiting on memory fetches that jump all over a huge address space, and then might have to hit disk to top it off -- the CPU itself spends so much time idling that it can soak up 4 or 8 threads.

The neat thing about POWER, as I understand, is that the CPU isn't always running at SMT4/8, but that it can effectively be running in SMT2 when a core is running 2 heavy threads (it basically never makes sense to disable SMT entirely, unless maybe you had two really hot threads oversubscribing the cache).

Someone mentioned Knights Landing being SMT4, and that's technically true, but the key detail is that each core has 2 full-blown AVX512 execution units, and the workloads are all heavily AVX512 of course -- the bet there is that 4 threads are what's needed to keep those dual AVX units busy, but the amount of scalar code needed to support the AVX code is relatively little, so one set of scalar execution units is sufficient. It's a bit like a hybrid, Scalar-SMT4/Vector-SMT2 core.
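
A rough way to picture the latency-hiding argument in this post is a toy utilization model. The sketch below assumes each thread alternates between a fixed burst of compute and a fixed memory stall, and that the core can switch to any ready thread for free. The function name and the cycle counts are made up for illustration; they are not measurements of any POWER or Xeon part.

```python
def core_utilization(threads: int, compute_cycles: float, stall_cycles: float) -> float:
    """Fraction of cycles the core is busy in a simple latency-hiding model.

    Assumption: every thread repeats `compute_cycles` of work followed by
    `stall_cycles` waiting on memory, and switching threads costs nothing.
    Real cores share caches and execution ports, so treat the result as an
    optimistic upper bound rather than a prediction.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    # Share of time one thread alone keeps the core busy.
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    # Extra threads fill the stalls until the core is saturated.
    return min(1.0, threads * per_thread)


if __name__ == "__main__":
    # Hypothetical database-like workload: 20 cycles of work per 180-cycle stall.
    for n in (1, 2, 4, 8):
        print(f"SMT{n}: {core_utilization(n, 20, 180):.0%} busy")
```

With these invented numbers a single thread keeps the core busy only a tenth of the time, so even eight threads do not saturate it. With a compute-heavy mix (say 50 cycles of work per 50-cycle stall) the model already saturates at two threads, which is roughly the SMT2 case described above.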

**jacob** · 20 July 2017, 01:46 AM

Originally posted by Brane215

Nice, but they'll have to show why exactly one would go for such a thing instead of the classic AMD/Intel x86 portfolio.

Security.

Intel and AMD are blob-laden black boxes with proprietary (mis)features, proprietary microcode and proprietary firmware. What makes me crave a Raptor-style machine is not so much the performance (yes, it will be powerful, but you can get the same level of performance from a Xeon-based system for much less $$$); it's simply that it's a trustworthy system. It probably has not much to offer your average web developer, let alone a gamer (since games won't run on it at all), but for anyone who works as a penetration tester, a developer of highly specialised and sensitive software, a translator of confidential documents etc., it certainly looks like an attractive proposition, provided that the price turns out to be reasonable (not necessarily "low", just sensible).

**kylew77** · 20 July 2017, 03:36 AM

Originally posted by ravyne

SMT2 is about making use of idle execution units when not enough data-independent instructions can be pulled from the instruction window to saturate the entire CPU core - the reason that these "extra" execution resources exist is either that they are necessary but (relatively) infrequently accessed by a single thread, or because duplicating the execution resources is commonly beneficial to single threads (eg ALUs and AGUs); for the most part, engineers aren't adding execution resources for the benefit of SMT performance gains, specifically.

SMT4/8 isn't so much about a lack of data-instruction-level parallelism -- though these POWER cores do tend to be wider -- as I understand it, the missed opportunities there tend to be the (relative) lack of data entirely, due to memory latency (hence also the enormous caches to help compensate), wild access patterns, or being IO bound. This is why POWER and other SMT-heavy CPUs tend to dominate Xeon in applications like databases -- the CPU time needed is peanuts against the time spent waiting on memory fetches that jump all over a huge address space, and then might have to hit disk to top it off -- the CPU itself spends so much time idling that it can soak up 4 or 8 threads.

The neat thing about POWER, as I understand, is that the CPU isn't always running at SMT4/8, but that it can effectively be running in SMT2 when a core is running 2 heavy threads (it basically never makes sense to disable SMT entirely, unless maybe you had two really hot threads oversubscribing the cache).

Someone mentioned Knights Landing being SMT4, and that's technically true, but the key detail is that each core has 2 full-blown AVX512 execution units, and the workloads are all heavily AVX512 of course -- the bet there is that 4 threads are what's needed to keep those dual AVX units busy, but the amount of scalar code needed to support the AVX code is relatively little, so one set of scalar execution units is sufficient. It's a bit like a hybrid, Scalar-SMT4/Vector-SMT2 core.

I learned more reading that post than in a university intro computer organization course. It sounds to me like x86 having SMT2 wasn't really by design, then, but an extra that arose as a consequence of the design.

**kylew77** · 20 July 2017, 03:50 AM

I wonder what we can expect from the integrated graphics on this board. Integrated 2D graphics like in the old off-die days, the kind found on early-2000s boards with 8 MB of video RAM that can barely play web videos today, or more like modern Intel integrated graphics that are good enough for anything short of serious gaming?

**BillBroadley** · 20 July 2017, 05:12 AM

Originally posted by kylew77

How come we are starting to hear about these higher thread models of SMT from SPARC64 and POWER8 and POWER9, but not from Intel or AMD? Would it be possible to make an x86 chip with, say, SMT4, so you take an 8-core Ryzen and each core has 4 threads for 32 virtual processing cores? Not enough demand? I've never seen one in action, but the Xeon Phi has SMT4, and isn't it supposed to be x86 architecture? Why couldn't it trickle down to Xeon E5s or i7s or something?

You are describing the Xeon Phi: 60+ cores, 4 threads each.

**Dawn** · 20 July 2017, 09:02 AM

Originally posted by kylew77

I was under the impression that there were going to be SMT4 and SMT8 variations, so each processor has either 12 or 24 physical cores, and with SMT4 or SMT8 you would have 48 or 96 virtual cores on the 12 physical core SKU and 96 or 192 virtual cores on the 24 physical core SKU. That would mean a dual socket with the 12 core part has either 96 or 192 threads, and a dual socket with the 24 core part has either 192 or 384 threads.

Nope. 24-core is always SMT4. The SMT8 core is basically two SMT4 cores fused together behind a shared L1 cache, and physically the 12c SMT8 chip is very similar to the 24c SMT4 chip. The primary reason for the SMT8 core's existence is licensing: higher per-core performance = lower costs for per-core licensed software. Socket-level throughput should be similar or identical.
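
The thread counts in this exchange are just cores times SMT ways times sockets, which is easy to check. The snippet below is only that arithmetic for the two configurations described in the reply (24-core SMT4 and 12-core SMT8); the function name and the dictionary labels are invented for the example.

```python
def hardware_threads(cores_per_socket: int, smt_ways: int, sockets: int = 1) -> int:
    """Hardware threads exposed to the OS: cores x SMT ways x sockets."""
    return cores_per_socket * smt_ways * sockets


# The two configurations described in the reply above: (cores, SMT ways).
configs = {
    "24-core SMT4": (24, 4),
    "12-core SMT8": (12, 8),
}

if __name__ == "__main__":
    for name, (cores, smt) in configs.items():
        single = hardware_threads(cores, smt)
        dual = hardware_threads(cores, smt, sockets=2)
        print(f"{name}: {single} threads per socket, {dual} in a dual-socket box")
```

Both parts come out at 96 threads per socket and 192 in a dual-socket system, which matches the point that socket-level throughput should be similar and only the per-core count differs.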

**jhenke** · 20 July 2017, 11:08 AM

Also keep in mind that POWER is a RISC machine, while x86 is CISC. With microcode etc. the distinction is blurred these days, but it still makes for some significant differences in core design.

**ravyne** · 20 July 2017, 12:44 PM

Originally posted by kylew77

I learned more reading that post than in a university intro computer organization course. It sounds to me like x86 having SMT2 wasn't really by design, then, but an extra that arose as a consequence of the design.

SMT2 is a clever way to put idle resources to use, and a no-brainer to implement once the core reaches a certain width. SMT4+ really does tend to be more of a conscious design decision, based on intended workloads.

I should also throw in another interesting example -- in the Xbox 360 and PS3, the PowerPC-based CPU isn't especially wide, but it still has SMT2. The reason is that those cores don't have out-of-order execution, which means that the CPU can't skip past an instruction that relies on the results of a previous instruction to find more work to do. So in these cores, there are two in-order execution threads. Ultimately the reason is the same, using resources that would otherwise go idle, but the example is interesting because SMT isn't necessarily about big, wide cores in absolute terms; it's about how "wide" the core is relative to how much work a single thread of execution can feed it. On the Intel side, all but the last couple of generations of the Atom cores are also in-order execution, and some of the in-order cores had SMT2 for the exact same reason.

As an interesting aside, each SMT thread in the Xbox 360 had a dedicated vector unit (like SSE4, but in this case a custom, extended Altivec design), so it was only scalar instructions competing for execution resources -- like Knights Landing, it's kind of a hybrid Scalar-SMT2/Vector-dedicated core. I don't believe the PS3 had either the second Altivec unit or the same custom extensions -- it had SPUs instead (whole dedicated streaming vector processors) that in perfect use are hugely powerful -- the extra vector units in the Xbox 360, combined with smoother, more capable GPU compute, are really how the 360 stayed technically competitive.
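
The in-order point in this post can be illustrated with a very small simulation. This is a sketch under strong assumptions, not a model of the actual console CPUs: a single-issue core, every instruction depending on the previous one in its own thread (the worst case for in-order execution), a fixed result latency, and round-robin selection between threads. The function name and all numbers are hypothetical.

```python
def cycles_to_finish(chain_lengths: list[int], latency: int) -> int:
    """Cycles for a single-issue in-order core to finish all threads.

    chain_lengths: instructions per thread; each instruction is assumed to
    depend on the previous one in the same thread.
    latency: cycles before an instruction's result can be consumed.
    """
    remaining = list(chain_lengths)
    ready_at = [0] * len(remaining)  # earliest cycle each thread may issue again
    last_issued = -1
    cycle = 0
    while any(remaining):
        # Round-robin: try the thread after the one that issued last.
        for offset in range(1, len(remaining) + 1):
            t = (last_issued + offset) % len(remaining)
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + latency  # next dependent op must wait
                last_issued = t
                break
        # If no thread was ready, the issue slot is simply wasted this cycle.
        cycle += 1
    return max(ready_at, default=0)


if __name__ == "__main__":
    one = cycles_to_finish([4], latency=3)
    two = cycles_to_finish([4, 4], latency=3)
    print(f"1 thread,  4 instructions: {one} cycles")
    print(f"2 threads, 8 instructions: {two} cycles")
```

In this toy setup the second thread's instructions slot into cycles the first thread would have spent waiting on its own results, so twice the work finishes in only one extra cycle. An out-of-order core could find some of that slack inside a single thread, which is why the argument above is about width relative to what one thread can feed.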

Raptor Is Going To Launch A New POWER9 Linux System
