Announcement

**microcode** · 05 April 2018, 09:49 AM

I wonder how much of IBM's pricing for these chips is cream. They could be making a huge mistake with pricing on these, unless they want the whole software ecosystem to only consist of first-party IBM ports and committed partner ISVs (primarily SAP).

Even with the current performance numbers, I would still be willing to use it if it were less than double the cost of a similar chip or chips from Intel, and if lots of developers did, the toolchains would improve rapidly.

**Guest** · 06 April 2018, 09:06 AM

Originally posted by pavlerson View Post

POWER9 cpus are much slower than x86. It is no surprise. It was many years since POWER was faster than x86. Today x86 is much faster and cheaper.

The thing that RISC cpus has an advantage over x86 are mainly two:
1) RAS. RISC cpus such as POWER and SPARC are much more reliable than x86. For instance, you can typically swap everything during run-time, including motherboard, cpu and RAM. Some SPARC cpus can replay instructions if they notice something is wrong. RAS is extremely costly and x86 has no chance to compete in the RAS arena. x86 typically use a cheap cluster, if one fails, just replace it. Quantity before quality.

RISC typically use one large server with 16 cpus or even 32 cpus - and that server must never fail. So everything is extremely good quality and engineered well. Quality before quantity. RAS is very expensive. One single IBM P595 server with 32-cpus costed $35 million. No typo. One single server. You could buy many PCs for $35 million. (It was used for the old TPC-C benchmark - check it up).

2) Scalability. POWER and SPARC scales very well, above 16-cpus. They have scaled to 32 and 64 cpus for decades. x86 does not scale well, and is therefore for small workloads with 4-8 cpus. However, x86 cpus are getting better, so 8 socket x86 servers are capable of handling quite large workloads today. There is not much need for 32-cpu servers today. But business workloads (SAP, databases, etc) are impossible to run on a cluster, so you need a single large server with as many as 16 or 32 cpus. That is where RISC has its place; on extremely large business workloads. The SAP top benchmark all belongs to RISC cpus, they all belong to 32-cpus or so. Not a single x86 server can be seen in the SAP benchmark top. And SAP installations can be extremely expensive and are extremely lucrative, costing more than $200 million. So SAP is a very important benchmark that everybody tries to win. Who does not want to sell one single server for $35 million? Everybody who can, wants to sell such servers. Business servers are extremely profitable. HPC clusters are not, they are just a bunch of cheap PCs on a fast switch. 32 cpu cluster does not cost $35 millions.

BTW. SPARC cpus are many times faster than x86 on a vast array of benchmarks. SPARC M7 and M8 holds many world records.

Another kebbabert's bullshit. Scale up will never beat scale out in performance and scalability. Nobody sane would ever invest in slowlaris and sparc today. Scale up servers are so expensive, because it's very limited stone age technology. It's like placing nukes in one place and launching them one after another vs launching them simultaneously from many places. That's why slowlaris is dead. Scale up fits only low end business segment.

In a scale UP environment, administrators can scale just capacity. The downside here is that it becomes more likely that the existing processing and network connectivity of the head end could be overwhelmed as more and more is done with the array. In the scale out environment, capacity, compute and network connectivity are scaled equally, so performance should remain linear even as more units are added.

In many cases, the scale up environment is suitable for SMB and low end mid market and often carries a lower total price tag. Scale out, on the other hand, is a bit more expensive and carries with it some additional complexity as the solution needs to be able to continue to scale as a single whole, even as additional devices with their own compute stacks are added

**admax88** · 06 April 2018, 09:27 AM

Originally posted by Pawlerson View Post

Another kebbabert's bullshit. Scale up will never beat scale out in performance and scalability. Nobody sane would ever invest in slowlaris and sparc today. Scale up servers are so expensive, because it's very limited stone age technology. It's like placing nukes in one place and launching them one after another vs launching them simultaneously from many places. That's why slowlaris is dead. Scale up fits only low end business segment.

Scale out is actually worse in performance and scalability for many workloads. Do you ever wonder why you can't re-order your mail in gmail? It's because there's not really an efficient scale out algorithm for sorting. Sorting is orders of magnitude faster if done on shared memory on the same machine in multiple threads than if done across different machines or different processes.

So many things are just not a problem if you can run your workload on one machine.

**pavlerson** · 06 April 2018, 01:53 PM

Originally posted by admax88 View Post

Scale out is actually worse in performance and scalability for many workloads. Do you ever wonder why you can't re-order your mail in gmail? It's because there's not really an efficient scale out algorithm for sorting. Sorting is orders of magnitude faster if done on shared memory on the same machine in multiple threads than if done across different machines or different processes.

So many things are just not a problem if you can run your workload on one machine.

True. You know what you talk about. It seems you have a formal education in comp sci?

What many non software engineers dont know, is that many problems cannot be parallellized. Amdahls law puts an upper bound on how much faster it gets by adding more cpus. Many non developers, believe it is just a matter of adding more cpus, but that is impossible in many cases, for instance P-complete problems.

For instance, if you are a chef trying to do a steak and sauce based on the steak juices - then you must first cook the steak. And when the steak is done you can collect the juices to make the sauce. It is impossible to both cook the steak and make the sauce at the same time. In this case, adding more chefs does not speed up the process. So, there are many problems that cannot be solved on a scale-out cluster. Adding nodes does not help. You need one single beefier scale-up server that can handle non parallell workloads. Scale-out clusters can only run parallell workloads.

I am sure you know all this, but some people dont know this. BTW, it does not matter how many times you try to explain this to some people, because they dont knowledge in programming nor comp sci.

**nils_** · 06 April 2018, 08:18 PM

Originally posted by admax88 View Post

Scale out is actually worse in performance and scalability for many workloads. Do you ever wonder why you can't re-order your mail in gmail? It's because there's not really an efficient scale out algorithm for sorting. Sorting is orders of magnitude faster if done on shared memory on the same machine in multiple threads than if done across different machines or different processes.

Your single page view probably only uses a very few machines though, probably stored sorted on a single shard on a few servers. Really the question is are you sorting a trillion item data set or a few hundred k datasets a trillion times...

Generally were most distributed systems struggle is sorting an extremely large dataset and then only displaying a few values (think SELECT ... ORDER BY ... LIMIT).

**George99** · 07 April 2018, 06:06 AM

Originally posted by pavlerson View Post

For instance, if you are a chef trying to do a steak and sauce based on the steak juices - then you must first cook the steak. And when the steak is done you can collect the juices to make the sauce. It is impossible to both cook the steak and make the sauce at the same time.

Start with steak 0 at t0. Start with steak 1 at t1 and use the juice of steak 0. And so on. It's a perfect example of how pipelining works to parallelize workloads

**pavlerson** · 07 April 2018, 09:06 AM

Originally posted by George99 View Post

Start with steak 0 at t0. Start with steak 1 at t1 and use the juice of steak 0. And so on. It's a perfect example of how pipelining works to parallelize workloads

Maybe I was unclear, but in this case you cannot start the steak and sauce at t0. You have to wait until t1 before you can start with the sauce. So this is not parallellizable, pipelining or not. For this to be parallellizable, you would need to start both tasks at t0, which you admit is impossible.

**pavlerson** · 07 April 2018, 09:10 AM

Originally posted by nils_ View Post

Your single page view probably only uses a very few machines though, probably stored sorted on a single shard on a few servers. Really the question is are you sorting a trillion item data set or a few hundred k datasets a trillion times...

Generally were most distributed systems struggle is sorting an extremely large dataset and then only displaying a few values (think SELECT ... ORDER BY ... LIMIT).

However, there are many problems that can only be run in seriell on one single cpu. These are called P-complete problems. Can be worth a read if you are interested in parallell programming:

P-complete - Wikipedia

https://en.wikipedia.org/wiki/P-complete

**DimonX** · 25 April 2018, 05:18 AM

I'm reading this thread and do not believe my eyes, anyone noticed that Power machine has 2x less cores than Epyc and even than 2x less than Xeon? 40cores Xeon vs 16cores Power isn't looks that bad for Power. I would say more it looks embarrassing for Xeon, even Epyc looks better than Xeon if you look at specs. 8cores less plus lower frequency.

**George99** · 09 May 2018, 07:20 AM

Here are some interesting impressions of this great machine:

Unboxing the Talos II: it's here!

http://tenfourfox.blogspot.com/2018/04/unboxing-talos-ii-its-here.html

UPDATE: Read a semi-review of the Raptor Talos II ! This post is being written in TenFourFox FPR7 beta 3. More about that in a day or two,...

A semi-review of the Raptor Talos II

http://tenfourfox.blogspot.com/2018/05/a-semi-review-of-raptor-talos-ii.html

After several days of messing with firmware and a number of false starts, the Talos II is now a functioning member of the Floodgap internal ...

Announcement

POWER9 Benchmarks vs. Intel Xeon vs. AMD EPYC Performance On Debian Linux

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment