**Holograph** · 03 March 2017, 05:46 PM

Originally posted by anth

According to an analysis of Ryzen cache by hardware.fr (which Chrome translates well): latency is very poor when a core from one of the two four-core-clusters accesses something in the L3 cache of the other.

https://twitter.com/AIDA64_Official/...882276866?s=09

Was this analysis done with a patched version?

**pal666** · 03 March 2017, 09:30 PM

Originally posted by zboson

The most disappointing thing about Zen is that it only has two 128-bit FMA units.

Vega has enough FMA units for you.
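
To put the quoted FMA complaint into numbers, here is a minimal sketch of theoretical peak FLOPs per cycle for one core. It assumes every unit can issue one FMA per cycle and that an FMA counts as two floating-point operations. The helper name `peak_flops_per_cycle` and the 2x256-bit comparison configuration are illustrative assumptions, not something stated in the thread.

```python
def peak_flops_per_cycle(fma_units: int, vector_bits: int, element_bits: int = 32) -> int:
    """Theoretical peak FLOPs per cycle for one core (illustrative helper)."""
    lanes = vector_bits // element_bits   # elements processed per vector operation
    return fma_units * lanes * 2          # one FMA = multiply + add = 2 FLOPs


# Two 128-bit FMA units, as described in the quoted post.
zen_sp = peak_flops_per_cycle(fma_units=2, vector_bits=128)
# Hypothetical comparison point: a core with two 256-bit FMA units.
wide_sp = peak_flops_per_cycle(fma_units=2, vector_bits=256)

print(zen_sp, wide_sp)
```

This is a ceiling only; real code rarely keeps every FMA unit busy on every cycle.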

**wizard69** · 03 March 2017, 10:42 PM

Originally posted by edwaleni

LOL. I pulled the highlighted line from the Himeno website. I was trying to see if compiler flags would have any impact on the results (it doesn't seem so).

It appears to be more a matter of optimization that AMD has to do in Ryzen's cache.

Based on the results PTS has shown, Ryzen definitely is a work in progress. Some things seem to have gotten a great deal of attention, other areas less so. Ryzen2/Ryzen Server will probably handle Himeno much better in relative terms.

This may be so, but frankly it is typical of any CPU design. Effectively all CPUs are works in progress. However, in this case I really think AMD accomplished most of its goals. We are getting very good performance out of the box on most existing software. That is a good thing for most of us, and frankly there is nothing to be disappointed about here.

What will be interesting is the impact that new compilers, and possibly even new Linux versions, have on performance. Michael is already investigating this to some extent, but I'm really interested in what, if anything, compilers can do for Ryzen a year from now, when support should be firmed up.

**qsmcomp** · 03 March 2017, 11:21 PM

Originally posted by zboson

Why `-mno-rdrnd`?

I don't know about the Zen architecture, but with the Bulldozer architecture `-mvzeroupper` is not necessary. It's only Intel that suffers (maybe Zen now as well) from the false dependency on the upper half of the AVX registers when it's dirty.

Does that mean AMD doesn't support RDRAND?
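
On Linux, one way to check rather than guess is to look at the `flags` line in `/proc/cpuinfo`, where the kernel reports the instruction as `rdrand`. Below is a minimal sketch; the helper names `has_cpu_flag` and `read_cpuinfo` are made up for illustration. Note that `-mno-rdrnd` only tells GCC not to emit the instruction, so on its own it says nothing about what the hardware supports.

```python
from pathlib import Path


def has_cpu_flag(cpuinfo_text: str, flag: str) -> bool:
    """Return True if `flag` appears in any 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the kernel's CPU description; returns '' where the file is missing."""
    p = Path(path)
    return p.read_text() if p.exists() else ""


if __name__ == "__main__":
    print("rdrand reported:", has_cpu_flag(read_cpuinfo(), "rdrand"))
```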

**qsmcomp** · 03 March 2017, 11:22 PM

Originally posted by carewolf

That is one nonsensical line. I would always enable `-finline-functions` first. The rest mainly makes sense together with profile-guided optimization: after you have generated a profile, you can use that profile with `-funroll-loops` etc. (in fact, I believe that is the default on the second run of a profile-guided build).

I wish more build systems had support for profile-generating and profile-using builds, or could do both: first make one, then run a bunch of tests and benchmarks, and then compile with the generated profile.
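
The generate/train/use cycle described above can be sketched as a small script that only assembles the command lines. `-fprofile-generate` and `-fprofile-use` are real GCC options; the file names `bench.c` and `bench`, the `-O2` level, and the helper `pgo_commands` are assumptions for illustration, and nothing is actually compiled here.

```python
import shlex


def pgo_commands(source: str, output: str, opt_level: str = "-O2"):
    """Build the two GCC command lines for a profile-guided build."""
    generate = ["gcc", opt_level, "-fprofile-generate", source, "-o", output]
    use = ["gcc", opt_level, "-fprofile-use", source, "-o", output]
    return generate, use


generate_cmd, use_cmd = pgo_commands("bench.c", "bench")  # hypothetical file names

# Intended order: instrumented build, training run, then the optimized rebuild.
for step in (generate_cmd, ["./bench"], use_cmd):
    print(shlex.join(step))
```

A build system with first-class support would presumably run these three steps itself, using a representative test or benchmark suite as the training run.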

Won't aggressive inlining make the generated code larger, which might hurt caching and branch prediction?

**indepe** · 03 March 2017, 11:43 PM

Originally posted by qsmcomp

Won't aggressive inlining make the generated code larger, which might hurt caching and branch prediction?

I believe you are talking to someone who knows of that possibility, but has found from experience that it is more likely to be an improvement. That's why he says "first".

**qsmcomp** · 03 March 2017, 11:46 PM

Originally posted by indepe

I believe you are talking to someone who knows of that possibility, but has found from experience that it is more likely to be an improvement. That's why he says "first".

In fact, I'm drawing on my experience compiling code for a router, so that "negative optimization" might not apply to a mainstream desktop processor.

**Mark Rose** · 04 March 2017, 12:32 AM

Originally posted by anth

According to an analysis of Ryzen cache by hardware.fr (which Chrome translates well): latency is very poor when a core from one of the two four-core-clusters accesses something in the L3 cache of the other.

I suspected that would be an issue. So it's basically a NUMA issue, and the OS schedulers don't yet know how best to place threads on Zen.
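
Until schedulers handle this, a process can be kept on one cluster by hand. Here is a minimal, Linux-only sketch using `os.sched_setaffinity`. It ASSUMES that logical CPUs 0-3 form the first four-core cluster and 4-7 the second, with no SMT siblings interleaved; the real numbering depends on the system's topology and should be checked first. The helper names are hypothetical.

```python
import os


def cluster_cpus(cluster_index: int, cores_per_cluster: int = 4) -> set:
    """Logical CPU ids for one cluster, ASSUMING contiguous numbering without SMT."""
    start = cluster_index * cores_per_cluster
    return set(range(start, start + cores_per_cluster))


def pin_to_cluster(cluster_index: int) -> set:
    """Restrict the current process to one cluster; returns the CPUs actually used."""
    if not hasattr(os, "sched_setaffinity"):   # Linux-only API
        return set()
    wanted = cluster_cpus(cluster_index) & os.sched_getaffinity(0)
    if wanted:                                  # never apply an empty mask
        os.sched_setaffinity(0, wanted)
    return wanted
```

Calling `pin_to_cluster(0)` at the start of a benchmark would keep its threads from migrating to the other cluster, at the cost of using only half the cores.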

**anth** · 04 March 2017, 01:17 AM

Originally posted by Holograph

https://twitter.com/AIDA64_Official/...882276866?s=09

Was this analysis done with a patched version?

The page I'd linked to explains that they used software written by the authors of AIDA64, which will be integrated into a future version of it. It also said that AMD had told them that bandwidth between the two clusters was 22 GB/s, compared with at least 175 GB/s within each.
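
As a rough sense of scale for those two figures, here is a small back-of-the-envelope sketch. It covers bandwidth only, not the latency that the original analysis was about, and the 8 MiB working set is a made-up size for illustration.

```python
INTER_CLUSTER_GBPS = 22.0    # figure quoted above for traffic between clusters
INTRA_CLUSTER_GBPS = 175.0   # "at least" figure quoted above within a cluster


def transfer_time_us(size_bytes: int, gb_per_s: float) -> float:
    """Time in microseconds to move size_bytes at a sustained rate (1 GB = 1e9 bytes)."""
    return size_bytes / (gb_per_s * 1e9) * 1e6


ratio = INTRA_CLUSTER_GBPS / INTER_CLUSTER_GBPS
size = 8 * 1024 * 1024       # hypothetical 8 MiB working set, for illustration only

print(f"bandwidth ratio: {ratio:.2f}x")
print(f"across clusters: {transfer_time_us(size, INTER_CLUSTER_GBPS):.0f} us")
print(f"within cluster:  {transfer_time_us(size, INTRA_CLUSTER_GBPS):.0f} us")
```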

**liam** · 04 March 2017, 04:31 AM

Originally posted by zboson

Zen is also dual-channel, if I recall, whereas Skylake (not sure which was first) is quad-channel. This means Zen is more affected by memory bandwidth. That's maybe the second most disappointing thing about Zen after sticking with AVX128. I'm still likely to build a Zen system. It will be the first desktop I have built in years.

Why do you need more than two channels in a single-socket, non-RDIMM system?
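
For reference, the theoretical gap between two and four channels is easy to work out. This is a sketch that assumes a 64-bit (8-byte) bus per channel and DDR4-2400 as an arbitrary speed grade; neither number comes from the thread, and the helper name is hypothetical.

```python
def peak_dram_gb_per_s(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth: channels x bus width x transfer rate."""
    return channels * bus_bytes * mega_transfers_per_s * 1e6 / 1e9


# DDR4-2400 is an assumed speed grade chosen only for illustration.
dual = peak_dram_gb_per_s(channels=2, mega_transfers_per_s=2400)
quad = peak_dram_gb_per_s(channels=4, mega_transfers_per_s=2400)
print(f"dual channel: {dual:.1f} GB/s, quad channel: {quad:.1f} GB/s")
```

Whether a given workload ever gets close to either ceiling is a separate question, which is really what is being asked here.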

The Impact Of GCC Zen Compiler Tuning On AMD Ryzen Performance