The Impact Of GCC Zen Compiler Tuning On AMD Ryzen Performance


  • #31
    Originally posted by anth View Post
    According to an analysis of Ryzen cache by hardware.fr (which Chrome translates well): latency is very poor when a core from one of the two four-core-clusters accesses something in the L3 cache of the other.
    https://twitter.com/AIDA64_Official/...882276866?s=09

    is this analysis done with a patched version?



    • #32
      Originally posted by zboson View Post
      The most disappointing thing about Zen is that it only has two 128-bit FMA units.
      Vega has enough FMA units for you.
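
To put the "two 128-bit FMA units" remark in numbers, here is a rough sketch of the usual peak-throughput arithmetic: units times vector lanes times two operations per fused multiply-add. The function name and the 256-bit comparison case are illustrative assumptions, not figures from the thread.

```python
def peak_flops_per_cycle(fma_units: int, vector_bits: int, element_bits: int = 32) -> int:
    """Theoretical peak floating-point operations per cycle for one core."""
    lanes = vector_bits // element_bits  # elements handled by one FMA instruction
    return fma_units * lanes * 2         # an FMA counts as one multiply plus one add


if __name__ == "__main__":
    print(peak_flops_per_cycle(2, 128))      # two 128-bit units, single precision
    print(peak_flops_per_cycle(2, 128, 64))  # same units, double precision
    print(peak_flops_per_cycle(2, 256))      # hypothetical design with two 256-bit units
```

Under these assumptions, doubling the unit width doubles the theoretical peak. Real code rarely reaches that peak, so this is only an upper bound.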



      • #33
        Originally posted by edwaleni View Post

        LOL. I pulled the highlighted line from the Himeno website. I was trying to see if compiler flags would have any impact on the results (it doesn't seem so).

        It appears to be more a matter of optimization AMD has to do in Ryzen's cache.

        Based on the results PTS has shown, Ryzen definitely is a work in progress. Some things seem to have gotten a great deal of attention, other areas less so. Ryzen2/Ryzen Server will probably handle Himeno much better in relative terms.
        That may be so, but frankly it is typical of any CPU design. Effectively, all CPUs are works in progress. In this case, however, I really think AMD accomplished most of its goals. We are getting very good performance out of the box on most existing software. That is a good thing for most of us, and frankly there is nothing to be disappointed about here.

        What will be interesting is the impact new compilers, and possibly even new Linux versions, have on performance. Michael is already investigating this somewhat, but I'm really interested in what, if anything, compilers can do for Ryzen a year from now, when support should be firmed up.



        • #34
          Originally posted by zboson View Post

          Why `-mno-rdrnd`?

          I don't know about the Zen architecture, but with the Bulldozer architecture -mvzeroupper is not necessary. It's only Intel that suffers (maybe Zen now as well) from the false dependency on the upper half of the AVX registers when they are dirty.
          It seems that AMD doesn't support RDRND?
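
One way to settle the RDRND question on a given machine is to look at the feature flags the kernel reports. The sketch below assumes a Linux system, where `/proc/cpuinfo` lists the instruction as `rdrand`. The helper names are made up for illustration.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (for example, on a non-x86 system)


def supports(flag: str, cpuinfo_text: str) -> bool:
    """True if the named feature flag appears in the given cpuinfo text."""
    return flag in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux-specific; absent on other systems
    if path.exists():
        print("rdrand:", supports("rdrand", path.read_text()))
```

If the flag shows up, `-mno-rdrnd` in a build line is a choice made by whoever wrote the flags rather than a hardware limitation.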



          • #35
            Originally posted by carewolf View Post

            That is one nonsensical line. I would always enable -finline-functions first. The rest mainly makes sense together with profiled optimization, so after you have generated a profile, you can use that profile with -funroll-loops etc. (in fact, I believe that is the default on the second run of a profile-guided optimization build).

            I wish more build systems had support for making profile-generating and profile-using builds, or could do both: first making one, then running a bunch of tests and benchmarks, and then compiling with the generated profile.
            Won't aggressive inlining make the generated code larger, which might harm caching and branch prediction?
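
The two-pass workflow described in the quote (instrumented build, training run, rebuild with the profile) can be sketched as a small helper that assembles the three commands. `-fprofile-generate` and `-fprofile-use` are GCC's options for this. The file names, the `--quick` argument and the helper itself are hypothetical placeholders.

```python
import shlex


def pgo_commands(source: str, binary: str, training_args: list[str],
                 extra_flags: tuple[str, ...] = ("-O2",)) -> list[list[str]]:
    """Build the command lines for a basic GCC profile-guided optimization cycle."""
    base = ["gcc", *extra_flags, source, "-o", binary]
    return [
        base + ["-fprofile-generate"],     # 1. instrumented build
        [f"./{binary}", *training_args],   # 2. training run records profile data
        base + ["-fprofile-use"],          # 3. rebuild using the recorded profile
    ]


if __name__ == "__main__":
    # Hypothetical project: bench.c with a --quick training workload.
    for cmd in pgo_commands("bench.c", "bench", ["--quick"]):
        print(shlex.join(cmd))
```

The helper only prints the commands. Whether the profile helps depends on how representative the training run is, which is also where the inlining trade-off raised above would show up in measurements.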



            • #36
              Originally posted by qsmcomp View Post

              Won't aggressive inlining make the generated code larger, which might harm caching and branch prediction?
              I believe you are talking to someone who knows of that possibility, but has found from experience that it is more likely to be an improvement. That's why he says "first".



              • #37
                Originally posted by indepe View Post

                I believe you are talking to someone who knows of that possibility, but has found from experience that it is more likely to be an improvement. That's why he says "first".
                In fact, I'm drawing on my experience compiling code for a router, so that “negative optimization” might not apply to a mainstream desktop processor.



                • #38
                  Originally posted by anth View Post
                  According to an analysis of Ryzen cache by hardware.fr (which Chrome translates well): latency is very poor when a core from one of the two four-core-clusters accesses something in the L3 cache of the other.
                  I suspected that would be an issue. So it's basically a NUMA issue, and the OS schedulers aren't yet aware how to best place threads on Zen.
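
Until schedulers handle this themselves, a process can be confined to one four-core cluster by hand. The sketch below assumes a simple layout in which logical CPUs 0-3 share one cluster and 4-7 the other. That numbering is an assumption (SMT and firmware can change it), so the real topology should be checked on the target machine. `os.sched_setaffinity` is available on Linux only.

```python
import os


def cluster_cpus(cpu: int, total_cpus: int = 8, cpus_per_cluster: int = 4) -> set[int]:
    """CPUs assumed to share a cluster (and its L3 cache) with `cpu`."""
    if not 0 <= cpu < total_cpus:
        raise ValueError("cpu out of range")
    first = (cpu // cpus_per_cluster) * cpus_per_cluster
    return set(range(first, min(first + cpus_per_cluster, total_cpus)))


def pin_to_cluster(cpu: int, **layout) -> set[int]:
    """Restrict the calling process to one cluster where the OS allows it."""
    wanted = cluster_cpus(cpu, **layout)
    if hasattr(os, "sched_setaffinity"):            # Linux only
        allowed = wanted & os.sched_getaffinity(0)  # respect existing limits
        if allowed:
            os.sched_setaffinity(0, allowed)
    return wanted


if __name__ == "__main__":
    print(sorted(cluster_cpus(5)))  # CPUs assumed to sit in the same cluster as CPU 5
```

Calling `pin_to_cluster(0)` before starting worker threads keeps them on one cluster, trading access to half the cores for avoiding cross-cluster L3 traffic.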



                  • #39
                    Originally posted by Holograph View Post

                    https://twitter.com/AIDA64_Official/...882276866?s=09

                    is this analysis done with a patched version?
                    The page I'd linked to explains that they used software written by the authors of AIDA64, which will be integrated into a future version of it. It also said that AMD had told them that bandwidth between the two clusters was 22GB/s, compared with at least 175GB/s within each.
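
As a quick back-of-the-envelope check on those two figures, the snippet below computes their ratio and the time to move one gigabyte at each rate. The 1 GB transfer size is an arbitrary illustration, and since the intra-cluster figure is quoted as "at least", the ratio is a lower bound.

```python
INTER_CLUSTER_GB_S = 22.0   # between the two clusters, as quoted above
INTRA_CLUSTER_GB_S = 175.0  # within one cluster ("at least"), as quoted above


def transfer_ms(size_gb: float, bandwidth_gb_s: float) -> float:
    """Milliseconds needed to move `size_gb` gigabytes at the given bandwidth."""
    return size_gb / bandwidth_gb_s * 1000.0


if __name__ == "__main__":
    print(f"ratio: {INTRA_CLUSTER_GB_S / INTER_CLUSTER_GB_S:.2f}x")
    print(f"1 GB across clusters: {transfer_ms(1, INTER_CLUSTER_GB_S):.2f} ms")
    print(f"1 GB within a cluster: {transfer_ms(1, INTRA_CLUSTER_GB_S):.2f} ms")
```

On these numbers the link between clusters is roughly eight times narrower than the path inside one.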



                    • #40
                      Originally posted by zboson View Post
                      Zen is also dual channel, if I recall, whereas Skylake (not sure which was first) is quad-channel. This means Zen is more affected by memory bandwidth. That's maybe the second most disappointing thing about Zen after sticking with AVX128. I'm still likely to build a Zen system. It will be the first desktop I have built in years.

                      Why do you need more than two channels in a single-socket, non-RDIMM system?
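
For a sense of what the channel count means in bandwidth terms, here is the standard theoretical-peak formula: channels times transfers per second times bytes per transfer on a 64-bit channel. The DDR4-2666 speed is an assumed example and is not taken from the thread.

```python
def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: float,
                        bus_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_bits / 8
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


if __name__ == "__main__":
    # Assumed example speed: DDR4-2666, i.e. 2666 mega-transfers per second.
    print(f"dual channel: {peak_bandwidth_gb_s(2, 2666):.1f} GB/s")
    print(f"quad channel: {peak_bandwidth_gb_s(4, 2666):.1f} GB/s")
```

Whether a workload ever gets near either figure is a separate matter, which is the point of the question above.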

