AMD Piledriver/Trinity A10-5800K Compiler Tuning

  • AMD Piledriver/Trinity A10-5800K Compiler Tuning

    Phoronix: AMD Piledriver/Trinity A10-5800K Compiler Tuning

    With the initial Linux results for the AMD A10-5800K Trinity APU now out of the way, along with the Radeon HD 7660D graphics performance, this article presents some benchmarks looking at the impact of compiler tuning for the Piledriver cores, using the common GCC compiler and testing different CPU micro-architecture targets.


  • #2
    Inline assembly?

    Does anyone know how many of the programs tested in the article use inline assembly? I'm not horribly familiar with any of them. That would surely taint the meaningfulness of testing different compiler switches when the core of the code is an 'as given' assembly blob.



    • #3
      I would like to see a -march=native flag in there. It would show if GCC is correctly detecting the CPU, but would also show if the additional information about cache size and layout helped.

      It would be good if developers tried harder to help the compiler auto-vectorise, rather than putting in their own assembly. That way the code would automatically benefit on new architectures. http://locklessinc.com/articles/vectorize/ has some examples of hints that can be given.
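One kind of hint the post refers to can be sketched in C as follows. This is a minimal illustration, not code from the linked article or from any of the benchmarked programs: the function name `add_arrays` is hypothetical, and `restrict` is just one commonly used hint.

```c
#include <stddef.h>

/* Hypothetical example: element-wise addition of two float arrays.
 * The restrict qualifiers promise the compiler that dst, a and b never
 * overlap, so it does not have to assume that a store to dst[i] could
 * change a later a[j] or b[j]. That aliasing doubt is a common reason
 * for a loop to be left scalar or to need a runtime overlap check. */
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Built with something like `gcc -std=c99 -O3 -march=native`, the same source can pick up wider vector instructions on a newer CPU without being rewritten. Newer GCC releases (4.9 and later) can report which loops were vectorised with `-fopt-info-vec`.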



      • #4
        Is there any actual software that can benefit from FMA3? AFAIK, scientific software can possibly use FMA3, but I haven't seen any real-world example.



        • #5
          Originally posted by mayankleoboy1
          Is there any actual software that can benefit from FMA3? AFAIK, scientific software can possibly use FMA3, but I haven't seen any real-world example.
          Any time you compute A = A + B*C, you can benefit from FMA3. That's pretty common in any matrix math, which is used heavily in graphics too. All FFTs can benefit from FMA3 as well. I wonder if any of these programs link to libraries that could benefit from FMA3. We might not really be seeing the full effect of these different compiler settings if the libraries aren't making use of it too.
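The A = A + B*C pattern can be sketched as a plain C loop. This is only an illustration under assumptions: the function name `multiply_accumulate` is hypothetical, and whether the compiler actually emits a fused instruction depends on the target flags and its floating-point contraction settings.

```c
#include <stddef.h>

/* Hypothetical example: a[i] = a[i] + b[i] * c[i] for every element.
 * Each iteration is one multiply followed by one add, which is exactly
 * the shape a fused multiply-add instruction computes in one step. */
void multiply_accumulate(double *restrict a, const double *restrict b,
                         const double *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] + b[i] * c[i];
}
```

When FMA is enabled for the target, for example with `-mfma` or a Piledriver target such as `-march=bdver2`, GCC is allowed to fuse the multiply and the add. Without such a flag the same source compiles to separate multiply and add instructions, which is why a generically built library would not show any FMA benefit.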

          Someone correct me if I'm wrong, but x86, SSE, and AVX all have separate registers, right? So, any code mixing SSE (say, from a library) and AVX (from the calling program) will hit a register copy penalty.



          • #6
            As a simple user, how can I relate what I see in this benchmark to a common distribution -- e.g. the latest Ubuntu? Are the Ubuntu binaries built with any of the benchmarked CPU targets? Since I'm planning to buy an A10 5800 the exact same day when it will arrive in my town, it would be interesting to know this to understand what difference it would make if I could compile the binaries for my (future) CPU.



            • #7
              Ubuntu binaries are compiled to run on most processors, pretty generic stuff, with generic optimisations. If you want system-wide improvements, you'll have to compile your own system, Gentoo or Arch style, but in reality, the practical gains of doing this are moderate.

              What you CAN do is compile specific software that you need to optimize, like your scientific software, or video encoder or something similar that's processor-intensive. This is really worth doing.



              • #8
                Originally posted by geamandura
                As a simple user, how can I relate what I see in this benchmark to a common distribution -- e.g. the latest Ubuntu? Are the Ubuntu binaries built with any of the benchmarked CPU targets? Since I'm planning to buy an A10 5800 the exact same day when it will arrive in my town, it would be interesting to know this to understand what difference it would make if I could compile the binaries for my (future) CPU.
                Most binary distros are quite conservative with build options, so they won't turn on most of these optimisations. The 64-bit editions will generally run on any x86-64 CPU, so the most you can assume is SSE2. For Debian-based systems there is apt-build, which can rebuild packages.
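One way to see which instruction sets a given build was allowed to use is to check the compiler's predefined macros. The sketch below assumes GCC (or a compatible compiler) on x86; the function names are hypothetical, while `__SSE2__`, `__AVX__` and `__FMA__` are macros GCC defines when the corresponding extension is enabled.

```c
#include <stdio.h>

/* Each function returns 1 if the compiler was told it may emit that
 * instruction set for this build, and 0 otherwise. This reflects the
 * compile-time flags, not what the CPU running the binary supports. */
int built_with_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

int built_with_avx(void)
{
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

int built_with_fma(void)
{
#ifdef __FMA__
    return 1;
#else
    return 0;
#endif
}

/* Print a one-line summary of the build's target features. */
void print_isa_summary(void)
{
    printf("SSE2=%d AVX=%d FMA=%d\n",
           built_with_sse2(), built_with_avx(), built_with_fma());
}
```

Calling `print_isa_summary()` from a program built with a distro-style generic x86-64 configuration would typically report only SSE2, while a build with `-march=native` on a Piledriver machine would be expected to report AVX and FMA as well, assuming GCC detects the CPU correctly.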

                If you are interested in rebuilding lots of packages, then you might want to look at Gentoo (or a derivative). But be aware that Gentoo stable currently has GCC 4.5, and only 4.6 in unstable. GCC 4.7 is hard masked ( http://packages.gentoo.org/package/sys-devel/gcc ).



                • #9
                  Good news for Gentoo users I think.
                  And it is interesting to see that it helps in most scenarios, while some others don't seem to be affected.
                  Stop TCPA, stupid software patents and corrupt politicians!



                  • #10
                    generic tuning?

                    Hello!

                    @Michael: How about comparing these results with the results of a run with -mtune=generic, which is used in standard distributions? That way one might get a glimpse of how well Bulldozer will perform there.

                    Best,

                    Olaf

