GCC 4.8 Release Brings Improved C++11, Optimizations

  • GCC 4.8 Release Brings Improved C++11, Optimizations

    Phoronix: GCC 4.8 Release Brings Improved C++11, Optimizations

    GCC 4.8 has been officially released today as the annual major update to the GNU Compiler Collection...

    http://www.phoronix.com/vr.php?view=MTMzMzk

  • #2
    some benchmark graphs

    Originally posted by phoronix
    Phoronix: GCC 4.8 Release Brings Improved C++11, Optimizations

    GCC 4.8 has been officially released today as the annual major update to the GNU Compiler Collection...

    http://www.phoronix.com/vr.php?view=MTMzMzk

    Fenrus Linux went to the RC on March 17th; the graphs at http://linux.fenrus.org/performance/ show that there are modest increases in the cray and graphics-magick tests, and a regression in fhourstones... with everything else being basically flat.


    • #3
      Quick question: when you say "support for new Broadwell instruction set", do you mean the compiler automatically chooses instructions from that instruction set when necessary?


      • #4
        "Support for the new Intel processor codename Broadwell with RDSEED, ADCX, ADOX, PREFETCHW is available through -madx, -mprfchw, -mrdseed command-line options."


        • #5
          Originally posted by startzz
          "Support for the new Intel processor codename Broadwell with RDSEED, ADCX, ADOX, PREFETCHW is available through -madx, -mprfchw, -mrdseed command-line options."
          Not exactly what I meant... What I mean is: if you enable the instructions, how does the compiler decide when to use them? Because they are specialized instructions, I don't see them being used for things like conditionals and simple math operations, which are the most commonly used operations in a program. What I really mean is that every time I have seen code optimized for a specific processor instruction set, the coding was done in assembler, not with a C/C++ compiler...


          • #6
            Originally posted by wargames
            Not exactly what I meant... What I mean is: if you enable the instructions, how does the compiler decide when to use them? Because they are specialized instructions, I don't see them being used for things like conditionals and simple math operations, which are the most commonly used operations in a program. What I really mean is that every time I have seen code optimized for a specific processor instruction set, the coding was done in assembler, not with a C/C++ compiler...

            In all likelihood, it does not. It would just have support for intrinsics to make it easier to use them. That is how support for many instructions is added.
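
To illustrate the intrinsics route described in the post above, here is a minimal sketch. It assumes GCC (or a compatible compiler) on x86-64 and the `_addcarryx_u64` intrinsic from `<immintrin.h>`, which this example only uses when the build enables `-madx` (so that `__ADX__` is defined). The helper `add128` and the macro `HAVE_ADX_INTRINSIC` are hypothetical names made up for this example. Without `-madx` the same function falls back to plain C, and even with it the compiler remains free to choose which add-with-carry instruction it actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__ADX__)
#include <immintrin.h>          /* declares _addcarryx_u64 */
#define HAVE_ADX_INTRINSIC 1    /* hypothetical macro for this sketch */
#else
#define HAVE_ADX_INTRINSIC 0
#endif

/* Hypothetical helper: add two 128-bit integers stored as two 64-bit limbs
 * (least significant limb first). Returns the carry out of the top limb. */
unsigned add128(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
    assert(a != NULL && b != NULL && out != NULL);
#if HAVE_ADX_INTRINSIC
    /* Intrinsic path: the carry is passed explicitly from limb to limb. */
    unsigned long long lo, hi;
    unsigned char carry = _addcarryx_u64(0, a[0], b[0], &lo);
    carry = _addcarryx_u64(carry, a[1], b[1], &hi);
    out[0] = lo;
    out[1] = hi;
    return carry;
#else
    /* Portable path: detect wrap-around by comparing against an operand. */
    uint64_t lo = a[0] + b[0];
    unsigned carry_lo = lo < a[0];
    uint64_t hi = a[1] + b[1];
    unsigned carry_hi = hi < a[1];
    hi += carry_lo;
    carry_hi |= (hi < carry_lo);   /* adding the low carry wrapped the top limb */
    out[0] = lo;
    out[1] = hi;
    return carry_hi;
#endif
}
```

The point of the sketch is that the programmer asks for the operation explicitly; the enabling flag only makes the intrinsic available and does not, by itself, change how ordinary arithmetic is compiled.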


            • #7
              Originally posted by wargames
              Not exactly what I meant... What I mean is: if you enable the instructions, how does the compiler decide when to use them? Because they are specialized instructions, I don't see them being used for things like conditionals and simple math operations, which are the most commonly used operations in a program. What I really mean is that every time I have seen code optimized for a specific processor instruction set, the coding was done in assembler, not with a C/C++ compiler...
              Prefetch is pretty generic and has been supported in GCC for more than a decade and a half since AMD introduced it in 3DNow!, so I fully expect GCC to be able to take advantage of that automatically.


              • #8
                prefetch

                Originally posted by carewolf
                Prefetch is pretty generic and has been supported in GCC for more than a decade and a half since AMD introduced it in 3DNow!, so I fully expect GCC to be able to take advantage of that automatically.
                The prefetch instruction (and in this case "prefetchw") is almost always a loss. Hardware nowadays has pretty aggressive prefetchers that work on the actual access pattern, and those are very effective for most cases.

                The problem is that the cases where they are not effective (e.g. the ones that are hard for a machine to predict) are also the ones where the compiler will have a hard time adding its own prefetches. (There are some special cases in HPC and such where a human can know special things.)

                It's branch prediction hints all over again in many ways: broad use does damage, because programmers do not know their own program as well as the CPU does ;-)
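
For reference, this is roughly what a hand-placed software prefetch looks like in GCC, as a minimal sketch. It uses the documented `__builtin_prefetch(addr, rw, locality)` built-in on an indirect (gather-style) access pattern, the kind of case where a human may know more than the hardware. The function `gather_sum` and the constant `PREFETCH_DISTANCE` are hypothetical; the distance of 8 is an arbitrary placeholder, not a measured value. Passing `rw = 1` instead of `0` requests a prefetch for writing, which is the case a write-prefetch instruction such as prefetchw is meant for. GCC also has a `-fprefetch-loop-arrays` option for compiler-inserted prefetches in array loops. Whether any of this helps has to be measured, in line with the caution in the post above.

```c
#include <assert.h>
#include <stddef.h>

#define PREFETCH_DISTANCE 8   /* hypothetical tuning value; needs measurement */

/* Hypothetical helper: sum table[index[i]] for i in [0, n).
 * The indirection through index[] hides the real access pattern. */
long gather_sum(const long *table, const size_t *index, size_t n)
{
    assert(n == 0 || (table != NULL && index != NULL));
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, rw (0 = read, 1 = write), locality (0..3).
             * This is only a hint and never changes the result. */
            __builtin_prefetch(&table[index[i + PREFETCH_DISTANCE]], 0, 1);
        }
        sum += table[index[i]];
    }
    return sum;
}
```

Because the built-in is only a hint, removing it leaves the result unchanged, which makes it easy to benchmark both variants against each other.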


                • #9
                  Originally posted by fenrus
                  The prefetch instruction (and in this case "prefetchw") is almost always a loss. Hardware nowadays has pretty aggressive prefetchers that work on the actual access pattern, and those are very effective for most cases.

                  The problem is that the cases where they are not effective (e.g. the ones that are hard for a machine to predict) are also the ones where the compiler will have a hard time adding its own prefetches. (There are some special cases in HPC and such where a human can know special things.)

                  It's branch prediction hints all over again in many ways: broad use does damage, because programmers do not know their own program as well as the CPU does ;-)
                  If you set -mtune=<architecture>, then prefetchw might actually be useful. With that said, the only people that will benefit from it are those building software themselves (i.e. Gentoo users).


                  • #10
                    Originally posted by ryao
                    If you set -mtune=<architecture>, then prefetchw might actually be useful. With that said, the only people that will benefit from it are those building software themselves (i.e. Gentoo users).

                    Eh, how?
                    If you think that the compiler can insert a better prefetchw than the hardware prefetchers, please speak up with an example...


                    (disclaimer: I work for Intel on Linux, and also have my own hobby OS that I build in my spare time... I look at compilers and compiler options a lot ;-) )


                    • #11
                      Originally posted by fenrus
                      Eh, how?
                      If you think that the compiler can insert a better prefetchw than the hardware prefetchers, please speak up with an example...


                      (disclaimer: I work for Intel on Linux, and also have my own hobby OS that I build in my spare time... I look at compilers and compiler options a lot ;-) )
                      I think it is hypothetically possible that the GCC authors might create an optimization pass that uses prefetchw in a useful way at some point in the future. However, I have no example code. If any such code existed, I imagine that Intel would use it to improve the design of their next chip. That is the fate of all such microarchitecture-specific optimizations.

                      With that said, I doubt that it is possible for anyone outside of Intel to produce such code for unreleased products without assistance from Intel in the form of either an engineering sample of the chip or extremely accurate emulation software. You likely knew that though.


                      • #12
                        Originally posted by ryao
                        the only people that will benefit from it are those building software themselves (i.e. Gentoo users).
                        Not necessarily, you can build several code paths and switch at runtime between them.
                        Also some Ubuntu users have recognized that you can get dramatic performance increases in certain situations by rebuilding specific packages optimized for their CPU with apt-build:

                        (Gentoo is running in a VM there so the results are not 100% comparable)
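
As a minimal sketch of "several code paths, switched at runtime": the GCC 4.8 release notes list new x86 built-ins, `__builtin_cpu_init` and `__builtin_cpu_supports`, for this kind of check. The example below assumes GCC or a compatible compiler; the names `count_bits`, `popcount_generic`, `popcount_hw` and `HAVE_X86_DISPATCH` are hypothetical. On non-x86 targets it simply compiles the portable path.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1   /* hypothetical macro for this sketch */
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Portable path: clear the lowest set bit until none are left. */
static unsigned popcount_generic(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

#if HAVE_X86_DISPATCH
/* Fast path: only this function is compiled with POPCNT enabled,
 * so the rest of the program still runs on older CPUs. */
__attribute__((target("popcnt")))
static unsigned popcount_hw(uint64_t x)
{
    return (unsigned)__builtin_popcountll(x);
}
#endif

/* Pick a code path at run time based on what the CPU reports. */
unsigned count_bits(uint64_t x)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt")) {
        unsigned fast = popcount_hw(x);
        /* Debug-build cross-check that both paths agree. */
        assert(fast == popcount_generic(x));
        return fast;
    }
#endif
    return popcount_generic(x);
}
```

A real program would normally perform the CPU check once and cache a function pointer rather than test on every call; the per-call check here just keeps the sketch short.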


                        • #13
                          Originally posted by chithanh
                          Not necessarily, you can build several code paths and switch at runtime between them.
                          Also some Ubuntu users have recognized that you can get dramatic performance increases in certain situations by rebuilding specific packages optimized for their CPU with apt-build:

                          (Gentoo is running in a VM there so the results are not 100% comparable)

                          Oh, compiling for a new enough CPU is a huge gain at times (just see the graphs of my distro that I posted earlier in this thread).
                          prefetchw or prefetch are not part of that, however.


                          • #14
                            Originally posted by chithanh
                            Not necessarily, you can build several code paths and switch at runtime between them.
                            Also some Ubuntu users have recognized that you can get dramatic performance increases in certain situations by rebuilding specific packages optimized for their CPU with apt-build:

                            (Gentoo is running in a VM there so the results are not 100% comparable)
                            There is absolutely room for real performance improvements using compiler flags (see the URL I posted much earlier in the thread).
                            prefetch/prefetchw is not part of that, however.


                            • #15
                              Originally posted by chithanh
                              Not necessarily, you can build several code paths and switch at runtime between them.
                              As far as I know, GCC does not support that. ICC does though.

                              Originally posted by fenrus
                              There is absolutely room for real performance improvements using compiler flags (see the URL I posted much earlier in the thread).
                              prefetch/prefetchw is not part of that, however.
                              You cannot prove that. Say that I had a program that could solve NP-complete problems in polynomial time and a description of Haswell. I am certain that the two could be combined to find ways to make programs perform better on Haswell with the use of prefetchw.

                              With that said, it has been demonstrated that doing tricks with prefetching can improve performance in certain areas:

                              http://arstechnica.com/business/2012...o-collaborate/
