GCC 4.5 vs. 4.6 On AMD's FX-4100 Bulldozer


  • #16
    Cool, perhaps he can comment on the bug in bugzilla if he gets time. It's sad to have a release that's one and a half years old in portage when they've done two of them since.



    • #17
      New entry in the openbenchmarking database, with kernel 3.1, GCC 4.5: normalized result compared to the article's data
      Only C-Ray is tested, though it apparently shows a 27% boost. Is Bulldozer the only CPU getting such an improvement?



      • #18
        Originally posted by PsynoKhi0
        New entry in the openbenchmarking database, with kernel 3.1, GCC 4.5: normalized result compared to the article's data
        Only C-Ray is tested, though it apparently shows a 27% boost. Is Bulldozer the only CPU getting such an improvement?
        What compiler and compiler options were used? It really does matter!

        From what I've seen so far, only Bulldozer-based CPUs get a hefty performance boost from new kernels and compilers, which makes perfect sense. It's an entirely new architecture and quite a big step from the previous one, as well as from everything else that's been around until now.

        The most notable peculiarities of the Bulldozer module design are the shared early pipeline stages, L1 instruction cache and L2 cache. Unless OS kernels and compilers take this new design into account and optimize for it, they will inherently cripple the module, bringing per-module performance closer to that of a single core than to 1.5-2 cores.



        • #19
          Originally posted by pszilard
          What compiler and compiler options were used? It really does matter!

          From what I've seen so far, only Bulldozer-based CPUs get a hefty performance boost from new kernels and compilers, which makes perfect sense. It's an entirely new architecture and quite a big step from the previous one, as well as from everything else that's been around until now.

          The most notable peculiarities of the Bulldozer module design are the shared early pipeline stages, L1 instruction cache and L2 cache. Unless OS kernels and compilers take this new design into account and optimize for it, they will inherently cripple the module, bringing per-module performance closer to that of a single core than to 1.5-2 cores.
          In the left column, go to Result File Information (at the bottom of the column), click it, then click System Information to see the cc output, e.g. http://openbenchmarking.org/system/1...GCC%204.5.2/cc, among other data from that area.
          Michael Larabel
          http://www.michaellarabel.com/



          • #20
            Originally posted by Michael
            In the left column, go to Result File Information (at the bottom of the column), click it, then click System Information to see the cc output, e.g. http://openbenchmarking.org/system/1...GCC%204.5.2/cc, among other data from that area.
            Thanks. I still haven't taken the time to figure out the OpenBenchmarking.org interface. It might be only me, but I find it a little confusing...

            Btw, the CC info lines (most notably the "Configured with" line) are not wrapped, which makes them unreadable unless copy-pasted out...



            • #21
              Originally posted by pszilard
              Thanks. I still haven't taken the time to figure out the OpenBenchmarking.org interface. It might be only me, but I find it a little confusing...

              Btw, the CC info lines (most notably the "Configured with" line) are not wrapped, which makes them unreadable unless copy-pasted out...
              I always welcome OpenBenchmarking.org feedback, particularly on user-interface design, as that's not my area of expertise.

              The lines should wrap now.
              Michael Larabel
              http://www.michaellarabel.com/



              • #22
                Does GCC 4.6 optimise for FMA4 and XOP?

                FMA can fuse a multiply and an add into a single step, and so could give a big speed-up to some code.

                Also, wouldn't it be great if GCC had a 'fastest flags that don't break the test-suite' option?



                • #23
                  Originally posted by ssam
                  Does GCC 4.6 optimise for FMA4 and XOP?

                  FMA can fuse a multiply and an add into a single step, and so could give a big speed-up to some code.

                  Also, wouldn't it be great if GCC had a 'fastest flags that don't break the test-suite' option?
                  GCC 4.5 already supports FMA4 and XOP, though I'm not sure how well they get used without intrinsics. In any case, FMA won't get used without -Ofast/-ffast-math since it changes the results compared to fmul+fadd.



                  • #24
                    Originally posted by Azpegath
                    He should be running the latest VDrift (2011-09) instead of the one from 2010-06, perhaps that is a lot better. I've taken the old GLSLValidator from 3DLabs and updated it to compile on a "modern" Linux distribution with wxWidgets 2.8. I was planning on running their shaders through that since it only supports up to GLSL1.2. I tested some of their shaders and some of those fragment shaders failed.
                    The code/program can be found at https://github.com/AzP/GLSL-Validate/
                    If anyone's interested in seeing how Mesa parses shaders, I ported Aras' Mesa GLSL optimizer back to Linux at the beginning of the summer. It not only checks syntax but also outputs the result of the common optimizations, which I found useful in shader development.
                    You may need to get the code from the Linux merge. I haven't tested whether master still runs on Linux, and Aras mostly works on Windows/Mac.

                    https://github.com/aras-p/glsl-optimizer



                    • #25
                      Originally posted by Otus
                      GCC 4.5 already supports FMA4 and XOP, though I'm not sure how well they get used without intrinsics. In any case, FMA won't get used without -Ofast/-ffast-math since it changes the results compared to fmul+fadd.
                      thanks.

                      FMA should improve the results of calculations though, as the rounding is delayed. (I know sometimes one prefers worse precision if there is better consistency across different architectures)
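
The point about delayed rounding can be illustrated without FMA hardware. The following is a minimal Python sketch that emulates a fused multiply-add with exact rational arithmetic; the function names and the input values are chosen for illustration only and are not taken from GCC or from any benchmark in this thread.

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the add."""
    product = a * b      # first rounding happens here
    return product + c   # second rounding happens here


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b+c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the only rounding step


# Illustrative inputs: a*b is exactly 1 - 2**-54, which is not representable
# as a double and rounds to 1.0 when the multiply is done on its own.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

separate = mul_then_add(a, b, c)
fused = fused_mul_add(a, b, c)
print("separate:", separate)
print("fused:   ", fused)
```

With these inputs the separate version loses the small term entirely, while the single-rounding version keeps it. That difference is why a compiler cannot silently swap one form for the other if bit-identical results are expected.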



                      • #26
                        Originally posted by ssam
                        Does GCC 4.6 optimise for FMA4 and XOP?
                        Ummm, if you google for "-march=bdver1" you'll find all the information you need! The most important piece of info is this: http://goo.gl/LXkBr.



                        • #27
                          gcc is far from ideal for Bulldozer

                          Originally posted by Otus
                          GCC 4.5 already supports FMA4 and XOP, though I'm not sure how well they get used without intrinsics. In any case, FMA won't get used without -Ofast/-ffast-math since it changes the results compared to fmul+fadd.
                          Nope! I'd share a document with you, but I don't want to get my butt kicked, so I found some public info on the home page of the Swiss National Supercomputing Center (they are just getting their Interlagos-based Cray XMT shipped) which covers the most important stuff:
                          http://user.cscs.ch/news/2011/10/17/...-xop-and-fma4/



                          • #28
                            Originally posted by Michael
                            I always welcome OpenBenchmarking.org feedback, particularly on user-interface design, as that's not my area of expertise.

                            The lines should wrap now.
                            Excellent!

                            Another minor issue, reproducible with both Chromium and Google Chrome version 14.0.835.202, see screenshot: http://dl.dropbox.com/u/239841/openb...hromium_14.png



                            • #29
                              Originally posted by pszilard
                              Nope! I'd share a document with you, but I don't want to get my butt kicked, so I found some public info on the home page of the Swiss National Supercomputing Center (they are just getting their Interlagos-based Cray XMT shipped) which covers the most important stuff:
                              http://user.cscs.ch/news/2011/10/17/...-xop-and-fma4/
                              I remember testing XOP with GCC 4.5, and the GCC 4.5 release notes say that's possible:

                              Support for the XOP, FMA4, and LWP instruction sets for the AMD Orochi processors are now available with the -mxop, -mfma4, and -mlwp options.
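
Before building with those options it can help to confirm that the CPU actually advertises the extensions. Here is a small Python sketch, assuming a Linux system where /proc/cpuinfo has a "flags" line listing names such as fma4, xop and lwp; the helper names are hypothetical and the check says nothing about how well GCC uses the instructions.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. not x86 Linux)


def missing_extensions(cpuinfo_text: str, wanted=("fma4", "xop", "lwp")) -> list:
    """List the wanted extensions that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # file not available on this system
    print("missing:", missing_extensions(text))
```

An empty list suggests that -mfma4, -mxop and -mlwp target instructions the machine can execute; any name in the list means binaries built with the matching option may fail with an illegal-instruction error on that machine.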



                              • #30
                                Originally posted by Azpegath
                                Yes, they split the packages into src and data, and haven't released a data package for it. Sadly they just seem to want to distribute data via svn since their previous version. I've been trying to get them to release a data tar file matching the source release so we can get an updated package in Gentoo, but the devs don't reply on IRC. Their channel is just quiet. I've only tried for 2 whole days, but you'd think that somebody would reply.

                                There's a bug on the Gentoo bugzilla describing the issue further (https://bugs.gentoo.org/show_bug.cgi?id=351409#c7)
                                IRC is by appointment only afaik. Have you considered the forum http://vdrift.net/ or the issue tracker https://github.com/VDrift/vdrift/issues ?
