AMD FX-8150 With The Open64 5.0 Compiler

  • AMD FX-8150 With The Open64 5.0 Compiler

    Phoronix: AMD FX-8150 With The Open64 5.0 Compiler

    The Open64 5.0 compiler was released earlier this month with many changes, among the prominently noted items were greater optimizations for AMD's Bulldozer CPUs. In this article is a first-look at the Open64 5.0 compiler performance compared to its earlier release, as tested on an AMD FX-8150 eight-core "Bulldozer" processor.

    http://www.phoronix.com/vr.php?view=16733

  • #2
    Up to 30% improvement!

    Would be interesting to see if the new compiler improves for Intel CPUs in that benchmark too.



    • #3
      Not much improvement for Pov-Ray floating point work..

      A lot of the whining against Bulldozer has been about its lack of floating point performance from Windows users. Of course, with technologies like OpenCL, it's far better to be doing floating point math on GPUs than CPUs since they're over 1000x faster at it. A lot of game companies still do WAYYY too much floating point work on the CPU. Games such as Bad Company 2 and the like do all their physics calculations on the CPU when they really should be done on the GPU, since the GPUs are designed for those types of massively parallel floating point calculations and CPUs really aren't.

      Thankfully there are physics engines out such as Havok that run under OpenCL, which means game companies don't have any excuse to continue using the CPU for so much floating point work. Which I think makes these Bulldozer chips a good choice long-term since they can beat Intel's more expensive chips in integer performance. Of course, the open source Linux drivers are a long way away from supporting OpenCL, though not many serious gamers (Crysis 3, Battlefield 3, etc) run those drivers anyway. OpenCL is going to become much more important in the future as AMD is shifting the focus of floating point away from their CPU cores and towards their APU / GPU cores, which run OpenCL for floating point work.
      Last edited by Sidicas; 11-25-2011, 07:36 AM.



      • #4
        Thankfully there are physics engines out such as Havok that run under OpenCL, which means game companies don't have any excuse to continue using the CPU for so much floating point work.
        Hm, I thought "available market" is a good excuse. If your minimum requirements are mid-range HD6k or GTX4xx card, that's cutting out a lot of people.



        • #5
          Originally posted by sabriah View Post
          Up to 30% improvement!

          Would be interesting to see if the new compiler improves for Intel CPUs in that benchmark too.
          That's exactly the problem with almost all benchmarking sites, Phoronix notwithstanding.

          Tom's hardware concluded that 6 core Sandy Bridge-E was 30% faster than the FX-8150. How much of that came down to compiler optimizations, especially since a rather suspicious number of benchmarks are compiled with Intel's own ICC compiler? There is no mainstream compiler that is AMD-biased to balance out the results. Of the 30 benchmarks, the average user will use between 0 and 3 of those applications in real life, but yet they will be "recommended" to buy the Intel CPU based on a useless aggregate score that is distorted by synthetic benchmarks like Futuremark which always favor Intel by an unrealistic amount compared to real life.

          30% isn't that big of a difference in real life anyway (assuming it's even really 30%), especially since most CPUs are in idle/power-saving mode most of their lifetime. If AMD would market Bulldozer as a quad-core with superior hyperthreading, then it suddenly becomes the world's fastest consumer-grade CPU, since that 6 core SB-E CPU requires 30% more die size, 2 more cores, and costs 4x as much to achieve only 30% more performance.
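
The price/performance argument in this post can be put into numbers. This is only a sketch: it takes the post's claimed figures (30% more performance at 4x the price) at face value and normalizes the FX-8150 to 1.0 for both; the function name is made up for illustration.

```python
def perf_per_cost(relative_perf, relative_cost):
    """Performance delivered per unit of cost, both relative to a 1.0 baseline."""
    if relative_cost <= 0:
        raise ValueError("relative_cost must be positive")
    return relative_perf / relative_cost


# Baseline (assumption): FX-8150 = 1.0 performance at 1.0 cost.
fx_8150 = perf_per_cost(1.0, 1.0)
# Figures claimed in the post: 30% faster, 4x the price. Not measured here.
sb_e = perf_per_cost(1.3, 4.0)

print(f"FX-8150 perf per cost: {fx_8150:.3f}")
print(f"SB-E perf per cost:    {sb_e:.3f}")
print(f"FX-8150 advantage:     {fx_8150 / sb_e:.2f}x")
```

With those inputs the SB-E part delivers roughly a third of the performance per unit of cost. The result is only as good as the 30% and 4x figures that go in.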

          *Posted from my screaming fast FX-8120*



          • #6
            Offloading FP to GPU should mean lower prices than current CPUs

            Offloading FP to GPU should mean lower prices than current CPUs since it means the CPU was designed to do less than current CPUs.

            One thing not clear from traditional benchmarks is what are the capabilities of the CPUs and then test those capabilities. That way you get an idea of what you are paying for rather than what the quality of software X is. Software X may be written by a crappy programmer.

            Has BD 8150 been compared to previous Phenom IIs? That way we would have an idea of performance compared to past CPUs and if the price is justified.



            • #7
              Originally posted by linux5850 View Post
              Has BD 8150 been compared to previous Phenom IIs? That way we would have an idea of performance compared to past CPUs and if the price is justified.
              Yup, and unless you are doing something like cryptography you are better off with an X6 at this point.



              • #8
                Originally posted by deanjo View Post
                Yup, and unless you are doing something like cryptography you are better off with an X6 at this point.
                The comparisons that have been done against the Phenom IIs have used applications compiled without bulldozer optimizations and under an OS with a thread scheduler that doesn't understand the modular design (2 cores per module with some shared resources) of Bulldozer...

                So no, the comparisons against the Phenom II aren't fair at all..
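
To illustrate what a module-aware scheduler would do differently, here is a minimal sketch. It assumes a hypothetical numbering where logical cores 2n and 2n+1 share a module; on a real system the topology should be read from the OS rather than guessed. The idea is to fill one core per module first, so threads only start sharing a module's resources once every module is busy.

```python
from collections import defaultdict


def assumed_bulldozer_topology(n_cores=8, cores_per_module=2):
    """Map module id -> list of core ids (ASSUMED numbering, not read from hardware)."""
    topology = defaultdict(list)
    for core in range(n_cores):
        topology[core // cores_per_module].append(core)
    return dict(topology)


def module_aware_order(cores_by_module):
    """Order cores so each module gets one thread before any module gets two."""
    order = []
    depth = max((len(cores) for cores in cores_by_module.values()), default=0)
    for slot in range(depth):
        for module in sorted(cores_by_module):
            cores = cores_by_module[module]
            if slot < len(cores):
                order.append(cores[slot])
    return order


topology = assumed_bulldozer_topology()
placement = module_aware_order(topology)
print(placement)  # [0, 2, 4, 6, 1, 3, 5, 7]
```

A scheduler unaware of modules might instead put the first two threads on cores 0 and 1, which under this assumed numbering share one module while the other three sit idle.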



                • #9
                  Originally posted by sabriah View Post
                  Up to 30% improvement!

                  Would be interesting to see if the new compiler improves for Intel CPUs in that benchmark too.
                  If you compile those applications with the Bulldozer optimizations, the compiled binaries don't run on Intel CPUs.. So I don't see how such a comparison could be made. I think there is a way to compile a binary so that it only enables the Bulldozer optimizations if you have a Bulldozer CPU, but I'm not sure if Open64 does this and what options need to be set to do it.. But even in that situation, it would mean the Intel chips don't get *ANY* of the Bulldozer optimizations anyway. So I'd say it's a pretty safe bet that all the Bulldozer optimizations are only applicable to Bulldozer CPUs and would not help Intel CPUs at all since the Intel CPUs currently don't even support FMA3 (coming 2013 for Intel), let alone FMA4.

                  Keep in mind, Bulldozer runs FMA4 while future Intel CPUs will run FMA3.. They're mutually exclusive though I'm sure there are some tricks in there to get a binary to run FMA4 on Bulldozer CPUs and FMA3 on Intel CPUs (different compiled paths).. Certainly a lot of the performance boosts in this new Open64 compiler revolve around using FMA4. It's the only compiler out there besides GCC that has FMA4 accelerations on the drawing board.
                  Last edited by Sidicas; 11-25-2011, 09:42 PM.
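
The "different compiled paths" idea boils down to checking CPU features at run time and dispatching accordingly. Below is a rough sketch of that check, not of how Open64 does it. It relies on the Linux x86 convention that `/proc/cpuinfo` lists FMA4 as `fma4` and FMA3 as `fma`; the sample strings are abbreviated, made-up inputs, and the path names are placeholders.

```python
def parse_cpu_flags(cpuinfo_text):
    """Collect the feature flags from /proc/cpuinfo-style text (Linux x86 layout)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_fma_path(flags):
    """Choose which compiled code path to run (placeholder names)."""
    if "fma4" in flags:   # preferring FMA4 when both exist is an arbitrary choice here
        return "fma4"
    if "fma" in flags:    # Linux reports FMA3 simply as "fma"
        return "fma3"
    return "generic"


# Abbreviated, hypothetical samples; real flag lists are much longer.
SAMPLE_BULLDOZER = "model name\t: example Bulldozer CPU\nflags\t\t: fpu sse2 avx xop fma4\n"
SAMPLE_NO_FMA = "model name\t: example CPU without FMA\nflags\t\t: fpu sse2 avx\n"

print(pick_fma_path(parse_cpu_flags(SAMPLE_BULLDOZER)))
print(pick_fma_path(parse_cpu_flags(SAMPLE_NO_FMA)))
```

On a real Linux machine the text would come from reading `/proc/cpuinfo`. A binary built this way still needs a generic fallback path, which is the one a CPU without either extension would take.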



                  • #10
                    Originally posted by deanjo View Post
                    Yup, and unless you are doing something like cryptography you are better off with an X6 at this point.
                    Wow, what a sweeping generalization... and a very misleading one at that.

                    How about something like:

                    "Unless you're building a PC to run the Cinebench single threaded benchmark, you're better off with a Core2 Duo."

                    Bulldozer did have some regressions, mostly in single threaded benchmarks. However, it's also faster than the Phenom II X6 in many single threaded benchmarks, and almost universally faster in well threaded benchmarks.

                    I'm posting from an FX8120, and it feels faster than any Sandy Bridge, Nehalem or Phenom II I've ever used. I have the following windows open:

                    Eclipse(EPIC-Perl)
                    Netbeans(PHP)
                    Firefox
                    A Virtualbox VM running an Apache/PHP/Postgresql test server
                    A Virtualbox VM running a SVN server
                    PGAdmin3
                    Several terminals
                    Gedit
                    ...and a few more random windows

                    , and not ever a hint of lag, despite running 2 craptastic Java-based IDEs at the same time. I can even do something CPU intensive like creating a Truecrypt volume or compiling the Linux kernel, and still no slowdown whatsoever. I hate to break it to you, but a quad core Sandy Bridge cannot do all of those things and still be perfectly responsive, especially if you're using its IGP.



                    • #11
                      Originally posted by leeenux View Post
                      Wow, what a sweeping generalization... and a very misleading one at that.

                      How about something like:

                      "Unless you're building a PC to run the Cinebench single threaded benchmark, you're better off with a Core2 Duo."

                      Bulldozer did have some regressions, mostly in single threaded benchmarks. However, it's also faster than the Phenom II X6 in many single threaded benchmarks, and almost universally faster in well threaded benchmarks.

                      I'm posting from an FX8120, and it feels faster than any Sandy Bridge, Nehalem or Phenom II I've ever used. I have the following windows open:

                      Eclipse(EPIC-Perl)
                      Netbeans(PHP)
                      Firefox
                      A Virtualbox VM running an Apache/PHP/Postgresql test server
                      A Virtualbox VM running a SVN server
                      PGAdmin3
                      Several terminals
                      Gedit
                      ...and a few more random windows

                      , and not ever a hint of lag, despite running 2 craptastic Java-based IDEs at the same time. I can even do something CPU intensive like creating a Truecrypt volume or compiling the Linux kernel, and still no slowdown whatsoever. I hate to break it to you, but a quad core Sandy Bridge cannot do all of those things and still be perfectly responsive, especially if you're using its IGP.
                      You've just pointed out the gap between running a benchmark and reality.
                      Yes, in reality all cores are flooded with all kinds of work all the time.
                      In a benchmark, only one program generates load.
                      I also think Bulldozer beats the Intel CPU if you run everything at the same time.

                      For example, I don't open just one browser window; I have 60+ browser windows.
                      You can bring any CPU to its knees with the browser alone: just open 200 of them and do some stuff.

                      "Single-core" performance really only matters in benchmarks.



                      • #12
                        So I tried to test your experience with Intel and AMD systems. I own an Athlon64 3700+ and used it until the end of October. It had an nvidia 6600GT inside. Since then I have an i7 2600K (running the iGPU).
                        Another PC in the room runs on an AthlonII X3 435 (3x2.9GHz) and a nvidia 220 GT.
                        All PCs running kde4 (3700+: kde-4.6.5, the others kde-4.7.3)
                        Running 5 Firefox-windows, some with dolphin, many OOo-windows made the UI quite laggy in the X3, my 3700+ even cried when running kile + one OOo + one firefox with 10 tabs.
                        I never experienced any lag on my i7, so i opened firefox with 15 tabs, running several websites, including Flash videos. Opened 30 more FF-Windows, tvbrowser (java), kile and kdevelop.
                        At this point, Desktop-Effects from kde4 were not that quick anymore - Alt-Tab needed a fraction of a second to show me the cover switch, but then, the windows switched smoothly.
                        So, ok, I need to do something stressing the CPU...
                        building kdelibs with -j10, building gentoo-sources with -j5, importing kdelibs into kdevelop, in parallel do edit projects with large files in kdevelop and kile (ok - I can't edit two files at the same time, so I switched forth and back).
                        All 8 cores pushed to 100%, and - finally - editing became somewhat laggy! The cursor needed a fraction of a second to jump when typing!
                        But - hey - everyone knows kwin is not that performant (remember: opening some windows on an AthlonII X3 + nvidia GPU and doing nothing stressful but editing some files in OOo made the UI laggy), so I disabled desktop effects, et voila, lags disappeared! Editing is as smooth as ever (still with kdevelop importing kdelibs, 15 threads compiling kdelibs+kernel on 8 cores, java running, many firefox windows/tabs). To put a minimal load on the GPU, I started the only game I own, xmoto. Loading a replay, arranging windows so that xmoto is not covered (note: you must bring the cursor above the xmoto window in order to start playback of the replay) - everything runs fine, no lag when typing in kile, kdevelop autocompletion pops up instantly, even xmoto runs smoothly, as if nothing else were going on!

                        So I really can't understand how you could experience such lags with an Intel-based system, even when using the iGPU.
                        What kind of GPU do you use in your Bulldozer-systems?
                        If you experience Lags on intel-systems when running the iGPU, just disable desktop effects (Or use a more performant window manager).



                        • #13
                          Originally posted by Qaridarium View Post
                          For example, I don't open just one browser window; I have 60+ browser windows.
                          You can bring any CPU to its knees with the browser alone: just open 200 of them and do some stuff.
                          What the hell do you need 60+ open websites for (never mind 200)? Watching websites for changes?!?
                          Software engineers know: polling is the most expensive way to watch for changes.
                          If you have the time to regularly check those 60+ pages for changes - OK, I would like to have your job :/ Nevertheless, I would recommend getting notified by RSS or mail instead.
                          This is the exact opposite of regular benchmarking (running just 1 app), and just as unusual.

                          I usually have some documentation open in my browser, kdevelop and/or kile, and not too often libreoffice. In the background there is often some compiling going on. Sometimes I watch videos (with compiling in the background :P), and sometimes I play xmoto. That's why I wanted a system with an iGPU. I was thinking of getting a Llano-based system. When I said "now it's the time" they were not available anymore... Then I said "wait for Bulldozer" - when the NDA expired and the first benchmarks popped up, I said "no, not with that power consumption under load!" (BTW: so far there is no AM3+ board with an on-chip GPU; BD-ready boards with an AM3 chip don't count, they just cause trouble with BD...)
                          My i7 system uses ~30W on idle, 40-50W on average usage, and 120W on heavy load. I think this would not be possible with BD + discrete GPU.



                          • #14
                            Originally posted by schmalzler View Post
                            So I tried to test your experience with Intel and AMD systems. I own an Athlon64 3700+ and used it until the end of October. It had an nvidia 6600GT inside. Since then I have an i7 2600K (running the iGPU).
                            Another PC in the room runs on an AthlonII X3 435 (3x2.9GHz) and a nvidia 220 GT.
                            All PCs running kde4 (3700+: kde-4.6.5, the others kde-4.7.3)
                            Running 5 Firefox-windows, some with dolphin, many OOo-windows made the UI quite laggy in the X3, my 3700+ even cried when running kile + one OOo + one firefox with 10 tabs.
                            I never experienced any lag on my i7, so i opened firefox with 15 tabs, running several websites, including Flash videos. Opened 30 more FF-Windows, tvbrowser (java), kile and kdevelop.
                            At this point, Desktop-Effects from kde4 were not that quick anymore - Alt-Tab needed a fraction of a second to show me the cover switch, but then, the windows switched smoothly.
                            So, ok, I need to do something stressing the CPU...
                            building kdelibs with -j10, building gentoo-sources with -j5, importing kdelibs into kdevelop, in parallel do edit projects with large files in kdevelop and kile (ok - I can't edit two files at the same time, so I switched forth and back).
                            All 8 cores pushed to 100%, and - finally - editing became somewhat laggy! The cursor needed a fraction of a second to jump when typing!
                            But - hey - everyone knows kwin is not that performant (remember: opening some windows on an AthlonII X3 + nvidia GPU and doing nothing stressful but editing some files in OOo made the UI laggy), so I disabled desktop effects, et voila, lags disappeared! Editing is as smooth as ever (still with kdevelop importing kdelibs, 15 threads compiling kdelibs+kernel on 8 cores, java running, many firefox windows/tabs). To put a minimal load on the GPU, I started the only game I own, xmoto. Loading a replay, arranging windows so that xmoto is not covered (note: you must bring the cursor above the xmoto window in order to start playback of the replay) - everything runs fine, no lag when typing in kile, kdevelop autocompletion pops up instantly, even xmoto runs smoothly, as if nothing else were going on!

                            So I really can't understand how you could experience such lags with an Intel-based system, even when using the iGPU.
                            What kind of GPU do you use in your Bulldozer-systems?
                            If you experience Lags on intel-systems when running the iGPU, just disable desktop effects (Or use a more performant window manager).
                            The GPU is just a basic HD5450 running the FOSS drivers.

                            If you experience Lags on intel-systems when running the iGPU, just disable desktop effects (Or use a more performant window manager).
                            You've pretty much said it all, my friend. KDE is a steaming bucket of cack, I would thoroughly expect it to be laggy under all kinds of circumstances that a respectable DE wouldn't. There was a nice thread about it on here not too long ago, where Martin tried to convince everyone that the sky is green at night, and red during the day, despite many people claiming to have experienced the cack-ness of Kwin. In fact, Intel IGPs probably provide the best KDE experience, while providing the absolute worst experience in every other DE. You wouldn't even have to turn off the desktop effects if you weren't using KDE, I would invite you to try it with something like the latest Ubuntu and Unity, or even a Gnome2 or 3 distro.

                            Aside from that, I'm having a bit of a hard time following what you're saying (English is probably not your first language), but I think a few things come into play:

                            1. 8 cores: You don't have 8 cores, you have 4 real cores, and 4 not-so-real-cores, and if they are all at 100% then you're guaranteed to experience lag because it can only handle 4 real work threads at a time, see point 2:
                            2. You actually had lag once you pushed it in a real world scenario (albeit not a very common one), see point 3:
                            3. I've still never seen any lag under any circumstances

                            I suppose I could try to conjure up some unrealistic scenario like running:

                            1. All of my developer stuff
                            2. An entire virtual network of 6 servers and 6 clients
                            3. Every single VM simultaneously creating a Twofish-AES-Serpent Truecrypt volume on a different VM

                            , but what would that prove, as that is not a realistic load I'd ever want to run? Obviously at some point you could bog it down; the point was that even as an advanced power user, I cannot foresee a real world scenario that could bring my Bulldozer to its knees, whereas you demonstrated that 3 simultaneous CPU-intensive tasks on your Intel CPU can grind the rest of your desktop to a halt. I can (and do) run my developer stuff and a whole slew of VMs, and I can even run a couple of CPU intensive tasks at the same time, mostly because Bulldozer is just a better parallel CPU than the equivalent Intel CPU, and yes, giving me twice as many real-ish cores for the same price helps a lot.
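
The "4 real cores, 4 not-so-real cores" point is easier to weigh with the load from the quoted test written out. This sketch takes the job counts from that post (`-j10` plus `-j5`) and assumes the i7 2600K exposes 8 logical CPUs backed by 4 physical cores; the function name is made up for illustration, and editors, browsers and Java are ignored.

```python
def oversubscription(runnable_jobs, cpus):
    """Runnable jobs per CPU; anything above 1.0 means jobs must wait their turn."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return runnable_jobs / cpus


compile_jobs = 10 + 5   # kdelibs with -j10 plus gentoo-sources with -j5
logical_cpus = 8        # assumption: 4 cores with two hardware threads each
physical_cores = 4

print(f"per logical CPU:   {oversubscription(compile_jobs, logical_cpus):.3f}")
print(f"per physical core: {oversubscription(compile_jobs, physical_cores):.3f}")
```

Either way the machine was asked to run more compile jobs than it has execution resources, so some latency under that load is expected on any CPU. The numbers say nothing about which CPU copes with it better.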



                            • #15
                              Originally posted by Qaridarium View Post
                              For example, I don't open just one browser window; I have 60+ browser windows.
                              Have you tried using tabs?
