Ubuntu 12.04 ARM Performance Becomes Very Compelling

  • #16
    On that NEON lib: it looks like some of its code sacrifices accuracy (as in the floorf example mentioned there). That may make it produce wrong results in apps that care about accuracy, i.e. every app not already compiled with -ffast-math.
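
To make the accuracy concern concrete, here is a minimal sketch of one well-known floor shortcut: truncating toward zero instead of rounding toward negative infinity. This is an assumption for illustration only, not necessarily what the library's floorf does; `fast_floor`, `exact_floor`, and `mismatches` are hypothetical names.

```python
import math


def fast_floor(x: float) -> int:
    # Hypothetical shortcut: truncate toward zero.
    # Cheap, but it disagrees with floor() for negative non-integers.
    return int(x)


def exact_floor(x: float) -> int:
    # Reference behaviour: round toward negative infinity.
    return math.floor(x)


def mismatches(values):
    # Return the inputs on which the shortcut and the reference disagree.
    return [v for v in values if fast_floor(v) != exact_floor(v)]


if __name__ == "__main__":
    print(mismatches([-1.5, -0.25, 0.0, 1.5, 2.0, -3.0]))
```

The two only agree on non-negative inputs and exact integers, so code that relies on the standard result for negative values would silently change behaviour.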


    • #17

      I haven't had a chance to go through the code. What caught my eye was the author's statement that the GCC compiler didn't fully implement NEON. I think the ratios of the SciMark Composite times to the Jacobi times should be closer to the same between the compilers. Looking at them again, they aren't that far off: old/new ~65% vs. ~60%.

      I thought that you had used the older compiler for the December OMAP4 vs. Intel data, but the SciMark Composite is the same as the 12.04 1/26 number.


      • #18
        Yup... time to take a deep look at how these codes are implemented...

        Every time I recompile I get a 10% performance bump on most benches...

        My EA3, with its 4430, beats or ties the best ES score (allegedly unoptimized) on 7 of 26 benchies, and it is really close on another 5 or so.

        Items of concern: the standard deviations are quite large; I will see if I have anything running on the Panda that may explain the cycle stealing between runs. Some codes don't use both cores, and while OpenMPI was a dependency during the install, it wasn't called by many codes. Also, I installed omap4-extras, which is a binary blob provided by ImgTec to unlock some functionality of the SGX540, but I don't think I am getting any math performance from it. The ARM Cortex-A9 is fully IEEE-754 compliant, but NEON, PVR, and Ducati aren't, so even if we can unlock some performance there, it may cause accuracy problems for us.
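
One way to put a number on "quite large" is the coefficient of variation (standard deviation divided by the mean) across repeated runs of the same test. The sketch below assumes the per-run results are available as a plain list of numbers; `summarize_runs`, `is_noisy`, and the 5% threshold are hypothetical choices for illustration, not part of any benchmark suite.

```python
import statistics


def summarize_runs(samples):
    # samples: per-run results for one benchmark (at least two values).
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation (n - 1)
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean}


def is_noisy(samples, cv_threshold=0.05):
    # Flag a benchmark whose run-to-run spread exceeds the chosen threshold.
    # The 5% default is an arbitrary assumption, not an established cutoff.
    return summarize_runs(samples)["cv"] > cv_threshold


if __name__ == "__main__":
    runs = [10.0, 12.0, 14.0]  # made-up values, purely to exercise the code
    print(summarize_runs(runs), is_noisy(runs))
```

Benchmarks flagged this way are the ones worth rerunning with background services stopped before comparing boards.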

        Couldn't install SciMark for some reason, but we wouldn't have won that competition running at 1 GHz, I shouldn't think. I will try this once more before Precise ships in a week or so (there are indications these will run even faster as a diskless node; lack of swap may be in play there). Sigh, looks like I have my summer project defined for me...


        • #19
          OMAP4 vs. Tegra 3

          Dual core vs. Quad core... it is interesting to see what the Panda excels at...

          Note that I segfaulted on 15 of the codes, mostly the GPU-centric ones... maybe we need a few more ifdefs in there. We will probably hold off further testing until we get our arms around this (and anyway, Ubuntu Precise "ships" on April 24th, so we may go back and try to do a headless install with that revision).


          • #20
            Hm, wasn't vpxenc multithreaded? Or is the test using it with a single thread only?