
Clang Compiling Against GCC On Ubuntu ARM Linux


  • Clang Compiling Against GCC On Ubuntu ARM Linux

    Phoronix: Clang Compiling Against GCC On Ubuntu ARM Linux

    Here's an update on the LLVM/Clang vs. GCC compiler benchmarking on ARM hardware under Linux...

  • #2
    "Evan Cheng, Request for Help: Teach ARM target to auto-detect cpu / subtarget features, Thu May 10 22:11:23 CDT 2012

    I believe one of the reasons the benchmark numbers are totally bogus is that the compilation is done on ARM hosts.

    Given that the benchmarks are apparently compiled without -mcpu=cortex-a9, I suspect LLVM ended up generating code for a "generic" ARMv4 CPU.

    This article makes me sick to my stomach."

    "Michael Larabel on June 11, 2012
    The benchmarking was still being done from a PandaBoard ES with Texas Instruments OMAP4460 dual-core ARM Cortex-A9 development board. Via the CFLAGS/CXXFLAGS, -march=armv7-a was passed to each compiler."

    On the other hand once you sort out your flags war and reach consensus, it might be interesting to see this test run on a Calxeda quad-core ARM Cortex-A9 processor, which is optimized for use in servers over a 10 Gigabit/s internal fabric on each card. A sample box with 2 or more cards installed gives 32 Cortex-A9 cores across 8 SoCs, or more. You really should go and get the latest Linaro GCC too.

    Hmm, I can't seem to get the in-post video link working, odd.
    Last edited by popper; 06-11-2012, 12:12 PM.


    • #3
      Originally posted by popper View Post
      On the other hand once you sort out your flags war
      The flags used in this article were just the normal ones; a compiler flag/tuning comparison on ARM is forthcoming in a future multi-page article.
      Michael Larabel


      • #4
        armv7 is what e.g. Ubuntu will target in their upcoming ARM releases, so it seems very relevant how that performs. Compiling all software with hardware specific CFLAGS is typically only done by Gentoo or other source based distros.


        • #5
          Originally posted by chithanh View Post
          armv7 is what e.g. Ubuntu will target in their upcoming ARM releases, so it seems very relevant how that performs. Compiling all software with hardware specific CFLAGS is typically only done by Gentoo or other source based distros.
          Here are a few relevant flags I'd like to see tested:

          1. Ubuntu Standard armv7 + hard-float
          2. Android Standard armv7 + softfp (note: this is not soft-float, it's still using hardware fp, just the headers are compatible with soft-float).
          3. Android Standard armv5 + soft-float

          These three flags will cover most software written for Linux and Android.
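To check which of these three configurations a given build really ended up with, the compiler's predefined macros can be inspected. This is only a minimal sketch, assuming GCC or Clang: as far as I know both predefine `__arm__` on 32-bit ARM, `__ARM_PCS_VFP` for -mfloat-abi=hard and `__SOFTFP__` for -mfloat-abi=soft. The function names `float_abi_name` and `print_float_abi` are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: names the float ABI this translation unit was
 * compiled for, based on macros predefined by GCC and Clang. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not 32-bit ARM";      /* e.g. when built on an x86 host */
#elif defined(__ARM_PCS_VFP)
    return "hard-float";          /* FP arguments passed in VFP registers */
#elif defined(__SOFTFP__)
    return "soft-float";          /* FP done in software */
#else
    return "softfp";              /* hardware FP, soft-float calling convention */
#endif
}

/* Hypothetical helper: prints the result, e.g. at the start of a benchmark run. */
void print_float_abi(void)
{
    printf("float ABI: %s\n", float_abi_name());
}
```

Calling `print_float_abi()` once from the benchmark's `main` would make the log show which ABI was really measured, instead of relying on what the CFLAGS were supposed to be.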


          • #6
            Possibly Ubuntu and other distros will use -march=armv7-a -mtune=cortex-a9 (same idea as -march=i486 -mtune=i686 for x86) so that would be another interesting data point.


            • #7
              Again, what is the point of running the 7-zip benchmark with no -On optimization setting? This means that at least GCC will default to -O0, which means no optimization. Just add -O2 or preferably -O3 so that this benchmark ends up being in any way relevant. NO ONE will use 7-zip compiled with no optimizations. You are benchmarking compiler optimization here, so what possible point is there in NOT enabling optimizations?
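One cheap way to catch an accidental -O0 build is to have the benchmark report it. A minimal sketch, assuming GCC or Clang, which both predefine `__OPTIMIZE__` at -O1 and above; the function names here are hypothetical and not part of 7-zip or the test suite.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if this file was compiled with
 * optimization enabled (-O1 or higher), 0 for an -O0 build. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints a warning so an unoptimized build
 * does not silently end up in the benchmark results. */
void report_optimization(void)
{
    if (built_with_optimization())
        printf("build: optimized\n");
    else
        printf("build: NOT optimized (-O0), results are not meaningful\n");
}
```

This only tells you whether any optimization level was set, not which one, so it is a sanity check rather than a full record of the flags.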


              • #8
                Yeah, while the phoronix test suite framework itself is fine, the choice of benchmarks is very questionable at best.

                Let's have a look at the "popular" C-Ray 1.1 benchmark. It can be downloaded from
                It is typically run as "./c-ray-mt -t 32 -s 1600x1200 -r 8 -i sphfract -o output.ppm", but changing 1600x1200 to 160x120 lets it run for seconds instead of hundreds of seconds on ARM. Profiling of gcc-4.7.0 compiled code shows the following:
                ./c-ray-mt -t 32 -s 160x120 -r 8 -i sphfract -o output.ppm
                samples  %        image name               symbol name
                28459    51.8672  c-ray-mt                 shade
                17869    32.5667  c-ray-mt                 ray_sphere
                4110      7.4906  c-ray-mt                 trace
                3185      5.8047  c-ray-mt                 render_scanline
                319       0.5814                           __ieee754_pow
                194       0.3536                           powl
                136       0.2479                           __exp1
                108       0.1968                           memcpy
                78        0.1422  c-ray-mt                 get_primary_ray
                73        0.1330  c-ray-mt                 get_sample_pos
                59        0.1075                           isnanl
                42        0.0765  vmlinux                  __do_softirq
                36        0.0656  vmlinux                  __schedule
                35        0.0638                           checkint
                31        0.0565                           fputc
                18        0.0328                           __mul
                4         0.0073  c-ray-mt                 main
                And this reveals a major performance problem: function call overhead is insane. Just making sure that the ray_sphere function gets inlined improves performance significantly. As a workaround, the -finline-limit=100000 option can be added for more aggressive inlining. The results of "./c-ray-mt -t 32 -s 160x120 -r 8 -i sphfract -o output.ppm" on a 1.2GHz ARM Cortex-A9, compiled with gcc 4.7.0:
                Rendering took: 6 seconds (6685 milliseconds) for CFLAGS="-O3 -ffast-math"
                Rendering took: 5 seconds (5436 milliseconds) for CFLAGS="-O3 -ffast-math -finline-limit=100000"

                But the real fix is to use "static inline" for the performance critical functions. The one who developed this C-Ray application apparently has no clue about performance optimizations. Or maybe it was done on purpose to make the job harder for the compilers. The compilers which are configured to use aggressive inlining by default are going to win by a huge margin on this test (trading it for larger binary sizes, because there are no free cookies).
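To illustrate the "static inline" idea, here is a minimal sketch. It is not the actual C-Ray source: the structs, the simplified hit test and the `count_hits` loop are all made up for illustration, and only show the pattern of marking small hot functions `static inline` so the compiler can inline them into the calling loop.

```c
/* Illustrative types, not the ones used by C-Ray. */
struct vec3   { double x, y, z; };
struct ray    { struct vec3 orig, dir; };
struct sphere { struct vec3 pos; double rad; };

static inline double dot(struct vec3 a, struct vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static inline struct vec3 sub(struct vec3 a, struct vec3 b)
{
    struct vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

/* Returns 1 if the ray hits the sphere in front of its origin.
 * Solves a*t^2 + b*t + c = 0; the larger root is positive exactly
 * when b < 0 or c < 0, so no sqrt is needed for a yes/no answer. */
static inline int ray_sphere_hit(const struct ray *r, const struct sphere *s)
{
    struct vec3 oc = sub(r->orig, s->pos);
    double a = dot(r->dir, r->dir);
    double b = 2.0 * dot(r->dir, oc);
    double c = dot(oc, oc) - s->rad * s->rad;
    double disc = b * b - 4.0 * a * c;

    if (disc < 0.0)
        return 0;               /* the ray's line misses the sphere */
    return b < 0.0 || c < 0.0;  /* at least one intersection with t > 0 */
}

/* The hot loop: with the helpers above visible as static inline,
 * the compiler can inline them here instead of emitting calls. */
int count_hits(const struct ray *r, const struct sphere *spheres, int n)
{
    int hits = 0;
    for (int i = 0; i < n; i++)
        hits += ray_sphere_hit(r, &spheres[i]);
    return hits;
}
```

Keep in mind that `inline` is only a hint. The compiler still decides, and at -O0 GCC will normally not inline these functions at all.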

                Generally, I get the impression that this selection of phoronix benchmarks has been done on purpose. Surely, when compiler optimizations are disabled or poorly written code such as C-Ray is benchmarked, the difference between the results from different compilers may be quite significant (and mostly random). Benchmarking properly written code with properly selected optimization options is surely boring, because it is less likely to show surprising wins or sensations.
                Last edited by ssvb; 06-12-2012, 05:42 AM.


                • #9
                  Depending on how GCC was configured (you can see by passing -v), this might be a non-issue, but passing only -march=armv7-a without other -mtune= or -mcpu= options might have resulted in GCC tuning for the Cortex-A8.
                  You might want to re-check to be sure...


                  • #10

                    Tuning for Cortex-A8 works well for Cortex-A9 too. They are reasonably similar, and scheduling instructions for an in-order dual-issue processor does not usually do any harm for its out-of-order dual-issue twin. Moreover, there are cases when -mcpu=cortex-a9 is bad for performance: (just filed this enhancement request)