ARM Cortex-A15 Compiler Optimizations
For GraphicsMagick, -O0 was unsurprisingly the slowest, since the compiler applies no optimizations at that level. The -Os performance was close to -O1. For some reason, when building GraphicsMagick with GCC 4.7.2 on ARM, setting CFLAGS/CXXFLAGS to -O3 was only honored as -O2. Using -Ofast didn't yield any greater gains.
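For readers who want to repeat this kind of comparison, here is a minimal sketch of how each optimization level might be passed through CFLAGS/CXXFLAGS to a build. The helper name `build_env` and the `OPT_LEVELS` list are hypothetical and only illustrate the idea; this is not the harness used for these results.

```python
import os

# Optimization levels discussed in this article.
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Os", "-Ofast"]


def build_env(opt_level, base_env=None):
    """Return a copy of the environment with CFLAGS and CXXFLAGS set.

    Hypothetical helper: base_env defaults to the current process
    environment, and the caller's mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CFLAGS"] = opt_level
    env["CXXFLAGS"] = opt_level
    return env


# Print the flags each build would receive.
for level in OPT_LEVELS:
    env = build_env(level, base_env={})
    print(level, "->", "CFLAGS=" + env["CFLAGS"], "CXXFLAGS=" + env["CXXFLAGS"])
```

The returned mapping could then be handed to something like `subprocess.run(["./configure"], env=env)` in the source tree, assuming a configure-style build. Checking the compiler command lines in the build log is one way to see whether a requested level such as -O3 actually reaches the compiler.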
-O2 and -O3 produced the fastest binaries for Himeno, more than twice as fast as the unoptimized -O0 build. Passing -Ofast actually led to a minor setback in performance compared to -O2.