ARM Cortex-A15 GCC Compiler Tuning Performance
To complement the recent compiler benchmarking on the ARM Cortex-A15, as found in the Samsung Exynos 5 Dual inside the Samsung Chromebook, here are some compiler tuning benchmark results from the speedy low-power ARM system.
Uploaded to OpenBenchmarking.org this morning are a couple of compiler tests from building various Phoronix Test Suite test profiles with different GCC compiler tuning flags. The GCC ARM compiler options are described in GCC's online documentation. This article compares the stock compiler flags (no additional CFLAGS/CXXFLAGS set) against:
-mtune=cortex-a15
This option is very similar to the -mcpu= option, except that instead of specifying the actual target processor type, and hence restricting which instructions can be used, it specifies that GCC should tune the performance of the code as if the target were of the type specified in this option, but still choosing the instructions it generates based on the CPU specified by a -mcpu= option. For some ARM implementations better performance can be obtained by using this option.
-marm
Select between generating code that executes in ARM and Thumb states. The default for most configurations is to generate code that executes in ARM state, but the default can be changed by configuring GCC with the --with-mode=state configure option.
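As a rough way to confirm which state a given build actually targets, the sketch below checks the `__arm__` and `__thumb__` macros that GCC predefines on 32-bit ARM targets. The function names `isa_state_name` and `print_isa_state` are made up for this illustration and are not part of the benchmarks.

```c
#include <stdio.h>

/* Report which instruction-set state this file was compiled for.
 * GCC predefines __arm__ on 32-bit ARM targets and additionally
 * defines __thumb__ when it is generating Thumb code (-mthumb).
 * Function names here are hypothetical, chosen for the example. */
const char *isa_state_name(void)
{
#if defined(__thumb__)
    return "Thumb";
#elif defined(__arm__)
    return "ARM";
#else
    return "not a 32-bit ARM target";
#endif
}

/* Convenience wrapper that prints the result on one line. */
void print_isa_state(void)
{
    printf("compiled for: %s\n", isa_state_name());
}
```

Calling `print_isa_state()` from a small `main` and building the file once with `-marm` and once with `-mthumb` should show the two states on an ARM toolchain; on any other target it simply reports that the target is not 32-bit ARM.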
-mfpu=neon
This specifies what floating-point hardware (or hardware emulation) is available on the target. Permissible names are: `vfp', `vfpv3', `vfpv3-fp16', `vfpv3-d16', `vfpv3-d16-fp16', `vfpv3xd', `vfpv3xd-fp16', `neon', `neon-fp16', `vfpv4', `vfpv4-d16', `fpv4-sp-d16', `neon-vfpv4', `fp-armv8', `neon-fp-armv8', and `crypto-neon-fp-armv8'. If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=`neon'), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.
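To make the auto-vectorization note more concrete, here is a minimal sketch of the kind of single-precision loop it applies to, plus a helper that identifies the denormal values the documentation mentions. The names `saxpy`, `built_with_neon` and `is_denormal` are invented for this example and are not taken from any of the tested profiles. Whether the loop is actually vectorized depends on the optimization level and the floating-point ABI of the toolchain, so treat the flags mentioned afterwards as an assumption to verify rather than a recipe.

```c
#include <stddef.h>
#include <float.h>

/* y[i] += a * x[i]: a simple float loop of the kind GCC's
 * auto-vectorizer can turn into NEON code when it is allowed to. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    size_t i;
    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Returns 1 if this file was compiled with NEON enabled, judged by
 * the macros GCC predefines in that case, and 0 otherwise. */
int built_with_neon(void)
{
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 for a denormal (subnormal) float: non-zero but smaller in
 * magnitude than FLT_MIN. These are the values that flush-to-zero
 * hardware treats as zero. */
int is_denormal(float v)
{
    float mag = (v < 0.0f) ? -v : v;
    return v != 0.0f && mag < FLT_MIN;
}
```

On an ARM toolchain, building this with something like `-O3 -mfpu=neon` with and without `-funsafe-math-optimizations`, then comparing the generated assembly for `saxpy`, is one way to see the behavior the documentation describes.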
The 1212044-RA-CORTEX15G86 results contain stock GCC 4.6 and GCC 4.7 benchmarks, followed by GCC 4.7 with the various compiler tuning options. On this Samsung Exynos 5 Dual notebook, however, there isn't a huge performance difference to see.
Within 1212043-RA-COMPILERT89 are additional benchmark results from trying different GCC ARMv7 options, but again, the differences aren't too mind-blowing.