Originally posted by uid313
A simple case in point: the vp9enc benchmark in the phoronix-test-suite was missing an option that was added to help improve performance on many-threaded machines. Nobody noticed this or questioned the results until I started profiling the bottleneck. Personally, I never would have looked at this if I hadn't been trying to figure out why my 16-core machine wasn't using all of its cores. Linus is right: you get optimizations from developers based on the hardware they are using. Currently most ARM development is hyper-focused on whatever hardware the developers have on hand. Unfortunately, what many developers have had for the longest time is a Raspberry Pi. Think of how many developer hours were spent hyper-optimizing code for an almost defunct ARMv6 architecture.
In the benchmarks I posted against the Odroid-N2, you can see that the GraphicsMagick benchmarks, which are single-threaded, are actually pretty competitive with Intel, while the single-threaded GIMP benchmarks are nowhere close. This is most likely because GraphicsMagick has been optimized for small headless ARM boards, and GIMP hasn't been looked at for ARM architectures because nobody would run it there. You can take the GraphicsMagick benchmarks, compare them to other x86 results, and then scale them against clock frequency. Remember that these are early production SoCs as well. We are at 2 GHz now; production will be 2.2 GHz, so a 10% boost, and we have a stable overclock at 2.4 GHz, which gives a per-core 15% boost.
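If you want to do the clock scaling yourself, here is a rough sketch in Python. It assumes a higher-is-better score and that performance scales linearly with core clock, which is optimistic since memory speed does not rise with the core clock. The function names and the example score are made up for illustration; they are not from the phoronix-test-suite.

```python
def clock_boost_percent(base_ghz, target_ghz):
    """Percentage increase in clock frequency going from base to target."""
    return (target_ghz / base_ghz - 1.0) * 100.0


def score_per_ghz(score, clock_ghz):
    """Normalize a higher-is-better score by clock, for cross-CPU comparison."""
    return score / clock_ghz


def scale_score(score, from_ghz, to_ghz):
    """Project a score to another clock, ASSUMING perfectly linear scaling."""
    return score * (to_ghz / from_ghz)


if __name__ == "__main__":
    # Clock steps mentioned in the post: 2.0 GHz now, 2.2 GHz production,
    # 2.4 GHz overclock.
    print(round(clock_boost_percent(2.0, 2.2), 1))
    print(round(clock_boost_percent(2.0, 2.4), 1))

    # Hypothetical score of 100 measured at 2.0 GHz, projected to 2.4 GHz.
    print(scale_score(100.0, 2.0, 2.4))
```

Treat the projected number as an upper bound. Measured per-core gains from an overclock usually come in below the raw clock ratio.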