ARM Cortex-A9 PandaBoard ES Benchmarks


  • #46
    As for the benchmarks, the "compress-7zip" test compiles the code with the -O option, which is equivalent to -O1:
    Code:
    $ cat install.log 
    mkdir -p bin
    make -C CPP/7zip/Bundles/Alone all
    make[1]: Entering directory `/mnt/mmcblk0p2/.phoronix-test-suite/installed-tests/pts/compress-7zip-1.6.0/p7zip_9.20.1/CPP/7zip/Bundles/Alone'
    g++ -O -pipe -s -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DNDEBUG -D_REENTRANT -DENV_UNIX -D_7ZIP_LARGE_PAGES -DBREAK_HANDLER -DUNICODE -D_UNICODE -c -I. -I../../../myWindows -I../../../ -I../../../include_windows ../../../myWindows/myGetTickCount.cpp
    g++ -O -pipe -s -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DNDEBUG -D_REENTRANT -DENV_UNIX -D_7ZIP_LARGE_PAGES -DBREAK_HANDLER -DUNICODE -D_UNICODE -c -I. -I../../../myWindows -I../../../ -I../../../include_windows ../../../myWindows/wine_date_and_time.cpp
    ...
    This can be solved by setting the EXTRAOPTFLAGS environment variable to something more reasonable, for example at least "-O2".
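A quick way to spot builds like this is to tally the optimization flag on every compiler line of install.log. The following is only a rough sketch: the function name audit_log is made up for illustration, and it assumes one gcc/g++ invocation per line, as in the excerpt above. It relies on two GCC behaviours: a bare -O means -O1, and when no -O flag is given at all the default is -O0 (if several are given, the last one wins).

```shell
#!/bin/sh
# audit_log FILE (hypothetical helper): count compiler invocations in a
# build log, grouped by their effective -O flag.
# Assumption: each gcc/g++ invocation sits on a single line of its own.
audit_log() {
    awk '/^[ \t]*(gcc|g\+\+) / {
        level = "-O0"                      # GCC default when no -O flag is given
        for (i = 2; i <= NF; i++)          # the last -O flag on the line wins
            if ($i ~ /^-O([0-3s]|fast)?$/) level = $i
        count[level]++
    }
    END { for (l in count) print count[l], l }' "$1" | sort
}
```

Running `audit_log install.log` prints one line per flag with the number of compiler invocations that used it; anything reported as -O or -O0 deserves a closer look before trusting the numbers.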

    The build system for libvpx clearly does not use NEON, which explains the poor results for the "VP8 libvpx Encoding" test:
    Code:
     # cat install.log 
    Configuring selected codecs
      enabling vp8_encoder
      enabling vp8_decoder
    Configuring for target 'generic-gnu'
      enabling generic
    Creating makefiles for generic-gnu libs
    Creating makefiles for generic-gnu examples
    Creating makefiles for generic-gnu docs
        [DEP] vpx_config.c.d
        [DEP] vp8/decoder/reconintra_mt.c.d
        [DEP] vp8/decoder/idct_blk.c.d
        [DEP] vp8/decoder/threading.c.d
        [DEP] vp8/decoder/onyxd_if.c.d
    ...
    Trying to configure libvpx as "./configure --target=armv7-linux-gcc" spits out a funny error message: "Unable to invoke compiler: arm-none-linux-gnueabi-gcc -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64". Why would they expect the compiler to be named this way?

    Some other tests may show suboptimal results too, but I haven't looked there yet.



    • #47
      Originally posted by ssvb
      BTW, here are my benchmark results from gentoo (hardfp) running on origenboard (dual-core ARM Cortex-A9 @1.2GHz): http://openbenchmarking.org/result/1...AR-1112277AR91
      Great set of benchmarks... So if your benchmarks are accurate, they show the Cortex-A9 coming in faster than the Intel Atom series, and even faster than the Pentium 4 in many tests!



      • #48
        I enabled the use of NEON in the "VP8 libvpx Encoding" test by hacking the libvpx build scripts:
        Code:
        $ ./configure --target=armv7-linux-gcc
        Configuring selected codecs
          enabling vp8_encoder
          enabling vp8_decoder
        Configuring for target 'armv7-linux-gcc'
          enabling armv7
          enabling armv6
          enabling armv5te
          enabling fast_unaligned
        Creating makefiles for armv7-linux-gcc libs
        Creating makefiles for armv7-linux-gcc examples
        Creating makefiles for armv7-linux-gcc docs
        This improves the frames-per-second rating from 1.01 to 1.35 for the Exynos 4210, though this is still worse than the 1.55 shown by the Intel Atom.



        • #49
          Originally posted by ssvb
          As for the benchmarks, "compress-7zip" test compiles the code with -O optimization, which is equivalent to -O1
          And it appears that this was not the most broken test in the set. The winner is SMALLPT:
          Code:
          $ cat install.sh
          #!/bin/sh
           
          tar -zxvf smallpt-1.tar.gz
          g++ -fopenmp smallpt.cpp -o smallpt-renderer
          echo $? > ~/install-exit-status
           
          echo "#!/bin/sh
          ./smallpt-renderer 100 > \$LOG_FILE 2>&1
          echo \$? > ~/test-exit-status" > smallpt
          chmod +x smallpt
          This test program gets built without any optimizations at all! If we append the -O3 option to the existing -fopenmp, the result for the Exynos 4210 improves from 2489 seconds to 557 seconds! This is very disturbing and shows that phoronix-test-suite needs some major fixes. And a lot of the data collected at openbenchmarking.org up to this moment is just useless garbage.
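For anyone who wants to reproduce the -O3 run, the change can be applied to the install script with a one-line sed filter. This is just a sketch of the edit described above, not an official fix: the function name add_o3 is made up, and it assumes the compile line starts with "g++ " and contains -fopenmp exactly as in the install.sh shown.

```shell
#!/bin/sh
# add_o3 (hypothetical helper): read an install script on stdin and append
# -O3 right after -fopenmp, but only on lines that invoke g++.
add_o3() {
    sed '/^[[:space:]]*g++ /s/-fopenmp/-fopenmp -O3/'
}

# Demonstration on the compile line from install.sh
printf '%s\n' 'g++ -fopenmp smallpt.cpp -o smallpt-renderer' | add_o3
```

The demonstration prints `g++ -fopenmp -O3 smallpt.cpp -o smallpt-renderer`; all other lines of the script pass through unchanged, so something like `add_o3 < install.sh > install-O3.sh` would produce a patched copy.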



          • #50
            While that does improve the Exynos results, it doesn't invalidate the results relative to each other. They all got the same optimization.



            • #51
              Do you mean the relative performance of the results with disabled optimizations? Achieving good performance of the generated code is not the target for such builds at all. They are mostly useful for two reasons:
              1. disabling optimizations significantly reduces the chances of triggering some compiler bugs (can be specifically wanted for bootstrapping or debugging purposes)
              2. makes the process of compilation faster (speeds up the development)

              Now consider the following analogy. Imagine that you ask two guys to walk from point A to point B without mentioning the real purpose of this activity. Moreover, the only hint you give them is to be absolutely sure not to slip and fall. But once one of the guys reaches the destination, you suddenly award him the "fastest runner" prize. And if anybody tries to complain, you just say that the conditions were the same and that there surely must be some correlation between how fast a person can walk and run, so the competition is fair and the relative running speed must still be the same.

              The same applies to processors and compilers. An optimized build can easily be several times faster than a non-optimized build. And you can't generally predict this ratio beforehand.
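To put numbers on "several times faster", here is the arithmetic for the two measurements quoted earlier in this thread (SmallPT on the Exynos 4210 with and without -O3, and libvpx with and without NEON). The speedup helper is made up for illustration; it only divides its first argument by its second.

```shell
#!/bin/sh
# speedup A B (hypothetical helper): print A / B rounded to two decimals.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# SmallPT is "lower is better" (seconds): unoptimized time / -O3 time
speedup 2489 557

# libvpx is "higher is better" (FPS): NEON result / generic result
speedup 1.35 1.01
```

This prints 4.47 and 1.34. The same CPU therefore looks about 4.5x slower in one test and about 1.3x slower in another purely because of how the test was built, which is why the ratio cannot be guessed in advance.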

              And in the real world everything is even more tricky. One of the guys could have known beforehand that you have a habit of holding such competitions. So he might easily use this information to his advantage. In any case, the only valid benchmarking method is to enable the best optimizations, because such benchmarks reflect real performance and can't be easily cheated.

              PS. Exynos is "faster" than Atom even in the existing SMALLPT test. But this does not change the fact that the test itself is broken.



              • #52
                Agreed.

                And in the real world everything is even more tricky. One of the guys could have known beforehand that you have a habit of holding such competitions. So he might easily use this information to his advantage. In any case, the only valid benchmarking method is to enable the best optimizations, because such benchmarks reflect real performance and can't be easily cheated.
                Agreed! ARM hasn't had the years of verification like x86 where things that were once optimizations became labelled as safe, so naturally the unoptimized compilations will be worse performance-wise on ARM. You've got to compile for what you measure for:
                For Performance: optimized but generic enough for whatever class of devices you're considering



                • #53
                  DSP usage for encoding?

                  Most ARM SoCs have hardware acceleration for several MP4 encodings. It's strange that VP8 and x264 bench at only about 1fps. In real life, even a Cortex-A8 like the Hummingbird, which is nearly 2 years old, or an older Rockchip SoC (that has VP8 hardware codecs) can encode 720p MP4 at 30fps using very little energy (from 100mW to less than 1 W) on Android/Linux.

                  A video encoded with my phone, a Samsung Galaxy S (1), is viewed in ffmpeg as:

                  Code:
                  VIDEO: [H264] 1280x720 24bpp 1000.000 fps 12141.3 kbps (1482.1 kbyte/s)
                  Clip info:
                  major_brand: 3gp4
                  minor_version: 768
                  compatible_brands: 3gp43gp6
                  ==========================================================================
                  Opening video decoder: [ffmpeg] FFmpeg's libavcodec codec family
                  Selected video codec: [ffh264] vfm: ffmpeg (FFmpeg H.264)
                  ==========================================================================
                  ==========================================================================
                  Opening audio decoder: [ffmpeg] FFmpeg/libavcodec audio decoders
                  AUDIO: 16000 Hz, 1 ch, s16le, 57.6 kbit/22.51% (ratio: 7203->32000)
                  Selected audio codec: [ffaac] afm: ffmpeg (FFmpeg AAC (MPEG-2/MPEG-4 Audio))
                  ==========================================================================

                  So it can easily encode AAC + 720p H264 @ 30fps. This bench is not accurate at all with its 1.87 fps for OMAP4 or 4.59 fps for Exynos 4210, which should both encode 1080p H264 + AAC audio at 30fps using only the DSP. And surely an Atom or Pentium can't do this at all.
                  Last edited by Popolon; 01-17-2012, 01:02 PM.



                  • #54
                    Originally posted by Popolon
                    Most ARM SoCs have hardware acceleration for several MP4 encodings. It's strange that VP8 and x264 bench at only about 1fps. In real life, even a Cortex-A8 like the Hummingbird, which is nearly 2 years old, or an older Rockchip SoC can encode 720p MP4 at 30fps using very little energy (from 100mW to less than 1 W) on Android/Linux.

                    -cut-

                    So it can easily encode AAC + 720p H264 @ 30fps. This bench is not accurate at all with its 1.87 fps for OMAP4 or 4.59 fps for Exynos 4210, which should both encode 1080p H264 + AAC audio at 30fps using only the DSP. And surely an Atom or Pentium can't do this at all.
                    I think I read somewhere that the DSP functionality is closed, and only works through a non-open driver, so we cannot expect this functionality to work inside an open linux distro.



                    • #55
                      Originally posted by gururise
                      I think I read somewhere that the DSP functionality is closed, and only works through a non-open driver, so we cannot expect this functionality to work inside an open linux distro.
                      Samsung does a lot of work to integrate its own SoCs into the mainline kernel tree, and they added Khronos OpenMAX functionality to GStreamer (package description: gst-openmax - Accelerated gst drivers):
                      https://launchpad.net/~linaro-landin...+build/2290190

                      * OMAP3 drivers for the DSP AAC and MPEG-4 codecs are available via OpenMAX IL (GPL), perhaps with closed-source lower layers:
                      http://omappedia.org/wiki/OpenMAX_Project

                      I don't know about ST-Ericsson Nova and all the others (ARMlogic, Rockchip, Fujitsu, Nufront...), but I suppose they use an OpenMAX low-level Linux driver for Android too.

                      [Update]: At least Android 4.0 uses it via an open source implementation: http://en.wikipedia.org/wiki/OpenMAX#Implementations

                      [Update 2]: The Samsung Exynos 5250 (32nm Cortex-A15) is already capable of encoding 4K2K@30fps and 1080p@60fps. It's already available to developers, and its main target is tablets (i.e. Android or another flavor of Linux, and USB should probably be enough for electric power):
                      http://www.samsung.com/global/busine...s/news_11.html
                      Last edited by Popolon; 01-17-2012, 05:02 PM.



                      • #56
                        Both libvpx and x264 are software codecs and don't use DSP or HW accelerators. There is nothing wrong with using them in benchmarks if the purpose is to evaluate CPU performance.

                        Moreover, hardware video accelerators are often quite picky about what video format variants they support (for decoding) or sacrifice quality (for encoding): http://www.behardware.com/articles/8...-and-x264.html
                        So x264 is not totally useless even nowadays.



                        • #57
                          Comparison with different models.

                          First of all I learned a lot from the article and forum. Thanks.

                          I'm choosing between two models: the ES and the VAR-SOM-OM44, which can also be equated with the ES SoC-wise. According to its specifications, the OM44 reaches 1.5GHz.
                          Also, the ES highlights its Bluetooth, but to me it seems the same on the two modules.
                          My other main concerns are Android 4.0 support and audio.
                          Has anyone by any chance had the opportunity to compare or benchmark the OM44? Or any experience with it?
                          Thanks a lot in advance!
                          Gili



                            • #59
                              gili2000: it is best to avoid OMAP, which is nearly unusable on Linux due to the lack of hw acceleration. Better to use one of the Mali 400 based alternatives.

                              ARM boards that are supported by linaro.org (which does most of the work on the Linux port to ARM, and even helps improve Linux more generally) can be found on their site (there are links to resellers):

                              http://www.linaro.org/low-cost-development-boards

                              Samsung Exynos is well supported and hw acceleration of the desktop has worked for about 6 months. ST-Ericsson boards use Mali 400 too, but I don't know if it works as well there.
