
Thread: Quad-Core ODROID-X Battles NVIDIA Tegra 3

  1. #21
    Join Date
    Nov 2011
    Posts
    17

    Default

Interesting to see the ODROID-X running as well as it did. The question I have, though, is whether it is the kernel, compiler, and file system that are slowing the Tegra 3 down.

  2. #22
    Join Date
    Jan 2012
    Posts
    113

    Default

    Quote Originally Posted by ldesnogu View Post
    If you want to push power consumption make sure NEON is being used on the 4 cores. Video encoding/decoding with FFmpeg and 4 threads should do it
If you want to maximize the power consumption for testing purposes, synthetic cpuburn test programs seem to be a lot more effective than any real workload. Here is my attempt: https://github.com/ssvb/ssvb.github....b-cpuburn-a9.S. Maybe somebody else can do even better.

  3. #23
    Join Date
    Jan 2012
    Posts
    113

    Default

    Quote Originally Posted by Darkseider View Post
    Interesting to see the ODROID-X running as well as it did. The question I have, though, is whether it is the kernel, compiler, and file system that are slowing the Tegra 3 down.
    For a start, the Tegra 3 obviously was not running at a 1.4 GHz clock frequency even for single-threaded workloads (contrary to what is said in the article). I have a déjà vu feeling.

  4. #24
    Join Date
    Sep 2011
    Posts
    156

    Default

    Quote Originally Posted by ssvb View Post
    For a start, the Tegra 3 obviously was not running at a 1.4 GHz clock frequency even for single-threaded workloads (contrary to what is said in the article). I have a déjà vu feeling.
    You're talking about Nvidia here; the reason Linus said fuck you to them is because they do not want their hardware to work well with the Linux OSS stack, so of course it's going to get trashed in the benchmarks. Nvidia is hardware non grata if you use and love Linux!

  5. #25
    Join Date
    Oct 2008
    Posts
    106

    Default

    Quote Originally Posted by Rallos Zek View Post
    You're talking about Nvidia here; the reason Linus said fuck you to them is because they do not want their hardware to work well with the Linux OSS stack, so of course it's going to get trashed in the benchmarks. Nvidia is hardware non grata if you use and love Linux!
    Yes, Nvidia doesn't want Android to run well.

  6. #26
    Join Date
    Oct 2008
    Posts
    106

    Default

    Quote Originally Posted by ssvb View Post
    If you want to maximize the power consumption for testing purposes, synthetic cpuburn test programs seem to be a lot more effective than any real workload. Here was my attempt: https://github.com/ssvb/ssvb.github....b-cpuburn-a9.S. Maybe somebody else could do even better.
    Your program doesn't use the NEON datapath; did you try it?

  7. #27
    Join Date
    Jan 2012
    Posts
    113

    Default

    Quote Originally Posted by ldesnogu View Post
    Your program doesn't use the NEON datapath; did you try it?
    It does. VLD2.8 is a NEON instruction.

    edit: Or do you mean NEON arithmetic instructions? Experiments with an ammeter show that load/store instructions consume by far the most power, so it's a waste to execute anything other than VLD in the NEON unit. The Cortex-A8 is a bit different, because it supports dual-issue.
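    To make the load-heavy approach concrete, here is a schematic sketch in the same spirit as the linked cpuburn-a9.S (this is not the actual file; the label, register choices, and the assumption of a small 256-bit-aligned cache-hot scratch buffer in r0 are illustrative):

    ```asm
    @ Schematic only: saturate the NEON load path with back-to-back
    @ VLD2.8 loads from a small, cache-hot, 256-bit-aligned buffer.
    @ Expects r0 = pointer to the scratch buffer; spins forever.
            .syntax unified
            .text
            .global burn_loop
    burn_loop:
    1:      vld2.8  {d0-d3},   [r0:256]
            vld2.8  {d4-d7},   [r0:256]
            vld2.8  {d8-d11},  [r0:256]
            vld2.8  {d12-d15}, [r0:256]
            b       1b          @ endless loop; kill the process externally
    ```

    The point is that every slot issues a wide load; on a Cortex-A9 there is no benefit to mixing in NEON arithmetic if loads already dominate the power draw.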
    Last edited by ssvb; 08-22-2012 at 02:14 PM.

  8. #28
    Join Date
    Oct 2008
    Posts
    106

    Default

    Quote Originally Posted by ssvb View Post
    It does. VLD2.8 is a NEON instruction.
    It doesn't go through the NEON data paths.

  9. #29
    Join Date
    Oct 2008
    Posts
    106

    Default

    Quote Originally Posted by ssvb View Post
    edit: Or do you mean NEON arithmetic instructions? Experiments with an ammeter show that load/store instructions consume by far the most power, so it's a waste to execute anything other than VLD in the NEON unit. The Cortex-A8 is a bit different, because it supports dual-issue.
    Yes, that's what I meant. I'm surprised by your result given the way VLD and VST work on the Cortex-A9. Perhaps dual-issuing integer ld/st with NEON data-processing instructions might require more power?

  10. #30
    Join Date
    Jan 2012
    Posts
    113

    Default

    Quote Originally Posted by ldesnogu View Post
    Yes, that's what I meant. I'm surprised by your result given the way VLD and VST work on the Cortex-A9.
    It was just the result of a few hours of non-scientific empirical experiments with an ammeter, trying different types of instructions. One can't optimize code for performance without running benchmarks; likewise, when writing a cpuburn program, having some kind of feedback to estimate the effect of modifying the code is a must. More background information is available in my blog.

    As for the ODROID-X, it's good that they spotted the power consumption issue in time and added a passive heatsink, though their claim that software video decoding "is not the normal use environment" sounds like a really poor excuse. A dedicated cpuburn program can heat the CPU a lot more than software video decoding, and additionally loading the GPU with some heavy shaders while using the various hardware decoders at the same time could probably cause some really bad problems without proper cooling. But I guess having the theoretical peak power consumption significantly exceed the typical power consumption of a normal workload is to be expected; the SoC just needs to be safely throttled when it is put into extreme conditions. From what I can see, there is ongoing work on the thermal framework on the arm-linux kernel mailing list. Having a passive heatsink probably gives the thermal framework a better chance to kick in before it is too late.

    BTW, the inconsistent Tegra 3 performance can probably also be explained by weird behavior of the frequency scaling governor. Whether the CPU really needs to be throttled, the governor is just misconfigured, or something else is wrong still needs to be figured out.
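    A quick way to see what the governor is actually doing is to read the cpufreq sysfs nodes (assuming the standard Linux cpufreq layout; vendor kernels may place these elsewhere, and some nodes need root):

    ```shell
    #!/bin/sh
    # Print governor and current/max frequency for each online core.
    # Standard cpufreq sysfs layout assumed; silently skips cores
    # without a cpufreq directory.
    for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
        d="$cpu/cpufreq"
        [ -d "$d" ] || continue
        printf '%s: governor=%s cur=%skHz max=%skHz\n' \
            "$(basename "$cpu")" \
            "$(cat "$d/scaling_governor" 2>/dev/null)" \
            "$(cat "$d/scaling_cur_freq" 2>/dev/null)" \
            "$(cat "$d/scaling_max_freq" 2>/dev/null)"
    done
    echo "done"
    ```

    Sampling this in a loop while a single-threaded benchmark runs would show whether the clock ever actually reaches the advertised 1.4 GHz.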

    Perhaps dual-issuing integer ld/st with NEON data-processing instructions might require more power?
    Unfortunately, integer ld/st instructions can't dual-issue with NEON instructions on the Cortex-A9 anymore. In my experience, due to a lot of trade-offs, the Cortex-A9 typically has worse peak performance per cycle for hand-optimized code compared to the Cortex-A8. On the other hand, the Cortex-A9 fixes some nasty bottlenecks (non-pipelined VFP, slow data read-back from NEON, a ridiculously small TLB) and has better average performance on poor compiler-generated code thanks to out-of-order execution. A major L2 cache size boost also clearly helps.
    Last edited by ssvb; 08-22-2012 at 03:45 PM.
