Quad-Core ODROID-X Battles NVIDIA Tegra 3


  • #21
    Interesting to see the ODROID-X running as well as it did. The question I have, though, is whether it's the kernel, compiler, and file system that are slowing the Tegra 3 down.



    • #22
      Originally posted by ldesnogu View Post
      If you want to push power consumption, make sure NEON is being used on all 4 cores. Video encoding/decoding with FFmpeg and 4 threads should do it.
      If you want to maximize the power consumption for testing purposes, synthetic cpuburn test programs seem to be a lot more effective than any real workload. Here was my attempt: https://github.com/ssvb/ssvb.github....b-cpuburn-a9.S. Maybe somebody else could do even better.
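      For anyone who would rather not read ARM assembly, the general shape of such a cpuburn-style program can be sketched in plain C: one busy thread per core, each spinning on NEON loads from a small buffer. This is only an illustration under assumptions (thread count, buffer size, build flags); the real cpuburn-a9.S is hand-scheduled assembly and loads the cores considerably harder:

      /* cpuburn_sketch.c -- rough approximation of a cpuburn-style stress
       * test: one thread per core, each spinning on NEON loads.
       * Assumed build (toolchain-dependent):
       *   gcc -O2 -mfpu=neon -pthread cpuburn_sketch.c -o cpuburn_sketch */
      #include <arm_neon.h>
      #include <pthread.h>
      #include <stdint.h>
      #include <stdio.h>

      #define NUM_THREADS 4          /* one per Cortex-A9 core (assumption)  */
      #define BUF_SIZE    4096       /* small enough to stay in the L1 cache */

      static uint8_t buf[NUM_THREADS][BUF_SIZE];
      static volatile uint8_t sink;  /* keeps the compiler from dropping the loads */

      static void *burn(void *arg)
      {
          const uint8_t *p = arg;
          for (;;) {
              uint8x16_t acc = vdupq_n_u8(0);
              /* back-to-back 128-bit NEON loads over the whole buffer */
              for (int i = 0; i < BUF_SIZE; i += 16)
                  acc = veorq_u8(acc, vld1q_u8(p + i));
              sink = vgetq_lane_u8(acc, 0);
          }
          return NULL;
      }

      int main(void)
      {
          pthread_t t[NUM_THREADS];
          for (int i = 0; i < NUM_THREADS; i++)
              pthread_create(&t[i], NULL, burn, buf[i]);
          puts("burning all cores, press Ctrl+C to stop");
          for (int i = 0; i < NUM_THREADS; i++)
              pthread_join(t[i], NULL);
          return 0;
      }

      The volatile store exists only so the compiler cannot discard the loads; everything interesting happens in the inner loop.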



      • #23
        Originally posted by Darkseider View Post
        Interesting to see the ODROID-X running as well as it did. The question I have, though, is whether it's the kernel, compiler, and file system that are slowing the Tegra 3 down.
        For a start, the Tegra 3 obviously was not running at a 1.4 GHz clock frequency even for single-threaded workloads (contrary to what is said in the article). I have a déjà vu feeling.
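        A quick way to check that sort of claim on the device itself is to poll the cpufreq sysfs nodes while the single-threaded benchmark runs. A minimal sketch, assuming the standard cpufreq sysfs layout, cpu0 only, and an arbitrary 200 ms poll interval:

        /* freq_poll.c -- watch whether cpu0 actually reaches its advertised
         * maximum clock while a benchmark runs. Uses the standard cpufreq
         * sysfs files; cpu0 only, for brevity. */
        #include <stdio.h>
        #include <unistd.h>

        static long read_khz(const char *path)
        {
            long khz = -1;
            FILE *f = fopen(path, "r");
            if (f) {
                if (fscanf(f, "%ld", &khz) != 1)
                    khz = -1;
                fclose(f);
            }
            return khz;
        }

        int main(void)
        {
            printf("cpu0 max: %ld kHz\n",
                   read_khz("/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq"));
            for (;;) {
                printf("cpu0 cur: %ld kHz\n",
                       read_khz("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"));
                usleep(200000);   /* 200 ms poll interval, arbitrary */
            }
        }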



        • #24
          Originally posted by ssvb View Post
          For a start, the Tegra 3 obviously was not running at a 1.4 GHz clock frequency even for single-threaded workloads (contrary to what is said in the article). I have a déjà vu feeling.
          You're talking about Nvidia here. The reason Linus said "fuck you" to them is because they don't want their hardware to work well with the Linux OSS stack, so of course it's going to get trashed in the benchmarks. Nvidia is hardware non grata if you use and love Linux!



          • #25
            Originally posted by Rallos Zek View Post
            You're talking about Nvidia here. The reason Linus said "fuck you" to them is because they don't want their hardware to work well with the Linux OSS stack, so of course it's going to get trashed in the benchmarks. Nvidia is hardware non grata if you use and love Linux!
            Yes, Nvidia doesn't want Android to run well.



            • #26
              Originally posted by ssvb View Post
              If you want to maximize the power consumption for testing purposes, synthetic cpuburn test programs seem to be a lot more effective than any real workload. Here was my attempt: https://github.com/ssvb/ssvb.github....b-cpuburn-a9.S. Maybe somebody else could do even better.
              Your program doesn't use the NEON data path; did you try it?



              • #27
                Originally posted by ldesnogu View Post
                Your program doesn't use the NEON data path; did you try it?
                It does. VLD2.8 is a NEON instruction.

                edit: Or do you mean NEON arithmetic instructions? Experiments with an ammeter show that load/store instructions consume far more power, so it's a waste to execute anything other than VLD in the NEON unit. Cortex-A8 is a bit different, because it supports dual-issue.
                Last edited by ssvb; 22 August 2012, 02:14 PM.
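                To make the point concrete, an inner loop built from nothing but VLD2.8 looks roughly like this. A hedged sketch in C with GCC inline assembly, not the actual cpuburn-a9.S code; the unroll factor, register choice and build flags are arbitrary assumptions:

                /* vld2_burn.c -- inner loop built from nothing but VLD2.8,
                 * following the observation above that the loads dominate
                 * power consumption on Cortex-A9. Assumes an ARM32 toolchain
                 * with -mfpu=neon. */
                #include <stdint.h>

                void vld2_burn(const uint8_t *buf, unsigned long iterations)
                {
                    while (iterations--) {
                        /* a few back-to-back interleaved 8-bit loads,
                         * nothing else in the loop body */
                        asm volatile("vld2.8 {d0, d1}, [%0]\n\t"
                                     "vld2.8 {d2, d3}, [%0]\n\t"
                                     "vld2.8 {d4, d5}, [%0]\n\t"
                                     "vld2.8 {d6, d7}, [%0]"
                                     :
                                     : "r"(buf)
                                     : "d0", "d1", "d2", "d3",
                                       "d4", "d5", "d6", "d7", "memory");
                    }
                }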



                • #28
                  Originally posted by ssvb View Post
                  It does. VLD2.8 is a NEON instruction.
                  It doesn't go through the NEON data paths, though.



                  • #29
                    Originally posted by ssvb View Post
                    edit: Or do you mean NEON arithmetic instructions? Experiments with an ammeter show that load/store instructions consume far more power, so it's a waste to execute anything other than VLD in the NEON unit. Cortex-A8 is a bit different, because it supports dual-issue.
                    Yes, that's what I meant. I'm surprised by your result, given the way VLD and VST work on the Cortex-A9. Perhaps dual-issuing integer ld/st with NEON DP might require more power?



                    • #30
                      Originally posted by ldesnogu View Post
                      Yes, that's what I meant. I'm surprised by your result, given the way VLD and VST work on the Cortex-A9.
                      It was just the result of a few hours of non-scientific, empirical experiments with an ammeter, trying different types of instructions. One can't optimize code for performance without running benchmarks; likewise, when writing a cpuburn program, having some kind of feedback to estimate the effect of code changes is a must. More background information is available in my blog.

                      As for the ODROID-X, it's good that they spotted the power consumption issue in time and added a passive heatsink, though their claim that software video decoding "is not the normal use environment" sounds like a really poor excuse. A dedicated cpuburn program can heat the CPU a lot more than software video decoding, and additionally loading the GPU with some heavy shaders while making use of the various hardware decoders at the same time could probably cause some really bad problems without proper cooling. But I guess having the theoretical peak power consumption significantly exceed the typical power consumption of a normal workload is to be expected; the SoC just needs to be safely throttled when it is pushed into extreme conditions. As far as I can see, there is ongoing work on the thermal framework on the ARM Linux kernel mailing list. Having a passive heatsink probably gives the thermal framework a better chance to kick in before it is too late.

                      BTW, the inconsistent Tegra 3 performance can probably also be explained by weird behavior of the frequency scaling governor. Whether the CPU really needs to be throttled, the governor is just misconfigured out of the box, or something else entirely is wrong still needs to be figured out.
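                      One rough way to tell thermal throttling apart from a misbehaving governor is to log the governor name, the current frequency and the SoC temperature side by side while a benchmark runs. A sketch, assuming the standard cpufreq and thermal sysfs layout; thermal_zone0 is a guess, the right zone is board-specific:

                      /* throttle_watch.c -- log governor, frequency and SoC
                       * temperature together; if the frequency drops while the
                       * temperature climbs, it is throttling, otherwise suspect
                       * the governor. thermal_zone0 is an assumption. */
                      #include <stdio.h>
                      #include <string.h>
                      #include <unistd.h>

                      static void read_line(const char *path, char *out, size_t len)
                      {
                          out[0] = '\0';
                          FILE *f = fopen(path, "r");
                          if (f) {
                              if (fgets(out, (int)len, f))
                                  out[strcspn(out, "\n")] = '\0';
                              fclose(f);
                          }
                      }

                      int main(void)
                      {
                          char gov[64], freq[64], temp[64];
                          for (;;) {
                              read_line("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
                                        gov, sizeof gov);
                              read_line("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq",
                                        freq, sizeof freq);
                              read_line("/sys/class/thermal/thermal_zone0/temp",
                                        temp, sizeof temp);
                              printf("governor=%s freq_kHz=%s temp_mC=%s\n", gov, freq, temp);
                              sleep(1);
                          }
                      }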

                      Perhaps dual-issuing integer ld/st with NEON DP might require more power?
                      Unfortunately, integer ld/st instructions can't dual-issue with NEON instructions on Cortex-A9 anymore. In my experience, due to a lot of trade-offs, Cortex-A9 typically has worse peak performance per cycle for hand-optimized code than Cortex-A8. On the other hand, Cortex-A9 fixes some nasty bottlenecks (non-pipelined VFP, slow data read-back from NEON, a ridiculously small TLB) and has better average performance on poor compiler-generated code, thanks to out-of-order execution. A major L2 cache size boost also clearly helps.
                      Last edited by ssvb; 22 August 2012, 03:45 PM.

