Quad-Core ODROID-X Battles NVIDIA Tegra 3

  • #16
    Originally posted by brent View Post
    Well, Atom often *is* faster than Cortex-A9, mostly because of high clock speed and good single-threaded performance.
    That's exactly why I'd like to see such a benchmark. The A9 is faster than Atom, and its single-threaded performance is higher. The only benchmark where I have seen Atom come out faster is external memory bandwidth (and even then it was comparing a desktop Atom against an A9 smartphone).

    Comment


    • #17
      I noticed there are no compiler switches reported in the benches. Does that have something to do with ARM, or is it some triviality that slipped past me?

      EDIT: One possible trivial explanation just occurred to me: is it because it is being compared against older runs where compiler switches were not reported?
      Last edited by Del_; 08-22-2012, 09:09 AM.

      Comment


      • #18
        I believe that is the case... also, it does not look like it is posting the architecture info (and dmesg etc) to the OB systems directory. Will see if I can diagnose that as well on another benchmark run.

        We now have the board hooked up to a WattsUp meter and it never jumps over 6W... it was running blogbench at 414% of CPU* at 6W for a second or 2... not stressing the memory simultaneously though so I will need to find a good benchmark for that. Indeed, it is tough to find a benchmark that stresses the whole system simultaneously.

        * it was usually 398% or lower... not sure what to divine from a 414% usage... is there a Cortex-M in there somewhere?

        Comment


        • #19
          Not hard, really. Just run STREAM and DGEMM with large datasets simultaneously. If you want to include disk I/O, it gets a bit trickier to do well, I guess. You will find both in HPC Challenge (STREAM is in PTS too).
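          For reference, the memory-stressing half of that pairing is essentially the STREAM "triad" kernel. A minimal, untimed C sketch of it (array size and repeat count below are made-up placeholders, not tuned values; the real STREAM benchmark also does the timing and the other kernels) could look like this:

          #include <stdio.h>
          #include <stdlib.h>

          /* Arrays big enough to spill out of the L2 cache so DRAM is exercised. */
          #define N (8L * 1024 * 1024)

          int main(void)
          {
              double *a = malloc(N * sizeof *a);
              double *b = malloc(N * sizeof *b);
              double *c = malloc(N * sizeof *c);
              if (!a || !b || !c)
                  return 1;

              for (long i = 0; i < N; i++) {
                  a[i] = 1.0;
                  b[i] = 2.0;
                  c[i] = 0.0;
              }

              /* STREAM triad: c = a + scalar * b, repeated to keep memory busy. */
              for (int rep = 0; rep < 100; rep++)
                  for (long i = 0; i < N; i++)
                      c[i] = a[i] + 3.0 * b[i];

              printf("%f\n", c[0]); /* stop the compiler from deleting the loops */
              free(a);
              free(b);
              free(c);
              return 0;
          }

          Run something like that in one process and a large DGEMM (from HPC Challenge or any BLAS) in another, and the memory system and FPUs are loaded at the same time.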

          Comment


          • #20
            Originally posted by SolarNet View Post
            We now have the board hooked up to a WattsUp meter and it never jumps over 6W... it was running blogbench at 414% of CPU* at 6W for a second or 2... not stressing the memory simultaneously though so I will need to find a good benchmark for that. Indeed, it is tough to find a benchmark that stresses the whole system simultaneously.
            If you want to push power consumption, make sure NEON is being used on all 4 cores. Video encoding/decoding with FFmpeg and 4 threads should do it.
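            Something along these lines should do it, assuming an ffmpeg build with libx264 (input.mp4 is just a placeholder for any sufficiently long clip); the encode runs in software across 4 threads and the output is discarded:

            ffmpeg -i input.mp4 -threads 4 -c:v libx264 -f null -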

            Comment


            • #21
              Interesting to see the ODROID-X running as well as it did. The question I have, though, is whether it is the kernel, compiler, and file system that are slowing the Tegra 3 down.

              Comment


              • #22
                Originally posted by ldesnogu View Post
                If you want to push power consumption, make sure NEON is being used on all 4 cores. Video encoding/decoding with FFmpeg and 4 threads should do it.
                If you want to maximize the power consumption for testing purposes, synthetic cpuburn test programs seem to be a lot more effective than any real workload. Here was my attempt: https://github.com/ssvb/ssvb.github....b-cpuburn-a9.S. Maybe somebody else could do even better.
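                The linked cpuburn-a9.S is hand-scheduled ARM assembly; as a rough C-with-NEON-intrinsics sketch of the same idea (thread count and buffer size here are illustrative assumptions, and a plain C version will not match the power draw of the hand-tuned assembly), each core just spins on back-to-back NEON loads from a small buffer that stays in L1, with the XORs only there to keep the loads from being optimized away:

                #include <arm_neon.h>
                #include <pthread.h>
                #include <stdint.h>
                #include <stdio.h>

                #define NTHREADS 4      /* assumed: one thread per Cortex-A9 core */
                #define BUFSIZE  4096   /* small enough to stay resident in L1 */

                static uint8_t buf[NTHREADS][BUFSIZE] __attribute__((aligned(64)));

                static void *burn(void *arg)
                {
                    const uint8_t *p = arg;
                    uint8x16_t acc = vdupq_n_u8(0);

                    for (;;) {  /* runs until killed, like any cpuburn */
                        for (int i = 0; i < BUFSIZE; i += 64) {
                            acc = veorq_u8(acc, vld1q_u8(p + i));
                            acc = veorq_u8(acc, vld1q_u8(p + i + 16));
                            acc = veorq_u8(acc, vld1q_u8(p + i + 32));
                            acc = veorq_u8(acc, vld1q_u8(p + i + 48));
                        }
                        if (vgetq_lane_u8(acc, 0) == 0xff)  /* keep acc live */
                            fputc('.', stderr);
                    }
                    return NULL;
                }

                int main(void)
                {
                    pthread_t t[NTHREADS];
                    for (int i = 0; i < NTHREADS; i++)
                        pthread_create(&t[i], NULL, burn, buf[i]);
                    for (int i = 0; i < NTHREADS; i++)
                        pthread_join(t[i], NULL);
                    return 0;
                }

                Built with something like gcc -O2 -mfpu=neon -pthread, it keeps all four cores issuing NEON loads non-stop; treat it as a starting point only.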

                Comment


                • #23
                  Originally posted by Darkseider View Post
                  Interesting to see the ODROID-X running as well as it did. The question I have, though, is whether it is the kernel, compiler, and file system that are slowing the Tegra 3 down.
                  For a start, the Tegra 3 obviously was not running at its 1.4GHz clock frequency even for single-threaded workloads (contrary to what is said in the article). I have a déjà vu feeling.
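                  One way to check that claim is to log what the governor is actually doing while the benchmark runs. A quick sketch that polls the standard cpufreq sysfs node once a second (the path is the usual cpufreq location but may differ per kernel, and watching only cpu0 is an assumption):

                  #include <stdio.h>
                  #include <unistd.h>

                  int main(void)
                  {
                      for (;;) {
                          /* current frequency of cpu0 in kHz, as reported by cpufreq */
                          FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", "r");
                          long khz;
                          if (f && fscanf(f, "%ld", &khz) == 1)
                              printf("cpu0: %ld MHz\n", khz / 1000);
                          if (f)
                              fclose(f);
                          sleep(1);   /* poll once a second */
                      }
                  }

                  If the reported frequency sits well below 1.4GHz during single-threaded runs, that would confirm it.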

                  Comment


                  • #24
                    Originally posted by ssvb View Post
                    For a start, the Tegra 3 obviously was not running at its 1.4GHz clock frequency even for single-threaded workloads (contrary to what is said in the article). I have a déjà vu feeling.
                    You're talking about Nvidia here; the reason Linus said "fuck you" to them is that they do not want their hardware to work well with the Linux OSS stack, so of course it's going to get trashed in the benchmarks. Nvidia is hardware non grata if you use and love Linux!

                    Comment


                    • #25
                      Originally posted by Rallos Zek View Post
                      You're talking about Nvidia here; the reason Linus said "fuck you" to them is that they do not want their hardware to work well with the Linux OSS stack, so of course it's going to get trashed in the benchmarks. Nvidia is hardware non grata if you use and love Linux!
                      Yes, Nvidia doesn't want Android to run well.

                      Comment


                      • #26
                        Originally posted by ssvb View Post
                        If you want to maximize the power consumption for testing purposes, synthetic cpuburn test programs seem to be a lot more effective than any real workload. Here was my attempt: https://github.com/ssvb/ssvb.github....b-cpuburn-a9.S. Maybe somebody else could do even better.
                        Your program doesn't use the NEON datapath; did you try it?

                        Comment


                        • #27
                          Originally posted by ldesnogu View Post
                          Your program doesn't use the NEON datapath; did you try it?
                          It does. VLD2.8 is a NEON instruction.

                          edit: Or do you mean NEON arithmetic instructions? Experiments with an ammeter show that load/store instructions consume by far the most power, so it's a waste to execute anything other than VLD in the NEON unit. Cortex-A8 is a bit different, because it supports dual-issue.
                          Last edited by ssvb; 08-22-2012, 02:14 PM.

                          Comment


                          • #28
                            Originally posted by ssvb View Post
                            It does. VLD2.8 is a NEON instruction.
                            It doesn't run through the NEON data paths, though.

                            Comment


                            • #29
                              Originally posted by ssvb View Post
                              edit: Or do you mean NEON arithmetic instructions? Experiments with an ammeter show that load/store instructions consume by far the most power, so it's a waste to execute anything other than VLD in the NEON unit. Cortex-A8 is a bit different, because it supports dual-issue.
                              Yes, that's what I meant. I'm surprised by your result given the way VLD and VST work on the Cortex-A9. Perhaps dual-issuing integer ld/st with the NEON datapath might require more power?

                              Comment


                              • #30
                                Originally posted by ldesnogu View Post
                                Yes, that's what I meant. I'm surprised by your result given the way VLD and VST work on the Cortex-A9.
                                It was just the result of a few hours of non-scientific empirical experiments with an ammeter, trying different types of instructions. One can't optimize code for performance without running benchmarks; likewise, when writing a cpuburn program, having some kind of feedback to estimate the effect of modifying the code is a must. More background information is available in my blog.

                                As for the ODROID-X, it's good that they spotted the power consumption issue in time and added a passive heatsink, though their claim that software video decoding "is not the normal use environment" sounds like a really poor excuse. A dedicated cpuburn program can heat the CPU a lot more than software video decoding, and additionally loading the GPU with some heavy shaders while making use of the various hardware decoders at the same time could probably cause some really bad problems without proper cooling. But I guess having the theoretical peak power consumption significantly exceed the typical power consumption of a normal workload is to be expected; the SoC just needs to be safely throttled when it is put into extreme conditions. As far as I can see, there is ongoing work on the thermal framework on the arm-linux kernel mailing list. A passive heatsink probably gives the thermal framework a better chance of kicking in before it is too late.

                                BTW, the inconsistent Tegra 3 performance can probably also be explained by weird behavior of the frequency scaling governor. Whether the CPU really needs to be throttled, or the governor is just misconfigured out of the box, or something else is wrong still needs to be figured out.

                                Perhaps dual-issuing integer ld/st with the NEON datapath might require more power?
                                Unfortunately, integer ld/st instructions can't dual-issue with NEON instructions on Cortex-A9 any more. In my experience, due to a lot of trade-offs, Cortex-A9 typically has worse peak performance per cycle for hand-optimized code than Cortex-A8. On the other hand, Cortex-A9 fixes some nasty bottlenecks (non-pipelined VFP, slow data read-back from NEON, a ridiculously small TLB) and has better average performance on mediocre compiler-generated code thanks to out-of-order execution. The major L2 cache size boost also clearly helps.
                                Last edited by ssvb; 08-22-2012, 03:45 PM.

                                Comment
