The NVIDIA Jetson TX2 Performance Has Evolved Nicely Since Launch


  • #21
    Originally posted by coder View Post
    Why? Just get a Gemini Lake board.

    For about $120, you get probably comparable CPU performance and a well-supported GPU (with open source drivers) that's at least half what the TX2 packs. Power consumption is comparable, but Gemini Lake is available in standard form factors.

    ASRock Super Alloy
    Intel Quad-Core Pentium Processor J5005 (up to 2.8 GHz)
    Supports DDR4 2133/2400 SO-DIMM
    1 PCIe 2.0 x1, 1 M.2 (Key E)
    Graphics Output Options: D-Sub, HDMI, DVI-D
    7.1 CH HD Audio (Realtek ALC892 Audio Codec), ELNA Audio Caps
    4 SATA3
    4 USB 3.1 Gen1 (2 Front, 2 Rear)
    Supports Full Spike Protection, ASRock Live Update & APP Shop


    That particular board is passively-cooled and supports HDMI 2.0.
    Definitely not a bad alternative and I appreciate you pointing that out. But if I'm going to go with a standard form factor, I might as well go for a socketed CPU. What I need this for is a robot, so the form factor doesn't really matter that much (as long as it's small) and neither do most of the connectors related to desktop usage (including HDMI 2.0, surround sound audio, plenty of USB ports, etc).
    Best of all, it supports OpenCL (which Tegra SoCs do not)!
    I actually wasn't aware the Tegras didn't support OpenCL. That's a shame. However, for the time being I've been [begrudgingly] using CUDA, since there are more resources available for it that do what I need. I'd strongly prefer OpenCL, but I'd have to write a lot of code from scratch, which would be a hefty investment of time and effort for a hobbyist project. That said, I'm also using OpenCV with the T-API, which uses OpenCL by default, but I think there's a build of OpenCV specific to Tegra that can use CUDA instead.
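
    Roughly, the T-API path looks like this. This is a minimal sketch, assuming an OpenCV build with OpenCL support; "frame.png" is just a placeholder input:

    // Minimal T-API sketch: cv::UMat lets OpenCV dispatch work through OpenCL
    // when a device is available, with a transparent CPU fallback otherwise.
    #include <opencv2/core.hpp>
    #include <opencv2/core/ocl.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <iostream>

    int main() {
        std::cout << "OpenCL available: " << cv::ocl::haveOpenCL() << "\n";
        cv::ocl::setUseOpenCL(true);                 // opt in; no effect without an OpenCL device

        cv::Mat img = cv::imread("frame.png");       // placeholder input file
        if (img.empty()) return 1;

        cv::UMat src, gray, blurred;
        img.copyTo(src);                             // copy into a UMat (device-backed when possible)
        cv::cvtColor(src, gray, cv::COLOR_BGR2GRAY); // same calls as with cv::Mat
        cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 1.5);

        cv::Mat result = blurred.getMat(cv::ACCESS_READ); // map back for CPU access
        std::cout << "result: " << result.cols << "x" << result.rows << "\n";
        return 0;
    }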

    Comment


    • #22
      Originally posted by coder View Post
      Definitely not, but I think Xavier might. Their "Drive PX Pegasus" platform links two of them together, somehow. The presence of NVLink is mentioned, here (though without details like the # of links):

      https://www.anandtech.com/show/11913...t-nextgen-gpus
      Too bad, since the Pascal architecture definitely has them.
      It's a shame that NVidia spends a bunch on ASIC real estate and BGA balls with nothing to show for it, for the general public at least.
      No NVLink core or IP block anywhere in sight.
      I'd love an NVidia lab board with 6x NVLink to a fast FPGA to see what that setup could do. 150 GB/s of bidirectional bandwidth with NVLink 2.0.

      Comment


      • #23
        Originally posted by ldesnogu View Post
        That's not what Girolamo_Cavazzoni was talking about, I guess: Denver uses a JIT that improves performance as the benchmark runs by recompiling hot spots on the fly; that's much more dynamic than profile-driven compilation where, as you say, you run the program twice and you're done. OTOH I'm not sure the JIT engine of Denver will improve performance of a program when it's run multiple times.

        Another thing to take care of when benchmarking TX2 is to make sure of where a program is running: the Denver core or the A57 core. When the board boots the Denver cores are disabled and have to be explicitly enabled. The nvpmodel tool can be used to enable either or both clusters.
        It's fancy pancy speak for an ISA frontend translating to an internal architecture, which is what all modern "large" CPUs do today anyway.
        You can extend the translation a bit, especially if your backend is really, really wide.
        But in general I don't expect much performance benefit compared to a more traditional approach.
        There are other benefits though. For example: it's easier to hide stupid binary compilation speed issues when moving between CPUs.
        It's easier to make old code benefit from a newer CPU.

        Comment


        • #24
          Originally posted by milkylainen View Post

          It's fancy pancy speak for an ISA frontend translating to an internal architecture, which is what all modern "large" CPUs do today anyway.
          You can extend the translation a bit, especially if your backend is really, really wide.
          But in general I don't expect much performance benefit compared to a more traditional approach.
          There are other benefits though. For example: it's easier to hide stupid binary compilation speed issues when moving between CPUs.
          It's easier to make old code benefit from a newer CPU.
          You definitely should read about how Denver works before claiming it's "fancy pancy speak". Here is a starting point: https://www.anandtech.com/show/8701/...xus-9-review/4
          It goes much farther than what your typical CPU does in HW.
          Last edited by ldesnogu; 31 August 2018, 01:28 AM. Reason: Fix link.

          Comment


          • #25
            Originally posted by ldesnogu View Post
            You definitely should read about how Denver works before claiming it's "fancy pancy speak". Here is a starting point: https://www.anandtech.com/show/8701/...e.php?id=11262
            It goes much farther than what your typical CPU does in HW.
            I did, and I stand by my opinion.
            Yes, it is taking code translation to an internal ISA a bit further.
            But marketing speak makes you believe it will hit ludicrous speed through code-optimization shenanigans.

            All modern, large CPUs marry a really wide/deep backend with an industry-standard ISA.
            While this takes it a bit further, it is no magic sauce.
            Keeping optimized micro/macro-op translations cached is not a new idea.
            You're trading a lot of silicon for smartness. You could spend that silicon on a beefier front end, a wider memory interface, etc.
            Unless the smartness results in a drastic complexity reduction for the same speedup (tiled rendering and rasterization, for example), it's usually not worth it.
            Complexity reduction could be reordering done in software, etc.

            The brains-vs-brawn discussion has been going on for decades.
            It has usually been universal that brawn is the simpler and more generic tradeoff.
            Easier to implement, verify, etc.
            Transmeta did part of this already (part of the team came from Transmeta), and they failed miserably.
            Their CPU wasn't faster than a contemporary CPU that spent as much silicon on pure brawn.
            In the end the customer won't care much for the brains if the $$ doesn't buy enough speed.
            Denver could easily do x86 translation in the frontend as well, if Nvidia wanted an x86 CPU.

            Also, it's not like the cache will hold translations for an entire benchmark that is run a gazillion times to "optimize".
            It will most likely hold a couple of tight kernel loops that are used frequently.
            As I said, there are advantages and disadvantages.
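
            To illustrate what I mean about the cache holding a few hot loops, here is a toy sketch of the general idea (not Denver's actual mechanism; the threshold and cache size are made up):

            // Toy translation cache keyed by block address: blocks run the slow way
            // until they get "hot", then an optimized translation is cached and
            // reused; a tiny capacity means only a few tight loops stay resident.
            #include <cstdint>
            #include <unordered_map>
            #include <iostream>

            struct Translation { std::uint64_t block_addr; /* optimized code would live here */ };

            class TranslationCache {
                static constexpr int kHotThreshold = 100;      // made-up tuning knob
                static constexpr std::size_t kMaxEntries = 4;  // tiny on purpose
                std::unordered_map<std::uint64_t, int> exec_count_;
                std::unordered_map<std::uint64_t, Translation> cache_;
            public:
                // Returns true if this execution used a cached (optimized) translation.
                bool execute(std::uint64_t block_addr) {
                    if (cache_.count(block_addr)) return true;       // fast path: reuse translation
                    if (++exec_count_[block_addr] >= kHotThreshold) {
                        if (cache_.size() >= kMaxEntries)
                            cache_.erase(cache_.begin());            // crude eviction
                        cache_[block_addr] = Translation{block_addr};
                    }
                    return false;                                    // slow path: interpret/translate inline
                }
            };

            int main() {
                TranslationCache tc;
                int optimized = 0;
                for (int i = 0; i < 1000; ++i)                       // one tight loop body, run repeatedly
                    optimized += tc.execute(0x1000) ? 1 : 0;
                std::cout << "optimized executions: " << optimized << "/1000\n";
            }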

            Comment


            • #26
              Thanks for the course on CPU micro-architecture

              Comment


              • #27
                Originally posted by coder View Post
                Definitely not, but I think Xavier might. Their "Drive PX Pegasus" platform links two of them together, somehow. The presence of NVLink is mentioned, here (though without details like the # of links):

                https://www.anandtech.com/show/11913...t-nextgen-gpus
                Interesting, but it's lacking in specifics. The dedicated GPUs could have NVLink between them.

                Comment


                • #28
                  Originally posted by ldesnogu View Post
                  Consecutive runs shouldn't change anything unless the JIT maintains a DB of hot spots (as far as I know only FX!32 did that). All optimizations are done on the fly,
                  That would be lame. They don't need to keep profiling data, but should at least keep a persistent cache of JIT-translated/optimized images. Otherwise, load times and power utilization would suffer, both of which would be quite counterproductive to most of this chip's goals.

                  Comment


                  • #29
                    Originally posted by schmidtbag View Post
                    Definitely not a bad alternative and I appreciate you pointing that out. But if I'm going to go with a standard form factor, I might as well go for a socketed CPU. What I need this for is a robot, so the form factor doesn't really matter that much (as long as it's small) and neither do most of the connectors related to desktop usage (including HDMI 2.0, surround sound audio, plenty of USB ports, etc).
                    Have you seen this?



                    Here's an Apollo Lake SoC on an embeddable board:

                    Industrial use, capable of enabling next-generation industrial automation and AI solutions. Wide range of AI acceleration modules in mPCIe, M.2 2280 and PCIe [x4] form factors. Powerful, industrial, AI-proof and expandable to scale up. Pre-installed software package includes Ubuntu and Intel Edge Insi…


                    Originally posted by schmidtbag View Post
                    I actually wasn't aware the Tegras didn't support OpenCL. That's a shame. However, for the time being I've been [begrudgingly] using CUDA, since there are more resources available for it that do what I need. I'd strongly prefer OpenCL, but I'd have to write a lot of code from scratch, which would be a hefty investment of time and effort for a hobbyist project. That said, I'm also using OpenCV with the T-API, which uses OpenCL by default, but I think there's a build of OpenCV specific to Tegra that can use CUDA instead.
                    Will it use CUDA, automatically? I thought you had to explicitly use stuff in the cuda namespace.
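
                    The explicit path I have in mind looks roughly like this. A sketch, assuming an OpenCV build with the opencv_contrib CUDA modules; "frame.png" is a placeholder:

                    // Nothing is dispatched automatically here: data is uploaded to a GpuMat
                    // and the cv::cuda:: functions are called by name.
                    #include <opencv2/core.hpp>
                    #include <opencv2/core/cuda.hpp>
                    #include <opencv2/cudaimgproc.hpp>
                    #include <opencv2/imgcodecs.hpp>
                    #include <iostream>

                    int main() {
                        if (cv::cuda::getCudaEnabledDeviceCount() == 0) {
                            std::cerr << "No CUDA device or CUDA support not built in\n";
                            return 1;
                        }
                        cv::Mat img = cv::imread("frame.png");  // placeholder input file
                        if (img.empty()) return 1;

                        cv::cuda::GpuMat d_src, d_gray;
                        d_src.upload(img);                                      // explicit host-to-device copy
                        cv::cuda::cvtColor(d_src, d_gray, cv::COLOR_BGR2GRAY);  // explicit cuda:: call

                        cv::Mat gray;
                        d_gray.download(gray);                                  // explicit device-to-host copy
                        std::cout << "gray: " << gray.cols << "x" << gray.rows << "\n";
                        return 0;
                    }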

                    Comment


                    • #30
                      Originally posted by ldesnogu View Post
                      You definitely should read about how Denver works before claiming it's "fancy pancy speak". Here is a starting point: https://www.anandtech.com/show/8701/...e.php?id=11262
                      It goes much farther than what your typical CPU does in HW.
                      True, but that deals with the original Denver, from 2014. This has Denver 2, described here (note that Parker is the code name for TX2):



                      Anyway, your link didn't work for me. Try this:

                      https://www.anandtech.com/show/8701/...xus-9-review/2
                      Last edited by coder; 30 August 2018, 11:47 PM.

                      Comment
