Raspberry Pi AI HAT+ Launches: 26 TOPS Accelerator For $110


  • Raspberry Pi AI HAT+ Launches: 26 TOPS Accelerator For $110

    Phoronix: Raspberry Pi AI HAT+ Launches: 26 TOPS Accelerator For $110

    Following the launch of the Raspberry Pi AI Kit back during the summer with up to 13 TOPS performance for AI inference, the Raspberry Pi AI HAT+ was announced today with up to 26 TOPS capabilities...


  • #2
    What even is this TOPS? Ternary (-1, 0, 1), Int4, Int8, fp16, fp32? Whatever you want it to be? It's such a mess, and because marketing has overtaken the technical side in this AI craze, the number is mostly meaningless.

    How can we then compare NPUs against each other? I'd put the software stack and its APIs way, way above any TOPS number their marketing team publishes. How to measure and evaluate that? Maybe by how much time and skill it takes to get popular AI frameworks and models making use of that particular NPU.

    Yes, AI benchmarking and comparison is a pain at this point in time, thanks largely to the efforts of clueless marketing people.
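
    To make this concrete, here's a back-of-the-envelope sketch (all numbers invented, nothing to do with the Hailo part's real specs) of how the same datapath can be quoted at wildly different "TOPS" just by changing the assumed precision:

    Code:
    # Hypothetical NPU: same silicon, different marketing numbers.
    MACS_PER_CYCLE_INT8 = 2048   # made-up number of INT8 MAC units
    CLOCK_GHZ = 1.0              # made-up clock

    def tops(macs_per_cycle, clock_ghz):
        # one MAC = 2 ops (multiply + add); GHz of cycles -> tera-ops/s
        return 2 * macs_per_cycle * clock_ghz / 1000

    print("INT8:", tops(MACS_PER_CYCLE_INT8, CLOCK_GHZ), "TOPS")        # 4.096
    print("INT4:", tops(MACS_PER_CYCLE_INT8 * 2, CLOCK_GHZ), "TOPS")    # 8.192, if INT4 packs 2x
    print("FP16:", tops(MACS_PER_CYCLE_INT8 // 2, CLOCK_GHZ), "TOPS")   # 2.048, if FP16 halves throughput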



    • #3
      "TeraFLOP" but for single-byte operations. Obviously TPUs are better than GPUs for calculating ANN stuff, because naive DSP is easier than arbitrary shading -- TPUs are just cheaper and eat less power. Porting is also easy, the hard part (apart from closed-sourceness) is to balance computations with memory transactions, we are well beyond point where arithmetic become practically free in comparison to moving data around.



      • #4
        A Raspberry Pi gets quite expensive if you want to add an SSD and AI. I don't even know if you can add both at the same time or if you have to pick one. I think it is better to just buy a cheap Intel or AMD board.

        I think the Raspberry Pi Foundation should create a Raspberry Pi with built-in on-board AI.



        • #5
          Originally posted by mb_q View Post
          "TeraFLOP" but for single-byte operations.
          No, there's no F in TOPS. That's why it's so open to interpretation and you really need to pay attention to the fine print (if there is any). It's not like Top500 HPL, where the F is clearly defined as 64-bit double-precision floating point; it's pretty much implementation-dependent.



          • #6
            If anyone else was confused about what the HAT+ spec is, it's Raspberry Pi's "Hardware Attached on Top" add-on board standard, not some fancy new AI standard as I first thought.



            • #7
              Originally posted by pegasus View Post
              What even is this TOPS? Ternary (-1, 0, 1), Int4, Int8, fp16, fp32? Whatever you want it to be? It's such a mess, and because marketing has overtaken the technical side in this AI craze, the number is mostly meaningless.
              There are plenty of caveats, but INT8 is usually what's being quoted if it's unspecified. Even a technically correct number can still be useless, for the reasons you listed and more.
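
              Even taken at face value, a rated TOPS figure only sets a ceiling. A rough sketch (the model cost and utilisation below are assumptions, not measured values):

              Code:
              # Upper bound vs. a more realistic estimate for a 26 TOPS part.
              RATED_TOPS  = 26.0    # advertised INT8 figure
              MODEL_GOPS  = 8.7     # hypothetical ops per inference for some model
              UTILISATION = 0.3     # assumed sustained fraction of peak

              ceiling_fps   = RATED_TOPS * 1e12 / (MODEL_GOPS * 1e9)
              realistic_fps = ceiling_fps * UTILISATION

              print(f"paper ceiling : {ceiling_fps:,.0f} inferences/s")
              print(f"more realistic: {realistic_fps:,.0f} inferences/s")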



              • #8
                I don't know why everyone's jumping on the AI bandwagon so hard when its usefulness is very debatable, and it has a large impact on the planet.



                • #9
                  Originally posted by pegasus View Post
                  No, there's no F in TOPS. That's why it's so open to interpretation and you really need to pay attention to the fine print (if there is any). It's not like Top500 HPL, where the F is clearly defined as 64-bit double-precision floating point; it's pretty much implementation-dependent.
                  That's what I wrote; the problem is that FLOPS are not set in stone either. The original idea is that it's just the maximal, theoretical throughput of a machine, so that I could say for sure that I won't beat a 10%-efficient algorithm on a 100 GFLOPS machine using a 9 GFLOPS machine. Obviously there are some subtle problems with that, like the fact that Intel's current FLOPS ratings are based on AVX-512, which can't really run for long without LN2 cooling, etc., but it is roughly useful this way.

                  The key problem is that such a FLOPS figure is easy to cheat: you can just keep piling up calculators and claim this gets you to arbitrary FLOPS with linear effort. Real problems are currently usually memory- or network-bound, though, so it is a useless approach. That's why, for instance, the Top500 defines its FLOPS in terms of actual LINPACK benchmark runs.

                  However, not everybody wants to exclusively run the LINPACK benchmark, which leads to the conundrum that computer performance is effectively unmeasurable, because it depends so much on the mixture of use case, operational context and software and hardware micro-optimisations.
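
                  As a sketch of that "theoretical peak vs. measured" gap (the CPU figures below are hypothetical, as is the HPL efficiency):

                  Code:
                  # Peak FLOPS from the datasheet vs. what an HPL run might report.
                  CORES      = 64
                  SIMD_WIDTH = 8      # doubles per AVX-512 register
                  FMA_UNITS  = 2      # FMAs issued per cycle per core
                  CLOCK_GHZ  = 2.0    # assumed sustained all-core clock

                  peak_gflops = CORES * SIMD_WIDTH * FMA_UNITS * 2 * CLOCK_GHZ  # FMA = 2 FLOPs
                  hpl_gflops  = peak_gflops * 0.75   # assumed HPL efficiency

                  print(f"theoretical peak: {peak_gflops:.0f} GFLOPS")
                  print(f"HPL-measured    : {hpl_gflops:.0f} GFLOPS")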



                  • #10
                    better performance and cooling than the Raspberry Pi AK Kit interfacing via the M.2 connector.
                    Y'all think someone has made an actual kit for an AK to get the wind speed and distance and adjust the sights in the scope accordingly? I suppose that's technically a kit for a scope...

                    Michael: Just reporting what I think is a rather funny typo.

