AMD's Newest Open-Source Surprise: "Peano" - An LLVM Compiler For Ryzen AI NPUs


  • #11
    As Stephen says in the LLVM forum post, there is quite a bit of software involved in delivering a full userland for ML workflows -- and as another commenter pointed out, "We need every college kid to not only have access to but to also be able to easily program the new hardware". That is hard to do with only high-level or opaque interfaces, which is why you are seeing a bottom-up release of the software needed to program the hardware at any level. While it seems likely that many folks will use the higher-level MLIR tools or ML framework integrations to program these devices, having all of the software needed to program, debug, and understand existing or novel cases is what makes it important to get the driver and the low-level LLVM toolchain well established as OSS -- since everything else builds on that.



    • #12
      "Generally speaking, AI Engine processors are in-order, exposed-pipeline VLIW processors."

      Years ago they told us all of this was dead... "in-order" was dead, and they also told us that VLIW processors were dead.

      Now, surprise surprise, it's not dead.


      • #13
        Originally posted by qarium View Post
        "Generally speaking, AI Engine processors are in-order, exposed-pipeline VLIW processors."

        Years ago they told us all of this was dead... "in-order" was dead, and they also told us that VLIW processors were dead.

        Now, surprise surprise, it's not dead.
        Depends on the environment... if you are in control of memory and cache residency then in-order can be very efficient, and most GPUs are still in-order today.

        For VLIW it's a function of how much you can control the workload. VLIW worked pretty well for graphics in our GPUs, and it was really the emerging use of compute, both within graphics workloads and in standalone compute workloads, that prompted us to move away from it.

        AI is arguably one of the most controlled workloads in the sense that the vast majority of processing happens in library code, and there is a trend towards treating AI processing as a streaming workload where caches are downplayed in favour of embedded memory or wide/burst fetches from main memory.

        For general purpose workloads it's probably still fair to say that in-order and VLIW have serious performance penalties.
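
        To make the memory-control point concrete, here is a rough C++ sketch (illustrative only; load_tile and TILE are made-up stand-ins for the DMA transfers and scratchpad buffers a real AI Engine would use) of the kind of double-buffered streaming loop a compiler can schedule statically, so that an in-order core never waits on predictable local-memory latency:

        ```cpp
        #include <cstddef>
        #include <cstring>
        #include <vector>

        constexpr std::size_t TILE = 256;  // stand-in for scratchpad capacity

        // Stand-in for a DMA transfer into local memory (predictable latency).
        static void load_tile(const float* src, float* dst, std::size_t n) {
            std::memcpy(dst, src, n * sizeof(float));
        }

        // Double-buffered reduction: tile i+1 is fetched while tile i is summed,
        // so the statically scheduled loop hides the load latency with no
        // out-of-order hardware involved.
        float stream_sum(const std::vector<float>& data) {
            const std::size_t n_tiles = data.size() / TILE;  // tail ignored for brevity
            if (n_tiles == 0) return 0.0f;
            float buf[2][TILE];
            float acc = 0.0f;
            load_tile(data.data(), buf[0], TILE);                 // prologue: fetch tile 0
            for (std::size_t i = 0; i < n_tiles; ++i) {
                if (i + 1 < n_tiles)                              // overlap next fetch...
                    load_tile(data.data() + (i + 1) * TILE, buf[(i + 1) % 2], TILE);
                for (std::size_t j = 0; j < TILE; ++j)            // ...with current compute
                    acc += buf[i % 2][j];
            }
            return acc;
        }
        ```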


        • #14
          Originally posted by bridgman View Post
          Depends on the environment... if you are in control of memory and cache residency then in-order can be very efficient, and most GPUs are still in-order today.
          For VLIW it's a function of how much you can control the workload. VLIW worked pretty well for graphics in our GPUs, and it was really the emerging use of compute, both within graphics workloads and in standalone compute workloads, that prompted us to move away from it.
          AI is arguably one of the most controlled workloads in the sense that the vast majority of processing happens in library code, and there is a trend towards treating AI processing as a streaming workload where caches are downplayed in favour of embedded memory or wide/burst fetches from main memory.
          For general purpose workloads it's probably still fair to say that in-order and VLIW have serious performance penalties.
          Right, thank you for the explanation.

          As I already said to you in another topic, I expect RDNA5 to have out-of-order shader units together with Shader Execution Reordering (SER), because GPUs will turn more and more into general-purpose compute devices, while AI workloads will increasingly migrate to NPUs based on in-order and VLIW designs...

          FSR3 runs in shaders right now, but I am pretty sure that part will move to an AI/NPU engine, the way Nvidia does it with DLSS...

          I am also pretty sure we will see game graphics engines based on AI/NPU inference instead of rasterization or ray tracing.


          • #15
            Originally posted by qarium View Post
            "Generally speaking, AI Engine processors are in-order, exposed-pipeline VLIW processors."

            Years ago they told us all of this was dead... "in-order" was dead, and they also told us that VLIW processors were dead.

            Now, surprise surprise, it's not dead.
            This is how GPUs have worked since forever. This thing is basically a stripped-down GPU with all the vertex/fragment/etc. shit ripped out.



            • #16
              AMD doing more dumb bullshit. Great that they're making NPUs or whatever, but it looks like they're repeating the same mistakes that make ROCm/HIP suck.

              Right now any application targeting HIP has to target the architecture of a specific card. The binaries aren't portable across architectures (unlike Nvidia's portable PTX bytecode). This is the reason there's hyper-constrained support for only a small number of cards, and constant crashes, bugs, and churn everywhere else.

              Now it sounds like they're exposing the inner workings of their NPUs' VLIW pipelines to the compiler and the resulting binaries, which means that if they don't actually ship this compiler to everyone with a chipset, applications are going to have to choose and ship binaries (and do testing) for every chipset they want to support. It's a recipe for disaster.

              Why can't AMD learn their fucking lesson and ship some intermediate representation!? We already do it for regular GPU workloads and it would make everyone's lives much easier. Hell, if you do it with normal shading IRs then what you end up with is either OpenCL 3.0 or Vulkan compute.
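
              To make that concrete, here is a minimal HIP sketch of the problem (the gfx targets in the comment are just examples): each --offload-arch flag bakes one ISA-specific code object into the fat binary, and a card whose ISA isn't on that list simply can't run it, whereas a CUDA fat binary can also embed PTX that the driver JIT-compiles for GPUs that didn't exist at build time.

              ```cpp
              // Build one code object per listed ISA; nothing else will run the result:
              //   hipcc --offload-arch=gfx1030 --offload-arch=gfx90a saxpy_hip.cpp -o saxpy_hip
              #include <hip/hip_runtime.h>
              #include <cstdio>

              __global__ void saxpy(int n, float a, const float* x, float* y) {
                  int i = blockIdx.x * blockDim.x + threadIdx.x;
                  if (i < n) y[i] = a * x[i] + y[i];
              }

              int main() {
                  const int n = 1 << 20;
                  float *x = nullptr, *y = nullptr;
                  hipMalloc(&x, n * sizeof(float));   // device buffers (contents left unset;
                  hipMalloc(&y, n * sizeof(float));   // this sketch only shows the launch path)
                  saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);  // fails on an unlisted gfx ISA
                  hipDeviceSynchronize();
                  hipFree(x);
                  hipFree(y);
                  std::printf("done\n");
                  return 0;
              }
              ```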



              • #17
                Originally posted by Developer12 View Post
                AMD doing more dumb bullshit. Great that they're making NPUs or whatever, but it looks like they're repeating the same mistakes that make ROCm/HIP suck.
                Right now any application targeting HIP has to target the architecture of a specific card. The binaries aren't portable across architectures (unlike Nvidia's portable PTX bytecode). This is the reason there's hyper-constrained support for only a small number of cards, and constant crashes, bugs, and churn everywhere else.

                My interpretation of ROCm/HIP is that it is just a legal hack to work around the legal system, and you can ignore all the HIP-specific hacks if you just use ZLUDA...

                I think there is a software patent on Nvidia's portable PTX bytecode; that is the reason they cannot use it.

                So my question for you: why is AMD dumb if they are just avoiding a software patent? In 20 years, when the patent has expired, they can implement it this way.

                Originally posted by Developer12 View Post
                Now it sounds like they're exposing the inner workings of their NPUs' VLIW pipelines to the compiler and the resulting binaries, which means that if they don't actually ship this compiler to everyone with a chipset, applications are going to have to choose and ship binaries (and do testing) for every chipset they want to support. It's a recipe for disaster.
                Why can't AMD learn their fucking lesson and ship some intermediate representation!? We already do it for regular GPU workloads and it would make everyone's lives much easier. Hell, if you do it with normal shading IRs then what you end up with is either OpenCL 3.0 or Vulkan compute.
                The best idea, Nvidia's portable PTX bytecode, is worthless if there is a software patent on it.

                Now tell me why this is not a valid solution: "if they don't actually ship this compiler to everyone with a chipset"

                To me this sounds like a valid solution, because on Linux the LLVM compiler is already everywhere.

                You still do not understand why Nvidia could successfully sabotage OpenCL 3.0 but cannot sabotage Vulkan compute... just create a game or a new Proton version that uses that Vulkan extension and Nvidia is forced to support it.


                • #18
                  Originally posted by qarium View Post

                  My interpretation of ROCm/HIP is that it is just a legal hack to work around the legal system, and you can ignore all the HIP-specific hacks if you just use ZLUDA...

                  I think there is a software patent on Nvidia's portable PTX bytecode; that is the reason they cannot use it.

                  So my question for you: why is AMD dumb if they are just avoiding a software patent? In 20 years, when the patent has expired, they can implement it this way.

                  The best idea, Nvidia's portable PTX bytecode, is worthless if there is a software patent on it.
                  There's no patent on the whole concept of using a portable representation. Hell, all they had to do was copy OpenCL and compile HIP C++ code down to a common shader language.
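
                  For reference, the source-portable model is roughly this much host code in plain OpenCL (a sketch, assuming any working OpenCL driver is installed): the kernel ships as text, and clBuildProgram does the device-specific codegen on whatever GPU the user actually has.

                  ```cpp
                  #include <CL/cl.h>
                  #include <cstdio>

                  // The kernel ships as source; the installed driver compiles it for the
                  // device that is actually present, so one app binary covers all vendors.
                  static const char* kSrc =
                      "__kernel void saxpy(int n, float a, __global const float* x,\n"
                      "                    __global float* y) {\n"
                      "    int i = get_global_id(0);\n"
                      "    if (i < n) y[i] = a * x[i] + y[i];\n"
                      "}\n";

                  int main() {
                      cl_platform_id plat;
                      cl_device_id dev;
                      cl_int err = CL_SUCCESS;
                      clGetPlatformIDs(1, &plat, nullptr);
                      clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, nullptr);
                      cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, &err);
                      cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, &err);
                      // Device-specific code generation happens here, on the user's machine:
                      err = clBuildProgram(prog, 1, &dev, "", nullptr, nullptr);
                      std::printf("build %s\n", err == CL_SUCCESS ? "ok" : "failed");
                      clReleaseProgram(prog);
                      clReleaseContext(ctx);
                      return 0;
                  }
                  ```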

                  Originally posted by qarium View Post
                  Now tell me why this is not a valid solution: "if they don't actually ship this compiler to everyone with a chipset"

                  To me this sounds like a valid solution, because on Linux the LLVM compiler is already everywhere.

                  You still do not understand why Nvidia could successfully sabotage OpenCL 3.0 but cannot sabotage Vulkan compute... just create a game or a new Proton version that uses that Vulkan extension and Nvidia is forced to support it.
                  Shipping a compiler the size of LLVM is certainly a solution; that's why I suggested it. But it's definitely a dumb solution, for two reasons:

                  1) LLVM is big and heavy compared to the shading-language compilers that already exist in every driver. Minimal installs often don't carry heavy dev tools like LLVM anyway.

                  2) Waiting until the application is on the user's computer before compiling it with LLVM introduces a ton of headaches. First, it means sharing source code with clients, which not every company will be willing to do just to support AMD cards. Second, even if we talk only about open source, there's a reason binary packages exist: nobody except Gentoo users wants to subject themselves to a massive compile job just to use some software. It also ends up dragging in whatever build tools happen to be on the user's computer, and that can be a nightmare to support across all the different distros and whatever special shit a user might have installed.

                  I'm still not convinced you even know what "Vulkan compute" means. Maybe you need to be on some stronger meds. Should help with all these conspiracy theories.



                  • #19
                    Originally posted by qarium View Post

                    My interpretation of ROCm/HIP is that it is just a legal hack to work around the legal system, and you can ignore all the HIP-specific hacks if you just use ZLUDA...

                    I think there is a software patent on Nvidia's portable PTX bytecode; that is the reason they cannot use it.

                    So my question for you: why is AMD dumb if they are just avoiding a software patent? In 20 years, when the patent has expired, they can implement it this way.

                    The best idea, Nvidia's portable PTX bytecode, is worthless if there is a software patent on it.

                    Now tell me why this is not a valid solution: "if they don't actually ship this compiler to everyone with a chipset"

                    To me this sounds like a valid solution, because on Linux the LLVM compiler is already everywhere.

                    You still do not understand why Nvidia could successfully sabotage OpenCL 3.0 but cannot sabotage Vulkan compute... just create a game or a new Proton version that uses that Vulkan extension and Nvidia is forced to support it.
                    ZLUDA doesn't work with ROCm 6.x, and a lot of other software also isn't compatible. As usual, AMD does nothing and just gives up on support. They don't care about anything but gaming, and to some extent AI, except they're way behind on that too.



                    • #20
                      Originally posted by Panix View Post
                      ZLUDA doesn't work with ROCm 6.x, and a lot of other software also isn't compatible. As usual, AMD does nothing and just gives up on support. They don't care about anything but gaming, and to some extent AI, except they're way behind on that too.
                      Work to support ROCm 6.x is already under way in the ZLUDA git repository.

