
GFX1013 Target Added To LLVM 13.0 For RDNA2 APUs


  • #11
Originally posted by Alexmitter

    And if GPU computing has to be performant, LLVM will eventually be replaced here too.

    There is really no need for that low level virtual machine and its slowness, unreliability and unpredictability in any use case.
I personally agree, considering the proprietary driver uses its own compiler instead of LLVM.



    • #12
Originally posted by hoohoo

      Two points:
      - LLVM compilers have nothing to do with virtual machines per se.
Yes sure, the compiler itself has nothing to do with a VM; the VM is part of the final "binary", beside the generated pseudo-bytecode that the VM will then execute at runtime. Untrustworthy, unreliable, slow.

      Unusable for serious business.



      • #13
Originally posted by StillStuckOnSI

        Yeah, the VM in LLVM should be mostly ignored as a curiosity. It's an IR manipulation and code generation framework first and foremost.
I would not call the VM in LLVM a curiosity. After all, its generated pseudo-bytecode will be executed by this VM; it's the main (anti-)feature of LLVM.

I would agree that it is maybe okay enough for compute on GPGPU tasks, but that is about where the line of acceptable LLVM use ends.

For GPGPU, a purpose-made compiler will always be faster.
For normal compile tasks, a proper compiler will always be better; pseudo-bytecode and a low-level virtual machine are not acceptable where reliability, performance, or any sane reasoning are important.



        • #14
Originally posted by Alexmitter
I would not call the VM in LLVM a curiosity. After all, its generated pseudo-bytecode will be executed by this VM; it's the main (anti-)feature of LLVM.

I would agree that it is maybe okay enough for compute on GPGPU tasks, but that is about where the line of acceptable LLVM use ends.

For GPGPU, a purpose-made compiler will always be faster.
For normal compile tasks, a proper compiler will always be better; pseudo-bytecode and a low-level virtual machine are not acceptable where reliability, performance, or any sane reasoning are important.
I may be misunderstanding, but are you suggesting that shaders compiled with LLVM are produced as some kind of bytecode that is then interpreted at runtime?

EDIT - it occurred to me that this would also imply that the bytecode interpreter would have to run on the shader core (with a thread for every simultaneously processed pixel/vertex), which, while not impossible, would certainly be unusual.

If so, then I have to disagree. It is *possible* for clang/LLVM to generate bytecode, but the most common use is to generate native ISA, whether that be x86, ARM, or GPU shader code. That is what we do in all of the open source drivers.
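To make the distinction concrete, here is a minimal sketch (file and function names are hypothetical, not from this thread) showing both paths with the stock clang/LLVM tools:

```cpp
// square.cpp -- a trivial program to illustrate the two output modes.
int square(int x) { return x * x; }

int main() { return square(3); }

// The common, ahead-of-time native path (conceptually what the open
// source GPU drivers do -- LLVM emits target machine code directly):
//
//   clang++ -O2 square.cpp -o square   # x86-64/ARM machine code
//   objdump -d square                  # real CPU instructions, no VM
//
// The optional bytecode path (LLVM IR), which is rarely used this way:
//
//   clang++ -O2 -emit-llvm -c square.cpp -o square.bc
//   lli square.bc; echo $?             # interprets/JITs the IR, prints 9
```

The native path is the one the drivers take; the lli path exists, but nobody ships shaders that way.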
          Last edited by bridgman; 21 June 2021, 10:22 AM.



          • #15
Originally posted by Alexmitter
VM is part of the final "binary", beside the generated pseudo-bytecode that the VM will then execute at runtime.
            That is incorrect.



            • #16
Originally posted by Alexmitter

Yes sure, the compiler itself has nothing to do with a VM; the VM is part of the final "binary", beside the generated pseudo-bytecode that the VM will then execute at runtime. <snip>
I don't think so. At least as of several years ago, the LLVM compilers generated machine code or an intermediate representation. The "VM" in the name is there because the greater LLVM project did have VM frameworks as well as compilers. The compilers were the most successful (IMO) product of the LLVM project, and they did not generate VMs; they were just compilers. They proved to be more modular, and thus more easily extended, than GCC. That modularity was so valuable that even though LLVM's C and C++ machine code was until quite recently slower than GCC's, GCC was so painful to add a new language or code generator to that people spent years improving LLVM's code generators instead.

Look at what ROCm does when you run a compute kernel: it invokes LLVM to generate GPU-specific instructions. You can see it doing this over and over when you use something like TensorFlow or Caffe and run a new or modified model. That's not a VM-style scheme: you can see the arch-specific object files under (IIRC) ~/.cache/rocm*
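To illustrate, here is a purely hypothetical, minimal HIP kernel of the kind ROCm compiles this way (all names and sizes are mine); hipcc drives clang/LLVM to emit the GPU's native ISA, and no bytecode interpreter runs on the shader cores:

```cpp
// saxpy.hip -- hypothetical minimal HIP kernel (names are illustrative).
// Building with `hipcc saxpy.hip -o saxpy` makes clang/LLVM emit native
// GPU ISA (e.g. gfx1030) into the binary, inspectable with llvm-objdump.
#include <hip/hip_runtime.h>

__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1024;
    float *x, *y;
    hipMalloc(&x, n * sizeof(float));
    hipMalloc(&y, n * sizeof(float));
    // (initialization of x and y omitted for brevity)
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);  // runs as native ISA
    hipDeviceSynchronize();
    hipFree(x);
    hipFree(y);
    return 0;
}
```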



              • #17
                If I were to believe all the very harshly worded and confidently presented claims by Alexmitter, that would mean that both Intel and AMD have no competent developers who can see these "obvious" downsides of LLVM (which are so obvious that some random dude on a forum can identify them immediately) as both companies make extensive use of LLVM in their graphics stacks. That sounds somewhat unlikely to me. Also, he does not seem to be entirely clear when it comes to what LLVM compilers actually do...



                • #18
Originally posted by GruenSein
                  If I were to believe all the very harshly worded and confidently presented claims by Alexmitter
                  ..... some random dude on a forum ...
Alexmitter is that kind of random dude who sits on his sofa harshly criticising all the soccer players and coaches during the European Cup. Of course he would win any soccer match single-handedly, and that's the reason why he is just sitting on the sofa: it would be a bit unfair to the other teams if he became European Champion as a one-man team, you know?



                  • #19
                    the "LLVM compiler " is not used to generate VM bytecode that is interpreted on some VM on the target ISA.. The idea here is a completly different:

                    LLVM has Frontends and Backends and an intermediate "IR" ISA neutral bytecode representation.
                    Typically you would write program code in the language of your choice, generate IR from it, run optimizations on that IR and then generate ISA specific instructions from that IR running further ISA specific optimizations in the process..

                    The "IR" is only used as an intermediate representation here, that way you don´t have to write like "C -> x86, C -> arm64, fortran -> x86, fortran -> arm64" compilers, but you only write
                    C -> IR, Fortran -> IR frontends
                    and
                    IR -> x86
                    IR -> arm64
                    Backends..

                    You could furthermore reuse all optimizations run on the IR step..
                    It should be obvious that this saves a lot of work with every ISA and programming language added..

                    It´s possible to run IR in an interpreter / vm, but it is very very rarely done.
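A concrete sketch of that split, using the standard clang/opt/llc tools (file and function names are hypothetical):

```cpp
// add.cpp -- one source file, one IR, many backends.
extern "C" int add(int a, int b) { return a + b; }

// Frontend: C++ -> ISA-neutral IR (one frontend per language):
//   clang++ -O2 -S -emit-llvm add.cpp -o add.ll
//
// Shared middle end: optimizations written once, run on the IR:
//   opt -O2 -S add.ll -o add_opt.ll
//
// Backends: the same IR lowered to different ISAs (one backend per target):
//   llc -mtriple=x86_64-linux-gnu add_opt.ll -o add_x86.s
//   llc -mtriple=aarch64-linux-gnu add_opt.ll -o add_arm64.s
//
// With F frontends and B backends you maintain F + B components
// instead of F x B monolithic compilers.
```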

By the way: it's good news that the GFX1013 ISA has been added to LLVM, as this means we will probably see RDNA (gfx1010) / RDNA2 (gfx1013) support in ROCm pretty soon.

By the way, ROCm uses hand-optimized assembly code for the specific ISA/card for a lot of functions, such as BLAS and machine-learning kernels like the Winograd convolution, as compilers don't deliver the fastest code possible.
                    Last edited by Spacefish; 21 June 2021, 08:00 PM.



                    • #20
Originally posted by Spacefish
By the way, ROCm uses hand-optimized assembly code for the specific ISA/card for a lot of functions, such as BLAS and machine-learning kernels like the Winograd convolution, as compilers don't deliver the fastest code possible.
Yep... our newer CDNA parts also include matrix operations which don't lend themselves to being compiler targets, so the libraries hard-code those instructions.

