AMD Proposing Redesign For How Linux GPU Drivers Work - Explicit Fences Everywhere


  • #31
    Originally posted by smitty3268

    It's ongoing. I suggest reading the discussion on the mailing list if you are interested.

    If not, the TLDR is that this whole thing is a giant mess and extremely complicated, and will likely take a lot of discussion to figure out and then years to port over all the different components involved. Parts of Marek's original proposal seem non-controversial, but other parts have been deemed significantly misguided, at least by parts of the community. Everyone seems to think "something" needs to be done, though; it's just a matter of figuring out what and how to do it.
    What I note is that there are possible workarounds that can fix the issues. So there are issues to fix in order to optimize AMD graphics hardware on Linux. I see good prospects for AMD users.



    • #32
      Originally posted by gigi
      I would first ask: has AMD done enough homework for the recently released Zen 3 processors in the Linux kernel? Does the latest kernel at least recognise Zen 3 processors by default?
      Lol, we should stop being so ungrateful. So what if they don't support temperature, current, voltage and power readings? Just chuck it into a computer and don't worry about looking at any of the things.

      We should just keep our mouths shut and not say anything bad about our saviour AMD. We should be grateful that their Linux support has been horrible for decades.

      (According to AMD fanboys)



      • #33
        Originally posted by marios

        About the security implications, it is partially true. But they can be solved. On the other hand, I would expect massive performance gains and not a performance hit. How can something that does almost everything in user-space be slower than something that ping-pongs the kernel?

        The reason this is not easy is that it needs more HW/SW co-design. What makes things even tougher is the need to support the shitty platform called Windows as well.
        Barring hardware/silicon for acceleration, whenever you go from userspace to kernel and vice versa it requires a context switch, which hits performance. For some things it's not a massive deal, but for something like graphics cards it's a massive deal.

        Note that in Windows XP the graphics driver used to sit in userspace and they moved it out because of performance reasons.



        • #34
          Originally posted by jrch2k8

          Network cards are extremely simple (compared to a GPU) and a lot easier to handle in this scenario, but a GPU will not work because the GPU is bound by basically every hardware operation (memory access, CPU, I/O, bus, cache coherency, etc.), and any extra latency will trigger cache flushes and context-switch delays on the GPU, and so on and so forth.

          Also, you seem to have a huge misunderstanding on the software side here, so a few things:

          1.) Userspace is at execution ring 3; it is slow and doesn't have direct hardware access.
          2.) Kernel is at execution ring 0; it is really fast and does have direct hardware access.
          3.) Calling kernel operations from userspace is not that slow and is basically the way the hardware is designed to work (see **).
          4.) They are not proposing this change because userspace switches are that expensive, but because the current sync primitive in use is overkill for their needs and the extra step for syncs introduces a bit of unneeded latency, which makes sense since those primitives were designed primarily for CPU operations, not modern GPU sync.
          5.) OpenFabrics is viable in theory since NPUs basically handle I/O buffers with DMA and they simply want to bypass kernel verification by writing directly to a page. Personally I don't love the idea and believe the FreeBSD approach is way better, which is that the NPU writes and reads directly into kernel buffers and passes the file descriptors to userspace (boy, is it fast, btw).
          6.) On the other hand, unlike 5.), something as stupid as drawing a triangle on a GPU could trigger a couple thousand context switches, a metric ton of I/O operations, a huge amount of cache operations, huge DMA reads/writes and subsequent PCIe sync operations trying to upload/download data for sync, etc. It basically turns all your hardware into a Christmas tree. Allowing kernel bypass here would mean having to write a pseudo-kernel just to handle sync.

          ** Calling kernel code from userspace is basically very, very fast and a lot faster than using userspace equivalents (it is basically hardware acceleration), BUT it is a double-edged sword if you don't fully understand what you are doing, because not all primitives are fast for every scenario/task. This is the most common mistake developers make when trying to use syscalls directly, and it has given rise to the notion that going to the kernel is slow.
          Thanks for the text, I will be using it as the new copy-pasta for https://www.reddit.com/r/confidentlyincorrect/

          1. It is wrong. Userspace (ring 3 on x86) is as fast as the CPU allows. And it does have direct hardware access if a driver implements an appropriate mmap.
          2. It is wrong. Kernel space (ring 0 on x86) is as fast as the CPU allows. It has the same frequency and IPC as userspace, so it is as fast as userspace. It just has more privileges.
          3. It is correct. But it is slower than avoiding it, so it is suboptimal. The fact that it is the way hardware is designed to work is the reason I said something about HW/SW co-design. Changes in the hardware are definitely needed.
          4. I don't get it, maybe my English is too bad...
          5. The claim about the FreeBSD way being better and faster is disputed.
          6. If you need so much sync among processes, you are doing it wrong. Fences are a different story, and they don't require system calls or context switches. Also, if the claim about context switches is true, it proves my point. A user-level implementation will save lots of context switches.

          ** What? Calling kernel code from userspace is much slower than a function call (that is the user space equivalent). End of story.
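
The fence point in item 6 can be illustrated with a small sketch. It assumes, hypothetically, that a fence is just a sequence counter living in memory that both producer and consumer can see; the class name `UserFence` and its methods are made up for this example and are not taken from amdgpu or any real driver API. Python is used only for brevity, and the anonymous mapping merely stands in for memory a driver would expose through its mmap implementation. A real implementation would also need memory barriers and a way to sleep instead of busy-polling, which this sketch ignores.

```python
import mmap
import struct

FENCE_FMT = "<Q"  # a single 64-bit unsigned sequence number


class UserFence:
    """Hypothetical user-space fence: a sequence counter in mapped memory."""

    def __init__(self):
        # Anonymous mapping as a stand-in for memory shared with the device.
        # Creating the mapping is a system call; reading it afterwards is not.
        self.mem = mmap.mmap(-1, mmap.PAGESIZE)
        self.signal(0)

    def signal(self, seqno):
        # Producer side: publish that all work up to `seqno` has completed.
        struct.pack_into(FENCE_FMT, self.mem, 0, seqno)

    def value(self):
        # Read the counter straight out of the mapped page.
        return struct.unpack_from(FENCE_FMT, self.mem, 0)[0]

    def is_signaled(self, seqno):
        # Consumer side: a memory read and a compare, nothing else.
        return self.value() >= seqno


if __name__ == "__main__":
    fence = UserFence()
    fence.signal(2)  # pretend jobs 1 and 2 have finished
    print(fence.is_signaled(2), fence.is_signaled(3))  # True False
```

Checking the fence here never leaves the process: once the page is mapped, `is_signaled` only reads memory, which is the property marios is pointing at. Whether this scales to real GPU workloads is exactly what the thread is arguing about.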

          Originally posted by mdedetrich

          Barring hardware/silicon for acceleration, whenever you go from userspace to kernel and vice versa it requires a context switch, which hits performance. For some things it's not a massive deal, but for something like graphics cards it's a massive deal.

          Note that in Windows XP the graphics driver used to sit in userspace and they moved it out because of performance reasons.
          I agree with the first part. Implementing common operations in user space, and completely bypassing the kernel for those operations, does not increase context switching; it reduces it to zero. That is what I am talking about.
          I don't think that in Windows XP they implemented something like this. Most likely they implemented something microkernelish, which is worse in performance because you have to pay for more context switching than a system call. We are talking about different things.
          Last edited by marios; 22 April 2021, 09:10 AM.



          • #35
            Originally posted by marios
            1. It is wrong. Userspace (ring 3 on x86) is as fast as the CPU allows. And it does have direct hardware access if a driver implements an appropriate mmap.
            2. It is wrong. Kernel space (ring 0 on x86) is as fast as the CPU allows. It has the same frequency and IPC as userspace, so it is as fast as userspace. It just has more privileges.
            3. It is correct. But it is slower than avoiding it, so it is suboptimal. The fact that it is the way hardware is designed to work is the reason I said something about HW/SW co-design. Changes in the hardware are definitely needed.
            4. I don't get it, maybe my English is too bad...
            5. The claim about the FreeBSD way being better and faster is disputed.
            6. If you need so much sync among processes, you are doing it wrong. Fences are a different story, and they don't require system calls or context switches. Also, if the claim about context switches is true, it proves my point. A user-level implementation will save lots of context switches.

            ** What? Calling kernel code from userspace is much slower than a function call (that is the user space equivalent). End of story.
            OK, I admit the word "fast" is ambiguous here, but in the context of execution rings and software levels, who would assume I'm talking about processing speed? And unless hardware has changed drastically in the last 10 years, mmap cannot map special-purpose registers and a bunch of other special fixed functionality, outside of very special PICs (the 8051 family had that? Didn't some Cortex-M parts have some memory-mappable registers?) that I'm not sure exist today, especially from ring 3. I may be wrong here; it has been years since I used protoboards and PICs myself, so if someone with a more modern hardware background wants to correct me, you are welcome.

            Hmm, OK, I think you are seeing this from a high-level software PoV and I'm coming from the lower-level hardware PoV, so I don't think there is a middle ground here. Let's leave it at that and wait to see whether someone wants to do a test implementation and how that goes (I'm damn sure it's not going to work, but I could be wrong or be missing some important information on the GPU hardware side).



            • #36
              I used my decoder wheel and lookup tables and figured out that a FirePro W4100 means GFX6. Does that mean I will still have a working driver?



              • #37
                Originally posted by onlyLinuxLuvUBack
                I used my decoder wheel and lookup tables and figured out that a FirePro W4100 means GFX6. Does that mean I will still have a working driver?
                Yes... there are no plans to break current drivers, just to use this new approach for new hardware and possibly update some existing drivers.


                • #38
                  Which generations of driver will this affect?



                  • #39
                    Originally posted by aujlaakaran0
                    Which generations of driver will this affect?
                    Only amdgpu, and probably only for the newer hardware supported by amdgpu.

                    It might make sense to extend the changes back to earlier hardware at some point, but I don't think older hardware would benefit to the same extent if at all.