David Airlie Shares His Thoughts On Current Challenges With Linux GPU Compute Stacks

  • David Airlie Shares His Thoughts On Current Challenges With Linux GPU Compute Stacks

    Phoronix: David Airlie Shares His Thoughts On Current Challenges With Linux GPU Compute Stacks

    Sriram Ramkrishna at Intel, who serves as the community manager and developer relations for oneAPI, held a virtual oneAPI meetup this week with Red Hat's David Airlie. Airlie should not need any introduction for longtime Phoronix readers given his longtime contributions to the Linux kernel graphics drivers, Mesa, and related open-source graphics work at Red Hat. Airlie shared some interesting remarks around the current Linux GPU compute stacks from the different vendors and associated challenges...

  • #2
    Thanks for posting, Michael.

    Airlie does acknowledge the success of NVIDIA's CUDA thanks to it working across all NVIDIA GPUs regardless of consumer or workstation/data-center focus.
    This. AMD appears to be changing its tune, but I'm withholding praise until we see some real follow-through.

    I'm still buying Intel for my next dGPU. I appreciate all the work they're doing to further open standards, like OpenCL and SYCL. I can confirm that their SYCL sample code for Ponte Vecchio also works on the Xe iGPUs (as long as the code doesn't require fp64 support - the Xe iGPUs have none!).
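
For anyone who wants to check a device for that fp64 limitation before running double-precision code: in OpenCL, double support is advertised through the `cl_khr_fp64` extension, and SYCL 2020 exposes the same capability as `sycl::aspect::fp64`. Below is a minimal Python sketch that only inspects an extension list string, such as the one a tool like `clinfo` prints. The function name and the sample strings are hypothetical, shortened for illustration, and not taken from real hardware.

```python
def supports_fp64(extensions: str) -> bool:
    """Return True if a space-separated OpenCL extension list names cl_khr_fp64."""
    # Split into whole tokens so that a longer, similarly named extension
    # is not mistaken for cl_khr_fp64.
    return "cl_khr_fp64" in extensions.split()


# Hypothetical extension strings (assumptions, not real device output).
igpu_extensions = "cl_khr_fp16 cl_khr_global_int32_base_atomics"
dgpu_extensions = "cl_khr_fp16 cl_khr_fp64 cl_khr_global_int32_base_atomics"

for label, ext in (("iGPU", igpu_extensions), ("dGPU", dgpu_extensions)):
    status = "fp64 available" if supports_fp64(ext) else "no fp64, stick to float"
    print(f"{label}: {status}")
```

A kernel that needs doubles could be skipped or swapped for a single-precision variant when the check fails.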
    Last edited by coder; 02 September 2023, 05:05 PM.

    • #3
      Airlie said nearly everything I thought about FOSS GPU and more!

      I hope LLVM and clang get massively unforked, that compute and virtualization become available in consumer GPUs, and that the ROCm situation improves by merging all GPU computing efforts into Mesa, with common APIs and upstream support equal to or better than on other operating systems.

      HDR, hardware video encoding/decoding (even where supported by software such as Firefox), and color management must be improved too.

      • #4
        I really want to see Vulkan compute succeed, but Khronos would need to actually adapt it for that. Vulkan runs on hardware from any vendor and on any OS, with already mature drivers.

        SYCL looks cool, but since it's based on OpenCL I don't think it's going anywhere.

        • #5
          Originally posted by Weasel
          How about praising NVIDIA for having a driver decoupled from the fucking kernel and Mesa so you can update each of them separately without risking something else breaking in the process or updating a shit ton of dependencies?
          the famously decoupled nvidia driver: https://duckduckgo.com/?q=nvidia+lin...+update+broken

          • #6
            Originally posted by Weasel
            How about praising NVIDIA for having a driver decoupled from the fucking kernel and Mesa so you can update each of them separately without risking something else breaking in the process or updating a shit ton of dependencies?

            This is why I will continue to buy only NVIDIA GPUs on Linux until it changes. Yes, closed source sucks, but convenience and peace of mind is a lot more important than having it open source.
            You don't need closed source to be decoupled from the kernel, just a kernel compatibility layer. Either way it has to be out of tree.

            Our KCL happens to be open source and delivered via DKMS in our packaged drivers. Same for Mesa - we include pre-built open source drivers in the packaged downloads.
            Last edited by bridgman; 02 September 2023, 05:47 PM.
            Test signature

            • #7
              Originally posted by Weasel
              How about praising NVIDIA for having a driver decoupled from the fucking kernel and Mesa so you can update each of them separately without risking something else breaking in the process or updating a shit ton of dependencies?
              I don't really get the dependency update concern; you're updating anyway, and nvidia driver updates can be rather large compared to any other package, IIRC.

              Updating nvidia drivers doesn't always go smoothly, btw. I often couldn't shut down or reboot via the UI after an update because, on my distro, the running kernel could no longer find an nvidia kernel module or something... I had to hard power off or shut down via the terminal.

              I recall one update where a kernel switching on IBT by default (or something like that) broke booting due to an incompatibility with the nvidia drivers; I think you had to be running an Intel CPU for that too. After a bit of searching online you'd learn that the cause was the IBT change and that a kernel parameter worked around it, until nvidia eventually pushed out a driver update that resolved the incompatibility.
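
A rough sketch of checking whether that workaround is active, assuming the kernel parameter in question is `ibt=off`, which is not named above. The helper names and the sample command line are made up for illustration; on a real system the string would come from `/proc/cmdline`. The parsing is deliberately simple and ignores quoted values.

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Look up `name` on a kernel command line.

    Returns the value for `name=value`, an empty string for a bare flag,
    and None when the parameter is absent. In this sketch a later
    occurrence overrides an earlier one.
    """
    found = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            found = value if sep else ""
    return found


def ibt_disabled(cmdline: str) -> bool:
    """True if the command line carries ibt=off (assumed workaround)."""
    return kernel_param(cmdline, "ibt") == "off"


# Hypothetical command line, standing in for the contents of /proc/cmdline.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet ibt=off"
print("IBT workaround active:", ibt_disabled(sample))
```

Once a fixed driver is installed, the same check can serve as a reminder to drop the parameter again.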

              I can't recall the cause, but there was another breakage with a popular third-party package that enables VA-API video decoding so Firefox and other apps can leverage it. What I didn't expect was that this broke some apps in weird ways. GitKraken would crash on start, apparently due to the failure when trying to play its startup splash animation. Discord lost hardware acceleration, which made some functionality that relies on it unusable or very slow. It probably broke a few other things.

              In the past the nvidia driver caused a variety of graphical artifact issues / crashes with kwin / plasma. It still isn't able to leverage virgl / virtio-gpu well AFAIK (AMD apparently fares much better there and is getting improved support, with Intel to follow, thanks to work in Mesa). nvidia has also needed some extra config/support to properly restore VRAM when resuming from hibernation (although they've been working to improve on that this year IIRC).

              This sort of range of issues is common in nvidia news discussions on phoronix. I know that AMD has had its fair share of issues too, although the scope tends to be not as wide. Nvidia also has its perks. I've got a laptop with an AMD 780M now, haven't switched it over to Linux yet while I waited on kernel releases + firmware updates (which AFAIK are Windows dependent and still frequent enough). Bizarrely there is no generic 780M graphics driver last I looked, at least not available for Windows. You kind of have to hunt down your specific OEM one. A BIOS update also force restarted my system while I was away, you can postpone up to 4 hours if you're there but the next notification is by default 10 minutes. I miss Linux.

              • #8
                Originally posted by polarathene
                I don't really get the dependency update concern; you're updating anyway, and nvidia driver updates can be rather large compared to any other package, IIRC.
                Really? How clueless can people be around here?

                If you are running a Linux distro that is not bleeding-edge/rolling-release and you are using in-tree drivers, you cannot get newer versions of the driver unless someone backports them from a newer kernel (this never happens, btw). The problem is even worse if you need the newer driver to support newly released graphics cards.


                Originally posted by bridgman
                You don't need closed source to be decoupled from the kernel, just a kernel compatibility layer. Either way it has to be out of tree.

                Our KCL happens to be open source and delivered via DKMS in our packaged drivers. Same for Mesa - we include pre-built open source drivers in the packaged downloads.
                Yeah, it would be much better if the Linux kernel had an actual ABI for graphics card drivers that stayed stable across kernel versions; otherwise everyone just reinvents the wheel.

                • #9
                  Originally posted by mdedetrich
                  [...]
                  If you are running a Linux distro that is not bleeding edge/rolling release
                  [...]
                  What you're really saying is that if you use Ubuntu (or a derivative) you have to wait to get the latest kernel and Mesa ('driver' by itself means nothing; there are multiple drivers for the same GPU: kernel, OpenGL, Vulkan, OpenCL, etc.).
                  But this is frankly old school for the desktop: Arch (and all its derivatives, including Manjaro) is rolling, openSUSE Tumbleweed is rolling, Fedora is 'semi-rolling', and even Debian Testing can be considered rolling.

                  On servers, the entire argument is moot, because there you tie software and hardware together (i.e. you buy hardware that you know is guaranteed to work with the software stack of your choice).

                  On the other hand (on the desktop), with your beloved NVIDIA, if support for the latest kernel is dropped for your GPU, you are completely stuck (been there, no thanks; I have been on hardware with fully open-source, in-kernel support and a rolling-release distro since 2016, and I have no worries about getting the latest and greatest).

                  • #10
                    Originally posted by mdedetrich
                    Yeah, it would be much better if the Linux kernel had an actual ABI for graphics card drivers that stayed stable across kernel versions; otherwise everyone just reinvents the wheel.
                    But there is a problem there.

                    Like it or not, this is a good write-up of why it does not work.

                    Rust developers who wanted Rust in the Linux kernel only recently attempted to mix LLVM with GCC, and found out that this is an insanely bad idea. This means you would have to get all kernel builders to agree on the same compiler, or a compatible one, for binary drivers to work. Some of the historic nvidia binary blob glitches on Linux came about because the compiler used to build the wrapper and the compiler that built the blob did not agree on how things should work.

                    One of the biggest problems is security patching.

                    Security issues are also very important for Linux. When a
                    security issue is found, it is fixed in a very short amount of time. A
                    number of times this has caused internal kernel interfaces to be
                    reworked to prevent the security problem from occurring. When this
                    happens, all drivers that use the interfaces were also fixed at the
                    same time, ensuring that the security problem was fixed and could not
                    come back at some future time accidentally. If the internal interfaces
                    were not allowed to change, fixing this kind of security problem and
                    insuring that it could not happen again would not be possible.
                    This is why Linux kernels are able to get rid of security issues faster than Windows: without a stable kernel-space ABI, the kernel API can be refactored to remove security faults for good.

                    So with a stable ABI, how are you going to avoid ending up with security issues, or with graphics driver developers complaining that it is too slow? I am old enough to remember the attempt at unified drivers across Linux, BSD, and Unix, which Nvidia said they would never use because the performance cost was too high.

                    mdedetrich, you can wish for it all you like. The reality is that many parties have tried. Think about it: if you keep a stable ABI, you end up with low-performance compatibility wrappers. Yes, this is even true for the Linux kernel syscall interface.

                    Do remember that X11 had the idea that all graphics drivers would be written in userspace. Those were the old UMS drivers, before the Linux kernel had KMS, and people were not happy with the UMS drivers' performance and stability.

                    With vkms it is still technically possible to implement a graphics driver in userspace, so the Linux kernel does in fact have a stable ABI for writing graphics drivers. It's just that no one really wants to use it, except for some rare embedded GPUs, because it means you implement your graphics driver as a userspace program.

                    mdedetrich, basically you just asked for something the Linux kernel already has, but it sucks.
