Idea Raised For Reducing The Size Of The AMDGPU Driver With Its Massive Header Files


  • #31
    They should move the documentation out of the headers and into separate documentation files. Reference it from the headers, but leave the documentation itself out.

    Also, split amdgpu-gcn and amdgpu-rdna into their own driver branches.

    We don't need yet another bloated repo.



    • #32
      Originally posted by NeoMorpheus View Post
      Are we absolutely sure that the AMD code is the only guilty party here?

      Maybe Plymouth needs to be adjusted to wait a bit longer?
      I wondered about that when reading the initial article, NeoMorpheus. It's only a few seconds, and surely changing the Plymouth timeout is the simplest and most direct way of fixing the immediate "problem."

      In any case, I have no objection to restructuring and optimizing the code size, but as many others in this thread have so eloquently expressed, it's unlikely to be an easy task, and given the effort required, care should be taken so that the changes address the issues of today, and of the future, to as great a degree as possible. That will require time and input from many architects, engineers and users.



      • #33
        Originally posted by rhbvkleef View Post
        I have to say, I'm not a big fan of moving the AMDGPU headers into a separate repository. Pruning the headers and eliminating the unused bits seems to me a much saner solution to this problem.

        Of course an even better solution would be to not store generated code in the kernel tree at all, and to store the program that generates that source instead, though that might prove difficult in the case of AMDGPU. I am not sure if that program/those programs are publicly available.
        From my reading of the article, even though the headers aren't used by the driver, some of them are used by other projects, like Mesa. It sounds like if you delete them totally, you break Mesa and maybe some other stuff?



        • #34
          Why not fix Plymouth instead of screwing with critical infrastructure? I personally have never seen Plymouth work on any machine that had it; it breaks because of -everything-, especially if you use Nvidia.

          (bites tongue to avoid epic rant about how nonsensical the decision making process is in the linux community now)



          • #35
            Originally posted by Blisterexe View Post

            Plymouth's fancy splash screens contribute considerably to linux feeling polished or good to the average person, as silly as that might seem
            I’m quite far from an average person, and I love Plymouth on my machines very much, so…



            • #36
              There are at least two separate classes of issues they need to address to fix the driver being big and slow to load:
              1. Janitorial work, so that it's easier to work on the driver (this is an example)
              2. Performance fixes, so that the driver can load faster

              You need to do both types of cleanup. That's just how maintenance programming works.



              • #37
                Like many others here, I wondered what the code size has to do with the boot problem, and it seems that it's not directly related. From the Plymouth issue discussion:

                I believe the most time is actually spent on dynamically linking the ko after it has been decompressed / loaded into mem. That approx. 16 MB of code is likely calling kernel functions like dev_dbg() / dev_info(), etc. in a lot of places, and all those call sites need to be dynamically patched with the function addresses of those symbols. At least that is what I think is causing the biggest slowdown.
                Last edited by ET3D; 17 September 2024, 10:05 AM.

                Comment


                • #38
                  Originally posted by Vorpal View Post
                  Sure, you can use GCC/Clang attributes to change compilation flags (including optimisation level) per function. This would be quite a manual process and a massive amount of work. What would be more viable is using PGO (Profile Guided Optimisation) to record representative workloads, and use that to determine what parts are hot or not. I don't know off the top of my head exactly how that ties into optimisations levels (if at all) or if it is only about things like improving inlining/outlining. But that seems like a feasible thing to add in compilers if it isn't already supported.

                  The next question would then be how to distribute pre-made PGO profiles for a given source release, and that may be where this idea breaks down. Perhaps you could pull data back from PGO and use that to automatically annotate functions as hot or cold in the source though?
                  Yeah, I had in mind a process (and supporting tools) that's mostly automated.

                  Originally posted by Vorpal View Post
                  Still, I doubt this would make such a big difference that it would be relevant here.
                  Performance-wise, it wouldn't be a game-changer. IIRC, we've seen code-layout optimizations worth <= ~10%. However, I'll bet you could reduce the compiled size by a lot more than that.
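
                  For readers unfamiliar with the per-function attributes Vorpal mentions, here is a minimal, hypothetical C sketch (nothing to do with the actual driver code): `hot` and `cold` are GCC/Clang function attributes that influence inlining and code placement, while the per-function `optimize(...)` attribute is GCC-specific. The function names are made up for illustration.

                  ```c
                  #include <stdio.h>

                  /* Rarely-executed path: marked cold so GCC places it in a
                   * .text.unlikely section, and compiled for size (GCC-only
                   * optimize attribute). */
                  __attribute__((cold, optimize("Os")))
                  static void report_error(const char *msg)
                  {
                      fprintf(stderr, "error: %s\n", msg);
                  }

                  /* Frequently-executed path: marked hot and compiled more
                   * aggressively. PGO can derive these annotations
                   * automatically from recorded profiles. */
                  __attribute__((hot, optimize("O3")))
                  static long sum_upto(long n)
                  {
                      long s = 0;
                      for (long i = 1; i <= n; i++)
                          s += i;
                      return s;
                  }

                  int main(void)
                  {
                      long s = sum_upto(1000);
                      if (s != 500500)
                          report_error("unexpected sum");
                      printf("%ld\n", s);
                      return 0;
                  }
                  ```

                  Annotating thousands of driver functions this way by hand is exactly the "massive amount of work" described above, which is why a PGO-driven, automated flow is the more plausible route.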



                  • #39
                    Originally posted by ahrs View Post

                    The average Linux boot on a system with good hardware (reasonable CPU/GPU and fast PCIe Gen4/5 NVMe drive) probably spends more time in Plymouth than actually booting. Splash screens were great when people had slow hard drives and needed something pretty to look at while the rest of the system caught up. Not so much anymore; a quiet black screen that boots instantly to GDM or SDDM is enough.
                    A lot of people have older PCs, so I think the extra quarter of a second added to boot on fast PCs is worth it.



                    • #40
                      People having issues can force load the amdgpu driver early during the boot process. Both dracut and mkinitcpio support this. Plymouth could also be configured with a bigger timeout value.
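
                      As a sketch of what that workaround looks like (file names here are arbitrary, and the exact paths and the Plymouth `DeviceTimeout` knob may vary by distro and Plymouth version):

                      ```shell
                      # dracut-based distros (e.g. Fedora): force amdgpu into the
                      # initramfs so it loads early, then rebuild.
                      cat > /etc/dracut.conf.d/amdgpu.conf <<'EOF'
                      force_drivers+=" amdgpu "
                      EOF
                      dracut --force

                      # mkinitcpio-based distros (e.g. Arch): add amdgpu to the
                      # MODULES array in /etc/mkinitcpio.conf, then rebuild:
                      #   MODULES=(amdgpu)
                      mkinitcpio -P

                      # Alternatively, raise Plymouth's device timeout (seconds):
                      cat > /etc/plymouth/plymouthd.conf <<'EOF'
                      [Daemon]
                      DeviceTimeout=8
                      EOF
                      ```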

                      IMO, even thinking about splitting the amdgpu kernel driver is insane as a reaction to this purely cosmetic problem. Further, the issue seems to occur on Fedora, but I haven't seen a report from openSUSE or Ubuntu yet. They all use Plymouth, though -> likely a distro/packaging-specific issue.

                      And yes, the kernel source tree is approx. 1.6 GB extracted now, with around 480 MB being amdgpu register documentation headers (for GCN1-5, RDNA1, 2, 3, 3.5, 4, CDNA1-3 and all APU derivatives of them!) that collapse in size in the compiled driver. So what?! It still easily fits on a computer from the last 20 or so years and the cheapest of SD cards and USB sticks…

                      The compiled amdgpu module is 20 MB uncompressed and 4.5 MB compressed. The proprietary Nvidia driver .ko is 50 MB+ compressed(!) - why don't people report Plymouth issues en masse there? Because the issue isn't code size or dynamic module linking!

                      Think about what splitting the driver would mean: libdrm, Mesa, as well as amdgpu-pro would all need to support those extra interfaces. Boom - code duplication, separate bug trackers and development resources split across several downstream projects.

                      As for pushing the headers into a separate repo: this would make it difficult to build the kernel on systems with irregular internet access, and it generally adds an annoying extra step for packagers and anyone wanting to build the kernel themselves.

                      Just leave things as they are and let Fedora sort this out using far less invasive means. The debate about this issue is utterly ridiculous, and the problem has been inflated way out of proportion.

