Punting GPU Drivers From The Initramfs Due To Ever Increasing Firmware Bloat

  • #51
    Originally posted by Old Grouch View Post
    Not only does the ESP not need to be the first partition, is also possible to have more than one partition marked as ESP on a block device. It's a system partition, not a boot partition.
The key word is "typically". Not all UEFI firmware implementations support the ESP not being the first partition (not all UEFI implementations are to specification), and not all UEFI implementations support a drive having more than one partition marked as ESP.

    Also the UEFI specification

    Section 11.2.1.3:
    For removable media devices there must be only one EFI system partition, and that partition must contain an EFI defined directory in the root directory
    This line is a problem child.

UEFI has a stack of implementations, some pickier than others, and the way that line of the specification reads is horrible. Say your motherboard's SATA ports support hot-swap: are all your SATA-connected drives now classed as removable? Say your M.2 PCIe slot also supports hot-swap: is it now removable too? The answer can be yes or no to both, and it can change whenever you update your motherboard firmware.

Yes, by the UEFI specification you are allowed multiple ESP partitions on a non-removable drive. The key words are "non-removable". After a motherboard firmware update, a drive that was believed to be non-removable might now be treated as removable and follow the removable rules, so your system breaks if you have more than one ESP partition on the drive. The safe play is one ESP partition per drive.

The UEFI specification really does not define clearly what removable and non-removable are, so it is up to your UEFI firmware maker and motherboard vendor configuration what is and is not removable. This again is why the safe play is one ESP partition per drive: no guessing what choices have been made.

Yes, everything up to this point is inside the UEFI specification, no matter how hair-pulling it is.

If an ESP needs to be the first partition, your UEFI implementation is non-conforming to the UEFI specification, but such implementations do, horribly, exist.

    Comment


    • #52
      Originally posted by oiaohm View Post

This all in fact makes sense when you look at early UEFI. On first-generation UEFI motherboards there was roughly a 1 in 5 chance that when the OS took over the storage drive holding the EFI system partition, the UEFI-provided services would stop, including the input and display output provided by UEFI. On new motherboards the chance of this kind of screwed-up behavior is less than 1 in 100,000, and if a new motherboard does show the early disappearance of UEFI-provided display and input, it most likely has some other major problem you don't want to be dealing with, like 1 in 5 boots failing because it magically fails to find the RAM (so: defective junk).

For the older UEFI motherboards (the first three generations, roughly the first three years of UEFI's existence) it makes sense to take over everything as soon as possible, because the UEFI services providing display, input and so on would go away before you had even mounted the root partition, with far too high a percentage of users affected. On newer UEFI you can depend on those services for longer, but not infinitely: those who try to keep a system running on just simpledrm in the Linux kernel, using the UEFI-provided framebuffer, still find that on many modern motherboards this works for about 15 to 20 minutes before going splat. The old BIOS VESA graphics interfaces were far more stable.

Weasel, this is basically a rock and a hard place. If you want to support every possible UEFI motherboard in existence, you have to do it the way the Linux kernel has been doing it, at the cost of an ever-expanding image to load. The trouble-making motherboards have dropped under 0.01% of the motherboards in use. Some users will be affected by the swap to simpledrm/sysfb-simplefb for early boot, because Linux users have a habit of running 10+ year old hardware, but the benefit to the majority most likely should win out: reduced boot time for most, and custom initramfs (and sometimes custom kernels) for those still using the old systems.
Yeah, I totally get it can be broken on old af motherboards. What I mean is to fall back to it in case there are no drivers available. Better than nothing to try, at least, if UEFI can provide it.

      Comment


      • #53
        Originally posted by chithanh View Post
        This is not correct. UEFI boot protocol tells the kernel this information, and besides that you can look it up after boot with efibootmgr which tells you the currently active boot entry and its EFI system partition.
        Oh really? Show me a kernel with zero device drivers for your storage device accessing this "information" and reading a file from said disk. UEFI did read it after all, so why shouldn't Linux be able to, huh?

When I said the kernel doesn't know, I meant without looking it up via its device drivers. Of course it's passed the UUID, for instance. But the thing is, it needs device drivers and filesystem drivers and a lot of other stuff to access it. Basically, what we're trying to avoid: things that get put in an initramfs.

UEFI can access your USB keyboard/mouse totally fine in the BIOS or POST menu, and systemd-boot can (using UEFI). So why the fuck can't Linux? Linux cannot without having USB HID drivers, either built into the kernel or loaded from the initramfs. This is the issue (same with graphics). And yes, I have experience with this. I fucking tried it several times. And I build my own custom initramfs with custom scripts and experiment a lot.
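To illustrate the "passed the UUID" point: a minimal sketch, assuming systemd-boot, which records the ESP it booted from in the `LoaderDevicePartUUID` EFI variable. Reading it from efivarfs needs no storage driver for the ESP itself, but it only tells you *which* partition; actually mounting it still needs block and filesystem drivers.

```python
# Sketch (assumes systemd-boot and a mounted efivarfs).
# 4a67b082-... is the systemd boot loader interface vendor GUID.
LOADER_VAR = ("/sys/firmware/efi/efivars/"
              "LoaderDevicePartUUID-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f")

def decode_efivar_string(raw: bytes) -> str:
    """Strip the 4-byte EFI variable attribute header, decode the UTF-16LE payload."""
    payload = raw[4:]                       # first 4 bytes are variable attributes
    return payload.decode("utf-16-le").rstrip("\x00")

if __name__ == "__main__":
    try:
        with open(LOADER_VAR, "rb") as f:
            print("booted ESP partition UUID:", decode_efivar_string(f.read()))
    except OSError:
        print("no loader variable (different boot loader, or no UEFI/efivarfs)")
```

Knowing the UUID is the easy half; turning it into a readable filesystem is exactly the part that drags in the drivers.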

        Comment


        • #54
          Originally posted by Weasel View Post
Yeah, I totally get it can be broken on old af motherboards. What I mean is to fall back to it in case there are no drivers available. Better than nothing to try, at least, if UEFI can provide it.
You kind of do need to draw a line in the sand. On some of those old motherboards, hooking up to the UEFI graphical end locks you out of cleanly resetting the GPU to start the GPU driver as well.

Remember, people running Windows in the first three years that UEFI motherboards were around were forcing various machines back into legacy BIOS mode, because not even Microsoft Windows could start and be stable using the UEFI bits. These early UEFI boards are just quirky nightmares. The worst case is partition damage from attempting to use some of the UEFI-provided services on these early generations of UEFI.

So your claim of "better than nothing" is not in fact true on some motherboards from the first three years of UEFI. For a lot of those boards, you want nothing, so you have no data corruption.

What the Linux kernel has been doing is the right thing for the early UEFI motherboards, and that includes not supporting the UEFI-provided input devices in order to avoid trouble. Changing this behavior would need a blacklist, but the problem is that some of these motherboards are insanely hard to identify. Fun fact: even today some people make motherboards that leave the motherboard IDs blank. If we are willing to throw a few new motherboards with missing IDs out along with the bad old ones, this can work; that is the line-in-the-sand bit.

Yes, Microsoft's minimum Intel and AMD CPU requirements on newer versions of Windows also stop Windows booting on the early bad UEFI motherboards, so that could be another way to draw the line.

The big problem here is that on the early bad UEFI implementations, interacting with the UEFI-provided services can cause horrible things. That is what makes trying the UEFI services without some form of filtering to remove those bad implementations not exactly the wisest move.

Weasel, I understood exactly what you meant about falling back to UEFI if you have no other drivers; it just does not work on early UEFI, where those services are a Pandora's box of trouble. Trying the UEFI services after the first three years of UEFI is a fairly safe thing. Boards with firmware designed between 2006 and 2010, sold as new up until 2014, are the problem space. So it is at just over 10-year-old parts that the trouble with UEFI appears. Of course there are people still running pre-UEFI stuff on Linux, and that is what makes this tricky.

Please note: tricky, not impossible, to deal with. Tricky means this is not straightforward and needs blacklisting to prevent using the UEFI-provided bits on the trouble-making motherboards.
          Last edited by oiaohm; 02 May 2024, 12:22 PM.

          Comment


          • #55
            Rather than linking to an Intel EFI specification (1.10) from December 2002, it might be an idea to link to the current UEFI specification (currently UEFI 2.9 from March 2021) :
            https://uefi.org/specifications.

            I agree the specification is badly worded. It would be helpful to have an explainer somewhere detailing why removable media specifically is restricted to a single ESP.

            Edit to add: I forgot to provide this interesting link:

            A lot of the time, we talk about creating a partition to serve as the EFI System Partition. This partition is mandated by the UEFI specification for several tasks. Adam covered what’s going on at a relatively high level on his recent blog post, and you should read the whole thing: (from Adam Williamson https://www.happyassassin.net/2014/01/25/uefi-boot-how-does-that-actually-work-then/ ) An ‘EFI system partition’ is really just any partition formatted with one of the UEFI spec-defined variants of FAT and given a specific GPT partition type to help the firmware find it. And the purpose of this is just as described above: allow everyone to rely on the fact that the firmware layer will definitely be able to read data from a pretty ‘normal’ disk partition.
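The "specific GPT partition type" mentioned in that quote is the ESP type GUID from the UEFI specification. A small sketch of spotting ESP candidates from `lsblk`-style output (the sample device names are made up):

```python
# The ESP partition type GUID as defined by the UEFI specification.
ESP_TYPE_GUID = "c12a7328-f81f-11d2-ba4b-00a0c93ec93b"

def is_esp(parttype: str) -> bool:
    """True if a PARTTYPE string (e.g. from `lsblk -nro NAME,PARTTYPE`) is an ESP."""
    return parttype.strip().lower() == ESP_TYPE_GUID

# Hypothetical sample: one ESP, one Linux filesystem partition.
sample = [("nvme0n1p1", "c12a7328-f81f-11d2-ba4b-00a0c93ec93b"),
          ("nvme0n1p2", "0fc63daf-8483-4772-8e79-3d69d8477de4")]
for name, ptype in sample:
    print(name, "ESP" if is_esp(ptype) else "-")
```

Nothing about the GUID says "first partition" or "only one per disk": the type tag is all the firmware has to go on, which is exactly why the ambiguity discussed above exists.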

            Last edited by Old Grouch; 03 May 2024, 07:35 AM. Reason: Add useful link

            Comment


            • #56
              Originally posted by Weasel View Post
              What if you boot from USB? You need USB drivers. You need USB drivers for HID input in case of emergency breakage, and so on.

The real issue is why doesn't Linux fucking use the UEFI APIs that provide all this and more, at least as a fallback? Why does it insist on using its own drivers for everything?
ssokolow's reply several posts below yours is spot on. The hard truth is that the user experience is ultimately better that way.

              Legacy BIOS and UEFI implementations are buggy on most platforms, without even talking about security vulnerabilities. Relying on the BIOS to always behave correctly for device initialization on all platforms, when BIOS implementations can hardly ever get basic hardware information right, or often contain code/data which does not abide by the specs in ACPI tables such as DSDT, would make all kernels liable to obscure and hard to debug issues which they can't fix, in software which is usually closed-source, except for a handful of platforms.

              Known bugs in some BIOS implementations include, but are definitely not limited to:
              * frequent: a wide range of bugs in ACPI tables, especially the DSDT. Workaround: try to make do, in order to make suspend and resume, or special keyboard buttons, work;
* frequent: DMI data advertising more memory slots than there are physically on the platform, wrong memory speed and/or configured speed, missing information for memory (voltages, etc.), placeholder data for e.g. serial numbers ("To Be Filled By O.E.M"), and other glaring errors. Workaround: just don't use that data for serious purposes; poke the SMBus controller on the I2C bus and parse SPD data yourself, which is much easier said than done, due to whatever additional fiddling Xeons from SNB onwards require, undocumented bus MUXers, or peculiarities of some SMBus controllers (e.g. some variants of PIIX4 require extra delays);
              * infrequent: buggy UEFI SMP implementation on select platforms, which limits the code to UP. Workaround: for things to work reliably even on those buggy BIOS, just use your own ACPI data searching code, parse the ACPI MADT or its ancestors, and use the APIC / x2APIC controller directly;
              * infrequent: bad memory maps, which forget to declare as reserved some ranges which hardware can asynchronously rewrite on its own, leading to spurious errors in memory testers;
              * infrequent: questionable UEFI video mode data, e.g. for rotated screens. Workaround: do what you can...

              Oh, and UEFI boot services can gobble hundreds of MBs of memory for their own purposes, on some platforms. That's why memtest86+ exits them ASAP when they're no longer needed - and then, since it can't rely on the UEFI stack anymore, it needs its own drivers for everything (including USB controllers), just like Linux does.

              What about BIOS fixes ? Easy: the provider receives good bug reports, quickly finds the bug and fixes the offending BIOS implementation, then platform manufacturers dutifully propagate the fixes to their code base (supported over the long term, of course), fail to screw up something in the process, adequately test the new version, push a new release of the BIOS, and users even actually install the new release of the BIOS quickly, too. Well, in an ideal world, maybe, but in our real world, it's not quite so simple ^^
              Last edited by debrouxl; 02 May 2024, 07:29 PM.

              Comment


              • #57
                Originally posted by Old Grouch View Post

                Rather than linking to an Intel EFI specification (1.10) from December 2002, it might be an idea to link to the current UEFI specification (currently UEFI 2.9 from March 2021) :
                https://uefi.org/specifications.

                I agree the specification is badly worded. It would be helpful to have an explainer somewhere detailing why removable media specifically is restricted to a single ESP.
It was just what I could remember off the top of my head about where exactly it was in the old UEFI standard.

                2.10 is the current one.
13.3.1.3 Directory Structure
For removable media devices there must be only one UEFI-compliant system partition, and that partition must contain an UEFI-defined directory in the root directory
13.3.3 Number and Location of System Partitions
Yes, this one says you are allowed many ESPs on a drive.
It is outside of the scope of this specification to attempt to coordinate the specification of size and location of an ESP that can be shared by multiple OS or Diagnostics installations, or to manage potential namespace collisions in directory naming in a single (central) ESP.
So more than one OS sharing a single ESP: run at your own risk.

Also, this "Number and Location of System Partitions" section is not defined in the early 1.10 EFI specification; I cannot remember which version of EFI it was added in. Yes, multi-ESP was added to address OSes attempting to share a single ESP and stomping on each other, but it has not existed in all versions of UEFI.

Yes, the specification is clear for removable drives and not clear for non-removable drives. There was a section added in 2009, then deleted in 2009, from the EFI specification that attempted to clean up this what-the-heck stack for non-removable media. I basically treat multiple ESP partitions on a single drive as playing with undefined behavior: it can work, but it can fail badly, even though the specification in one place says it can work.

Old Grouch, there are still machines being released with firmware from 2006 based on the 1.10 2002 document.

When dealing with motherboard firmware quirks you end up having to compare all the UEFI versions. Yes, the firmware you get can have sections written to the current 2.10 specification and other sections written to 1.10. Basically, welcome to quirks-are-us. Even in the old BIOS days before UEFI, motherboard firmware was quirks-are-us; this has not changed. This is why UEFI is not a magic bullet for a lot of problems.

                Comment


                • #58
                  Originally posted by oiaohm View Post

It was just what I could remember off the top of my head about where exactly it was in the old UEFI standard.

2.10 is the current one.
13.3.1.3 Directory Structure


13.3.3 Number and Location of System Partitions
Yes, this one says you are allowed many ESPs on a drive.

So more than one OS sharing a single ESP: run at your own risk.

Also, this "Number and Location of System Partitions" section is not defined in the early 1.10 EFI specification; I cannot remember which version of EFI it was added in. Yes, multi-ESP was added to address OSes attempting to share a single ESP and stomping on each other, but it has not existed in all versions of UEFI.

Yes, the specification is clear for removable drives and not clear for non-removable drives. There was a section added in 2009, then deleted in 2009, from the EFI specification that attempted to clean up this what-the-heck stack for non-removable media. I basically treat multiple ESP partitions on a single drive as playing with undefined behavior: it can work, but it can fail badly, even though the specification in one place says it can work.

Old Grouch, there are still machines being released with firmware from 2006 based on the 1.10 2002 document.

When dealing with motherboard firmware quirks you end up having to compare all the UEFI versions. Yes, the firmware you get can have sections written to the current 2.10 specification and other sections written to 1.10. Basically, welcome to quirks-are-us. Even in the old BIOS days before UEFI, motherboard firmware was quirks-are-us; this has not changed. This is why UEFI is not a magic bullet for a lot of problems.
                  I think we agree that the specifications are not well-written in this regard.

                  As for multiple ESP partitions, the behaviour is only (possibly) undefined on default boot. Once your UEFI boot variables are defined, the path in the chosen boot is used, and will work if it is a valid path. So it is only really a problem on default boot, which doesn't happen that often. Typically you boot from removable media (where, as you point out, the standard requires but a single ESP) and install your system on a different, usually (but not always) non-removable block device. The install process will define the necessary boot variable, including the path. The device path will include the GPT partition signature UUID (Section 10.3.5.1 Hard Drive), so there is no ambiguity, unless you have more than one partition with the same partition signature on the same block device on your system. Partition discovery (Section 13.3.2 Partition Discovery) should be run on all of the attached logical block devices.

                  The end result is that, so long as your UEFI boot variables are not corrupted, you can have as many ESPs as you like. However, if you want the (unattended) default boot (when the UEFI boot variables are invalid) to be deterministic, you are restricted. In the default boot case, most UEFI firmware allows you to choose to boot from a chosen removable device, which, by the standard, has an unambiguous device path for the boot efi executable.
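A sketch of why the partition signature GUID disambiguates, using Python's `uuid` module (illustrative only; `guids_from_gpt_entry` is a made-up helper, not how firmware parses it). Each 128-byte GPT partition entry carries a type GUID at offset 0 and a unique per-partition GUID at offset 16; the boot variable's device path embeds the latter, so the firmware can match one exact partition regardless of slot or enumeration order.

```python
import uuid

def guids_from_gpt_entry(entry: bytes) -> tuple[str, str]:
    """Extract (type GUID, unique partition GUID) from a 128-byte GPT entry.

    On-disk GUIDs store their first three fields little-endian ("mixed
    endian"), which is what uuid.UUID(bytes_le=...) decodes.
    """
    type_guid = uuid.UUID(bytes_le=entry[0:16])
    uniq_guid = uuid.UUID(bytes_le=entry[16:32])
    return str(type_guid), str(uniq_guid)
```

Two partitions can share the ESP type GUID, but the unique GUID is written once at partition creation, which is the "no ambiguity" property the boot variable relies on.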

                  I take your point about old versions of firmware being lacking in some functionality. How and whether to support old, non-standards compliant, systems is always an interesting debate.

                  If there is more than one removable device attached to the system, both with a single valid ESP on it, how does the default boot process deterministically choose which one to use?

                  Comment


                  • #59
                    Originally posted by Old Grouch View Post
                    If there is more than one removable device attached to the system, both with a single valid ESP on it, how does the default boot process deterministically choose which one to use?
Normally whatever device the firmware decided to process first wins, but this is not written in the specification for removable media; part of it is in:

                    Section 13.3.2 Partition Discovery
                    The following is the order in which a block device must be scanned to determine if it contains partitions. When a check for a valid partitioning scheme succeeds, the search terminates.
So no, by the UEFI standard it does not run on all logical devices; that only happens when no drive has partitions. At the first logical device to turn up with a valid partition table, the UEFI firmware can technically bail out. Yes, the fun of having your data drive on SATA 0 and your OS drive on SATA 1 and being unable to work out why the system will not boot. A lot of motherboards disregard this bit of UEFI if an ESP has not been found yet, but not all.

Something fun: when you get a firmware update for your motherboard, it might implement an older version of the UEFI standard than the one you had.

Old Grouch, the UEFI standard is really annoying because it is so simple to read it, miss a line, and think it promises something it does not.

                    Comment


                    • #60
                      Originally posted by oiaohm View Post

Normally whatever device the firmware decided to process first wins, but this is not written in the specification for removable media; part of it is in:

Section 13.3.2 Partition Discovery

The following is the order in which a block device must be scanned to determine if it contains partitions. When a check for a valid partitioning scheme succeeds, the search terminates.

So no, by the UEFI standard it does not run on all logical devices; that only happens when no drive has partitions. At the first logical device to turn up with a valid partition table, the UEFI firmware can technically bail out. Yes, the fun of having your data drive on SATA 0 and your OS drive on SATA 1 and being unable to work out why the system will not boot. A lot of motherboards disregard this bit of UEFI if an ESP has not been found yet, but not all.

Something fun: when you get a firmware update for your motherboard, it might implement an older version of the UEFI standard than the one you had.

Old Grouch, the UEFI standard is really annoying because it is so simple to read it, miss a line, and think it promises something it does not.
Ironically, I think you are misreading that section. My reading is that "When a check for a valid partitioning scheme succeeds, the search terminates" [on that device, and the search continues on the next (enumerated) device]. (See Section 3.1.2 Load Option Processing to see that the boot manager has this capability.)

...When the boot manager attempts to boot a short-form File Path Media Device Path, it will enumerate all removable media devices, followed by all fixed media devices, creating boot options for each device. The boot option FilePathList[0] is constructed by appending short-form File Path Media Device Path to the device path of a media. The order within each group is undefined. These new boot options must not be saved to non volatile storage, and may not be added to BootOrder. The boot manager will then attempt to boot from each boot option.
The point being that the (default) boot manager, which operates when the boot variables are missing, corrupt, or ignored/bypassed, first generates all possible boot options it can find, before then attempting each in turn. It doesn't simply stop at the first one.
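The enumeration described in that spec quote can be sketched like so (hypothetically, in Python rather than firmware code; `fallback_boot_order` and `default_boot` are made-up names for illustration):

```python
# Short-form fallback path the spec's default boot uses on each device.
FALLBACK_PATH = r"\EFI\BOOT\BOOTX64.EFI"

def fallback_boot_order(devices):
    """devices: list of (name, is_removable) tuples.

    Per the spec quote: removable media first, then fixed media, with a
    boot option built for every device (order within each group undefined).
    """
    removable = [d for d in devices if d[1]]
    fixed = [d for d in devices if not d[1]]
    return [(name, FALLBACK_PATH) for name, _ in removable + fixed]

def default_boot(devices, try_boot):
    """Attempt every generated option in turn; do not stop at the first failure."""
    for name, path in fallback_boot_order(devices):
        if try_boot(name, path):
            return name
    return None
```

This is the reading under which a system with an unbootable data drive on SATA 0 and the OS on SATA 1 still boots: the failed first option doesn't end the search.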

                      If you are not running the default boot manager and using the boot variables, the path to the bootx64.efi (or equivalent) stored in the variable has a lot of freedom.

                      But again: the standard could do with some pseudocode to explain what the boot process should be doing. The current approach is subject to (mis)interpretation, which is a problem with the standard, not a problem that people 'aren't reading it right'.

                      And it doesn't help that so many implementations are not standards compliant, as you point out.

                      Comment
