Patch Proposed For Adding x86_64 Feature Levels To The Kernel - But It's Likely D.O.A.


  • #21


    It would seem Ice Lake, Tiger Lake and Rocket Lake all support the same AVX-512 instructions (with TGL adding the half-baked VP2INTERSECT instruction).

    And then we have Zen 4, Sapphire Rapids and Zen 5 with added BF16 support (Zen 5 also adds an "improved" VP2INTERSECT, and Sapphire Rapids adds FP16).

    So if one ignores BF16, FP16 and VP2INTERSECT, all of these could be put under the umbrella of "AVX-512+" feature set. Which is likely better than the overly generic x86_64_v4 instruction subset.

    Any further back (Knights Landing (2016) to Cooper Lake (2020)), and it's a mess of optional AVX-512 instructions, so one could only specify AVX-512F / AVX-512CD as generic support.
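To make the "only AVX-512F / AVX-512CD in common" point concrete, here is a minimal Python sketch that intersects the AVX-512 subsets of a few of the older parts. The `AVX512_SUBSETS` table and the `common_subset` helper are assumptions made up for this illustration; the table is a simplified summary of commonly documented subsets, not an authoritative list.

```python
# Simplified, illustrative AVX-512 subsets per microarchitecture.
# This table is an assumption for the sketch; check vendor documentation
# before relying on it.
AVX512_SUBSETS = {
    "Knights Landing": {"F", "CD", "ER", "PF"},
    "Skylake-SP": {"F", "CD", "BW", "DQ", "VL"},
    "Cooper Lake": {"F", "CD", "BW", "DQ", "VL", "VNNI", "BF16"},
}


def common_subset(cpus):
    """Return the AVX-512 subsets supported by every CPU named in `cpus`."""
    sets = [AVX512_SUBSETS[name] for name in cpus]
    return set.intersection(*sets)


# Across all three parts, only the foundation and conflict-detection
# subsets survive the intersection.
print(sorted(common_subset(AVX512_SUBSETS)))
```

Dropping Knights Landing from the list leaves F, CD, BW, DQ and VL in common, which is the same set that x86-64-v4 requires.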



    • #22
      Originally posted by ms178

      In your example, the Intel chips would still qualify for x86-64-v3, but the user needs to be told some basics in the description about which CPU architecture this maps to, and obviously should know what their CPU supports. I would suggest adding "-march=native" for enabling all supported instructions of the local machine. That would make use of instructions that haven't made the cut for the feature levels but are supported on the local CPU. Of course, a warning should be included in the description that such a kernel wouldn't be compatible with a wide variety of hardware and should be used for local usage or known-compatible systems only.
      Um, no. What if they are on ARM or RISC-V, or on an AMD machine targeting Arrow Lake (which also does not implement AVX-512)? Or what if they are on an Arrow Lake machine, which supports AVX-10 but not AVX-512? Don’t forget about 3DNow! going away and…I could go on.



      • #23
        Why not have an option for building a "local kernel" that uses -march=native, and leave whatever compatibility mode as the default? For beginners, the default removes the risk of building a kernel that won't run, while the local option picks up any performance benefit that could come from the additional instructions.
        Presumably anyone building a kernel to distribute, who also wants the additional instructions, can do the extra steps to set environment variables.
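For anyone curious what -march=native would actually select on their build machine, GCC can report it via `-Q --help=target`. The Python sketch below wraps that call; `parse_march` and `native_march` are hypothetical helper names, and the parser assumes the report contains a line of the form `-march= <value>`, which may vary between compiler versions.

```python
import subprocess


def parse_march(help_target_output):
    """Extract the value shown for -march= in `gcc -Q --help=target` output.

    Assumes a line of the form "-march= <value>"; returns None if absent.
    """
    for line in help_target_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "-march=":
            return parts[1]
    return None


def native_march(compiler="gcc"):
    """Ask the compiler what -march=native resolves to on this machine."""
    result = subprocess.run(
        [compiler, "-march=native", "-Q", "--help=target"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_march(result.stdout)
```

Calling `native_march()` on a machine with GCC installed should return the microarchitecture name the compiler picked, which is a quick way to see how far a "local kernel" would drift from the generic default.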



        • #24
          Originally posted by betam4x

          Um, no. What if they are on ARM or RISC-V, or on an AMD machine targeting Arrow Lake (which also does not implement AVX-512)? Or what if they are on an Arrow Lake machine, which supports AVX-10 but not AVX-512? Don’t forget about 3DNow! going away and…I could go on.
          x86-64-v3 doesn't include AVX-512, and has been fully supported on chips starting in 2013. Not *every* chip in 2013, but it's been the better part of a decade since x86-64-v3 has been supported across the vast majority of AMD's and Intel's processor lineups.


          On a separate note:
          the benefit of x86-64-v2 and v3, and to a lesser extent right now, v4, is that they are a fairly conservative superset of x86-64, with very wide support.

          Despite being fairly conservative, when it comes to the extent of support, they do offer a pretty incredible number of additional features.

          x86-64-v2 (2012) adds: SSE3, SSSE3, SSE4.1, SSE4.2, LAHF/SAHF 64-bit
          x86-64-v3 (2013) adds: AVX, AVX2, Fused multiply-add, Bit manipulation instruction sets 1 and 2, Move with Byte Swap, ABM, and INVPCID

          and of course, v4 adds all of the much newer AVX-512 goodies.
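To check which level a given machine reaches, here is a minimal Python sketch, assuming a Linux system that exposes CPU flags in /proc/cpuinfo. The flag groupings follow the psABI level definitions as commonly summarized (they also include items such as CMPXCHG16B, POPCNT and F16C that the short list above leaves out), so treat them as an approximation and check the psABI document for the authoritative list. `LEVEL_FLAGS`, `x86_64_level` and `read_cpu_flags` are names invented for this illustration.

```python
# Flag names as they appear on the "flags" line of /proc/cpuinfo.
# "pni" is SSE3, "abm" covers LZCNT, and "xsave" stands in for OSXSAVE,
# which /proc/cpuinfo does not list separately.
LEVEL_FLAGS = [
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags):
    """Return the highest x86-64 level whose flags (and all lower levels') are present."""
    level = 1  # baseline x86-64 is assumed
    for candidate, required in LEVEL_FLAGS:
        if not required <= flags:
            break
        level = candidate
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag set of the first CPU listed in a cpuinfo-style file."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

On a Linux box, `x86_64_level(read_cpu_flags())` gives the level number. Levels are cumulative, so a CPU with every v3 flag but missing one v2 flag still reports level 1.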

          The benchmarks out there for these alternative feature sets are extremely sparse, and restricted to very narrow tests. The one exception is the articles here on Phoronix. I spent some time looking over the many benchmark articles posted on the site recently, and all of the instances of LTO/x86-64-v2/v3 benchmarks have blown the doors off of the competition on nearly every single test thrown at them.

          These extremely dramatic results compare the LTO/x86-64-v3 Clear Linux and partial-LTO/x86-64-v2 CentOS Stream kernels against a collection of generic kernels in two linked Phoronix articles.


          However, since Clear Linux has its own set of performance patches, it's hard to sort out exactly how much can be chalked up to the kernels compiled with this additional tuning.

          I would love to see how CachyOS's variety of kernels stacks up to the competition, as it offers a mostly unpatched kernel with LTO and the x86-64-v3/v4 architecture levels as linux-cachyos-sched-ext-lto (the sched-ext schedulers have to be turned on explicitly, so it's fairly close to a standard kernel). It also offers a variety of other kernels, in both LTO and non-LTO versions, including a realtime kernel flavor (linux-cachyos-rt-bore and linux-cachyos-bore-lto), a hardened kernel (linux-cachyos-hardened/linux-cachyos-hardened-lto), an EEVDF scheduler kernel (linux-cachyos-eevdf/linux-cachyos-eevdf-lto), a server-targeted kernel (linux-cachyos-server/linux-cachyos-server-lto), the standard cachy+bore+sched_ext kernel (linux-cachyos/linux-cachyos-lto), and cachy+bore and cachy+sched_ext kernels.

          The distro's Kernel Manager application also offers a nice and simple interface to customize each.

          From my reading, CachyOS hasn't seen any benchmarking here since 2021. But it would be a great platform to use to actually compare the performance topics at hand here. (I actually finally got a Phoronix subscription recently because of questions around exactly this, so I'm glad this article and thread came along to give me a platform to go off xD)



          • #25
            I've been building my own custom kernels with architecture-specific kernel optimisations for ages now using the various patches floating around to do so. I agree it's not practical for upstream inclusion. You can't have a situation where the kernel can't boot on generic hardware (unless that's exactly what the user wants and they've gone out to do so like I have). This will never be suitable for distro inclusion.



            • #26
              I admit that the help text should be more complex.

              Gen1 x86-64
              Bootable on all 64-bit x86 processors

              Gen2 x86-64
              Bootable on all 64-bit x86 processors with SSE4.1 and SSE4.2

              Gen3 x86-64
              Bootable on all 64-bit x86 processors with SSE4.1, SSE4.2 and AVX

              And then a warning that if you choose anything other than Gen1, your system may not be bootable in a virtual environment, or you can't migrate to other CPUs in the case of fatal hardware failure.

              The people maintaining systems are not necessarily idiots; they probably know what they are doing if they compile a new kernel to squeeze out a few CPU cycles from more CPU registers.

              And to the generic guys: no distribution will ever compile a mainline kernel that doesn't work on your Pentium 4. It will work; they won't check that box even if it's upstream. Or they would do the Windows thing, buhaa.
              Last edited by erniv2; 18 September 2024, 01:02 AM.



              • #27
                Originally posted by erniv2
                The people maintaining systems are not necessarily idiots; they probably know what they are doing if they compile a new kernel to squeeze out a few CPU cycles from more CPU registers.

                And to the generic guys: no distribution will ever compile a mainline kernel that doesn't work on your Pentium 4. It will work; they won't check that box even if it's upstream.
                It should probably also auto-detect the CPU architecture of the host build system and throw an informational warning (not an error) if it differs from the configured target, because people often build custom kernels on the same machine that will be running the kernel in question.
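The informational check described above could look roughly like the following Python sketch. It is not how the kernel build system works; `missing_for_target` and `warn_if_host_differs` are hypothetical helpers, and the caller is assumed to supply the host's CPU flags and the flags the chosen target needs.

```python
import warnings


def missing_for_target(host_flags, required_flags):
    """Return the required flags that the build host lacks, sorted."""
    return sorted(set(required_flags) - set(host_flags))


def warn_if_host_differs(host_flags, required_flags, target_name):
    """Emit a warning (never an error) when the host cannot run the target."""
    missing = missing_for_target(host_flags, required_flags)
    if missing:
        warnings.warn(
            f"build host lacks {', '.join(missing)} required by {target_name}; "
            "the resulting kernel may not boot on this machine",
            stacklevel=2,
        )
    return missing
```

Because it only warns, cross-building for a newer machine still works, while someone building for the machine they are sitting at gets a heads-up before rebooting into a kernel that cannot start.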

                If distros aren't going to bother with this then why upstream it though? There are already maintained patch sets that I've been using for years. They're probably not going to go away any time soon and just need minor adjustments sometimes.



                • #28
                  Rather than wondering what the perf is like - does Clear Linux use any x86 feature level optimisations? If there's perf to be had, no doubt it's been had already.



                  • #29
                    Originally posted by boxie
                    Rather than wondering what the perf is like - does Clear Linux use any x86 feature level optimisations? If there's perf to be had, no doubt it's been had already.
                    They have a number of patches they apply, but it's still fundamentally a generic kernel because, despite being focused on performance, like most distributions they still have to support booting on older hardware (see the clearlinux-pkgs/linux repository on GitHub).



                    • #30
                      Originally posted by the-burrito-triangle
                      At any rate, I think this feature shouldn't be as generic as x86_64_v2, v3, v4, etc., and instead, specifically state the instruction sets being optimized for in the compiler or at least state a subset of CPU generations that have a _larger_ overlap of modern instructions like TGL/RKL and ZEN4/5. Otherwise it becomes too generic to make it worthwhile. Though, we don't want to be _too_ specific either (e.g., say ZEN4/5 only). There is a delicate middle ground that needs to be found that adds as much optimization for the largest subset of CPUs without trying to support everything under the sun.
                      Um, that's exactly what the microarchitectural levels are. What exactly goes into each is defined in the psABI at https://gitlab.com/x86-psABIs/x86-64-ABI
