Patch Proposed For Adding x86_64 Feature Levels To The Kernel - But It's Likely D.O.A.


  • boxie
    replied
    Rather than wondering what the perf is like: does Clear Linux use any x86 feature level optimisations? If there's perf to be had, no doubt it's been had already.



  • ahrs
    replied
    Originally posted by erniv2
    The people maintaining systems are not necessarily idiots; they probably know what they are doing if they compile a new kernel to squeeze out a few CPU cycles from more CPU registers.

    And to the generic guys: no distribution will ever compile a mainline kernel that doesn't work on your Pentium 4. It will work; they won't check that box even if it's upstream.

    It should probably also auto-detect the CPU architecture of the host build-system and throw an informational warning (not an error) if this is different because people often build custom kernels on the same machine that will be running the kernel in question.
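
The informational warning suggested above could look roughly like the following. This is only a sketch of the idea, not anything from the proposed patch: the function names `missing_flags` and `warn_if_host_differs` are hypothetical, and the flag sets are assumed to be plain collections of CPU flag names supplied by the caller.

```python
import warnings


def missing_flags(target_flags, host_flags):
    """Return the target's required CPU flags that the build host lacks, sorted."""
    return sorted(set(target_flags) - set(host_flags))


def warn_if_host_differs(target_flags, host_flags):
    """Emit a warning (never an error) if the host cannot run the target build.

    Returns True when the host supports every required flag, False otherwise.
    """
    missing = missing_flags(target_flags, host_flags)
    if missing:
        warnings.warn(
            "build host lacks: " + ", ".join(missing)
            + "; the resulting kernel may not boot on this machine",
            stacklevel=2,
        )
    return not missing


if __name__ == "__main__":
    # Hypothetical example: target needs AVX2, host only reports AVX.
    warn_if_host_differs({"avx", "avx2"}, {"avx"})
```

Because it only warns, a cross-build for a different machine still goes through; the check just flags the common case where the build host is also the machine that will boot the kernel.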

    If distros aren't going to bother with this then why upstream it though? There are already maintained patch sets that I've been using for years. They're probably not going to go away any time soon and just need minor adjustments sometimes.



  • erniv2
    replied
    I admit that the help text should be more complex.

    Gen1 x86-64
    Bootable on all 64-bit x86 processors

    Gen2 x86-64
    Bootable on all 64-bit x86 processors with SSE4.1 and SSE4.2

    Gen3 x86-64
    Bootable on all 64-bit x86 processors with SSE4.1, SSE4.2 and AVX

    And then a warning: if you choose anything other than Gen1, your system may not be bootable in a virtual environment, or you may not be able to migrate to other CPUs in the case of a fatal hardware failure.

    The people maintaining systems are not necessarily idiots; they probably know what they are doing if they compile a new kernel to squeeze out a few CPU cycles from more CPU registers.

    And to the generic guys: no distribution will ever compile a mainline kernel that doesn't work on your Pentium 4. It will work; they won't check that box even if it's upstream. Or they would do the Windows thing, buhaa.
    Last edited by erniv2; 18 September 2024, 01:02 AM.



  • ahrs
    replied
    I've been building my own custom kernels with architecture-specific kernel optimisations for ages now, using the various patches floating around to do so. I agree it's not practical for upstream inclusion. You can't have a situation where the kernel can't boot on generic hardware (unless that's exactly what the user wants and they've gone out of their way to do so, like I have). This will never be suitable for distro inclusion.



  • xenospace
    replied
    Originally posted by betam4x
    Um no. What if they are on ARM or RISC-V, or are on an AMD machine targeting Arrow Lake (which also does not implement AVX-512), or what if they are on an Arrow Lake machine, which supports AVX-10 but not AVX-512? Don't forget about 3DNow! going away and… I could go on.

    x86-64-v3 doesn't include AVX-512, and has been fully supported on chips starting in 2013. Not *every* chip in 2013, but it's been the better part of a decade since x86-64-v3 has been supported across the vast majority of AMD's and Intel's processor lineups.


    On a separate note:
    The benefit of x86-64-v2 and v3 (and, to a lesser extent right now, v4) is that they are fairly conservative supersets of baseline x86-64, with very wide support.

    Despite being fairly conservative when it comes to the extent of support, they do offer a pretty incredible number of additional features.

    x86-64-v2 (2012) adds: CMPXCHG16B, POPCNT, SSE3, SSSE3, SSE4.1, SSE4.2, LAHF/SAHF 64-bit
    x86-64-v3 (2013) adds: AVX, AVX2, F16C, Fused multiply-add, Bit manipulation instruction sets 1 and 2, Move with Byte Swap, LZCNT (from ABM), and OSXSAVE

    and of course, v4 adds all of the much newer AVX-512 goodies.
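
To check which of these levels a given Linux machine reaches, something along these lines should work. It is a minimal sketch under stated assumptions: the flag names use the `/proc/cpuinfo` spelling (for example `pni` for SSE3 and `abm` for LZCNT), the per-level sets are my reading of the x86-64 psABI levels rather than anything quoted in this thread, and `LEVEL_FLAGS`, `feature_level` and `read_cpu_flags` are made-up names.

```python
# Flags each level adds on top of the previous one, in /proc/cpuinfo spelling.
# Assumption: these sets mirror the x86-64 psABI feature levels.
LEVEL_FLAGS = {
    1: {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse2"},
    2: {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
    3: {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
    4: {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
}


def feature_level(flags):
    """Return the highest level (0-4) for which this and all lower levels are satisfied."""
    level = 0
    for n in sorted(LEVEL_FLAGS):
        if not LEVEL_FLAGS[n] <= flags:
            break  # a level only counts if every lower level is also met
        level = n
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flag set of the first CPU listed, or an empty set if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    print(f"x86-64-v{feature_level(read_cpu_flags())}")
```

A result of `x86-64-v0` just means the flags could not be read or the machine is not x86-64 at all.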

    The benchmarks out there for these alternative feature sets are extremely sparse, and restricted to very narrow tests. The one exception is the articles here on Phoronix. I spent some time looking over the many benchmark articles posted on the site recently, and all of the instances of LTO/x86-64-v2/v3 benchmarks have blown the doors off of the competition on nearly every single test thrown at them.

    These extremely dramatic results, from two Phoronix articles, compare the LTO/x86-64-v3 Clear Linux kernel and the partial-LTO/x86-64-v2 CentOS Stream kernel vs. a collection of generic kernels.


    However, since Clear Linux has its own set of performance patches, it's hard to sort out exactly how much can be chalked up to the kernels being compiled with this additional tuning.

    I would love to see how CachyOS's variety of kernels stacks up against the competition, as it offers a mostly unpatched kernel with LTO and the x86-64-v3/v4 architecture as linux-cachyos-sched-ext-lto (the sched-ext schedulers have to be turned on explicitly, so it's fairly close to a standard kernel). It also, of course, offers a variety of other kernels, in both LTO and non-LTO versions, including a realtime kernel flavor (linux-cachyos-rt-bore and linux-cachyos-bore-lto), a hardened kernel (linux-cachyos-hardened/linux-cachyos-hardened-lto), an EEVDF scheduler kernel (linux-cachyos-eevdf/linux-cachyos-eevdf-lto), a server-targeted kernel (linux-cachyos-server/linux-cachyos-server-lto), the standard cachy+bore+sched_ext kernel (linux-cachyos/linux-cachyos-lto), and a cachy+bore and a cachy+sched_ext kernel.

    The distro's Kernel Manager application also offers a nice and simple interface to customize each.

    From my reading, CachyOS hasn't seen any benchmarking here since 2021. But it would be a great platform to use to actually compare the performance topics at hand here. (I actually finally got a Phoronix subscription recently because of questions around exactly this, so I'm glad this article and thread came along to give me a platform to go off xD )



  • Summanis
    replied
    Why not have an option for building a "local kernel" that uses -march=native, and leave whatever compatibility mode as the default? For beginners, the default removes the risk of building a kernel that won't run, while the local option still picks up any performance benefit that could come from the additional instructions.
    Presumably anyone building a kernel to distribute, who also wants the additional instructions, can do the extra steps to set environment variables.



  • betam4x
    replied
    Originally posted by ms178
    In your example, the Intel chips would still qualify for x86-64-v3, but the user needs to be told some basics in the description about which CPU architecture this maps to, and obviously should know what their CPU supports. I would suggest adding "-march=native" for enabling all supported instructions of the local machine. That would make use of instructions that haven't made the cut for the feature levels but are supported on the local CPU. Of course, a warning should be included in the description that such a kernel wouldn't be compatible with a wide variety of hardware and should be used for local usage or known-compatible systems only.

    Um no. What if they are on ARM or RISC-V, or are on an AMD machine targeting Arrow Lake (which also does not implement AVX-512), or what if they are on an Arrow Lake machine, which supports AVX-10 but not AVX-512? Don't forget about 3DNow! going away and… I could go on.



  • the-burrito-triangle
    replied


    It would seem Ice Lake, Tiger Lake and Rocket Lake all support the same AVX-512 instructions (with TGL adding the half-baked VP2INTERSECT instruction).

    And then we have Zen 4, Sapphire Rapids and Zen 5 with added BF16 support (Zen 5 also adds an "improved" VP2INTERSECT, and Sapphire Rapids adds FP16).

    So if one ignores BF16, FP16 and VP2INTERSECT, all of these could be put under the umbrella of an "AVX-512+" feature set, which is likely better than the overly generic x86_64_v4 instruction subset.

    Any further back (Knights Landing (2016) to Cooper Lake (2020)), and it's a mess of optional instructions for AVX-512, so one could only specify AVX-512F / AVX-512CD as generic support.
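
The "umbrella" idea above amounts to intersecting the flag sets of the CPUs you care about and treating everything else as optional. A rough sketch of that, assuming the flag sets come from real `/proc/cpuinfo` dumps: the function name `split_common_optional` is hypothetical, and the `cpu_a`/`cpu_b` entries are placeholders, not actual feature lists for any processor.

```python
def split_common_optional(cpu_flags):
    """Split flags into those every CPU has (common) and those only some have (optional).

    cpu_flags maps a CPU name to an iterable of its flag names.
    """
    sets = [set(flags) for flags in cpu_flags.values()]
    if not sets:
        return set(), set()
    common = set.intersection(*sets)
    optional = set.union(*sets) - common
    return common, optional


if __name__ == "__main__":
    # Placeholder data only; substitute flags gathered from real machines.
    sample = {
        "cpu_a": {"avx512f", "avx512cd", "avx512_bf16"},
        "cpu_b": {"avx512f", "avx512cd", "avx512_vp2intersect"},
    }
    common, optional = split_common_optional(sample)
    print("common:  ", sorted(common))
    print("optional:", sorted(optional))
```

The common set is what a shared build target could safely assume; anything in the optional set would still need runtime detection.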



  • JEBjames
    replied
    Michael

    typo

    "and has already been criticized upstream Linux kernel developers." missing word "criticized by".



  • ms178
    replied
    Originally posted by betam4x
    Great idea, flawed execution. Example: Current AMD chips support AVX-512, Intel chips do not. It is better to enable the specific features themselves based on the target chip, IMO.

    In your example, the Intel chips would still qualify for x86-64-v3, but the user needs to be told some basics in the description about which CPU architecture this maps to, and obviously should know what their CPU supports. I would suggest adding "-march=native" for enabling all supported instructions of the local machine. That would make use of instructions that haven't made the cut for the feature levels but are supported on the local CPU. Of course, a warning should be included in the description that such a kernel wouldn't be compatible with a wide variety of hardware and should be used for local usage or known-compatible systems only.
