Proposed: Allow Building The Linux Kernel With x86-64 Microarchitecture Feature Levels

    Phoronix: Proposed: Allow Building The Linux Kernel With x86-64 Microarchitecture Feature Levels

    A set of two patches posted this week would allow the Linux kernel to be easily built with the different x86-64 micro-architecture feature levels supported by the latest LLVM Clang and GCC compilers...

  • #2
    Wouldn't it be better to just enable or disable features like SSE3 or AVX?

    Usually only very small parts profit from such features anyway. It might be feasible to include both normal and optimized versions in the same build.
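
A rough userspace sketch of the "normal and optimized versions in the same build" idea, assuming GCC or Clang on x86-64. The function names (`sum_generic`, `sum_avx2`, `pick_sum`) are made up for illustration and this is not how the kernel does it. Only `__builtin_cpu_init`, `__builtin_cpu_supports` and the `target` attribute are real compiler features.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Baseline version: built with whatever -march the rest of the file uses. */
uint64_t sum_generic(const uint32_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Same loop, but the compiler is allowed to use AVX2 in this one function. */
__attribute__((target("avx2")))
uint64_t sum_avx2(const uint32_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
#endif

/* Hypothetical selector: choose an implementation once, at run time. */
sum_fn pick_sum(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_generic;
}
```

A caller would do `sum_fn f = pick_sum();` once and then call `f(data, len)`. Both variants live in the same binary, and only the selected one ever executes.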



    • #3
      Originally posted by flower
      Wouldn't it be better to just enable or disable features like SSE3 or AVX?

      Usually only very small parts profit from such features anyway. It might be feasible to include both normal and optimized versions in the same build.

      The kernel is a tad different: AFAIK you need to manually save/restore state if you use vector registers. That's gonna be an issue if the compiler can automatically generate such instructions, which means autovectorization will likely have to be disabled, and the feature levels will have little impact, on x86_64 at least.

      You can already manually add SSE3/AVX-optimized routines via Kconfig, btw.



      • #4
        Originally posted by discordian

        The kernel is a tad different: AFAIK you need to manually save/restore state if you use vector registers. That's gonna be an issue if the compiler can automatically generate such instructions, which means autovectorization will likely have to be disabled, and the feature levels will have little impact, on x86_64 at least.

        You can already manually add SSE3/AVX-optimized routines via Kconfig, btw.

        Thanks for the clarification!
        But I have one question: as other programs can be compiled with e.g. SSE, doesn't the kernel need to save those states anyway?



        • #5
          This is small potatoes. There's no way it can deliver similar benefits to using -march=native, because that implies -mtune=native, which using a slightly better feature level doesn't.

          Aside from a few bit-manipulation instructions, which could only make a measurable difference in a handful of micro-benchmarks, there's really not much in the higher feature-levels that can benefit the kernel.
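
For anyone wanting to see what a given `-march=` value actually turns on, here is a small sketch that reports the feature level a translation unit was compiled for, based on the predefined macros GCC and Clang set. The function name is made up, and it only checks a representative subset of each level's features, so treat it as an approximation of the psABI definitions rather than a complete check.

```c
/* Hypothetical helper: which x86-64 level was this file compiled for?
 * Returns 0 on non-x86-64 targets, otherwise 1..4.
 * Only a subset of each level's required features is tested. */
int compiled_x86_64_level(void)
{
#if !defined(__x86_64__)
    return 0;
#elif defined(__AVX512F__) && defined(__AVX512BW__) && defined(__AVX512CD__) && \
      defined(__AVX512DQ__) && defined(__AVX512VL__)
    return 4;   /* x86-64-v4: AVX-512 F/BW/CD/DQ/VL */
#elif defined(__AVX2__) && defined(__BMI__) && defined(__BMI2__) && \
      defined(__FMA__) && defined(__LZCNT__) && defined(__MOVBE__)
    return 3;   /* x86-64-v3: AVX2, BMI1/2, FMA, LZCNT, MOVBE, ... */
#elif defined(__SSE4_2__) && defined(__SSSE3__) && defined(__POPCNT__)
    return 2;   /* x86-64-v2: SSE3..SSE4.2, SSSE3, POPCNT, ... */
#else
    return 1;   /* baseline x86-64 */
#endif
}
```

Building the same file with `-march=x86-64-v2` and then `-march=x86-64-v3` should change the returned value, while the tuning model stays generic in both cases, which is the point made above about `-mtune`.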



          • #6
            Originally posted by flower
            But I have one question: as other programs can be compiled with e.g. SSE, doesn't the kernel need to save those states anyway?

            I think the point is that if we assume the kernel doesn't touch any of those registers, then there's no point in saving/restoring them on syscalls, because you know the kernel isn't going to overwrite them.



            • #7
              IMO, the ultimate solution would be dynamic re-optimization of individual kernel subroutines. That would have all the benefits of -march=native, -mtune=native, PGO, and potentially even LTO.

              If Linux doesn't eventually get there, some other OS will.



              • #8
                Originally posted by flower

                Thanks for the clarification!
                But I have one question: as other programs can be compiled with e.g. SSE, doesn't the kernel need to save those states anyway?

                If the kernel doesn't touch those registers and doesn't change the running thread, then no.
                The kernel has AVX-optimized routines; as soon as one of these is entered, the program's AVX state has to be saved, and then restored before the kernel returns to userspace.

                I am a bit unsure about the x86_64 convention, as SSE2 is used for scalars as well. Normally anything that can have complicated side effects is not allowed in the kernel, to avoid complicated entry/exit; only a subset of SSE2 fits this description. Historically floating point wasn't allowed in the kernel either, but I think it's not as clear nowadays.
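
To make the save/restore point concrete, here is a toy userspace model. It is not kernel code: every name in it is invented, and `live_regs` is just a byte array standing in for the real vector registers. In the actual x86 kernel the bracketing is done with `kernel_fpu_begin()` / `kernel_fpu_end()`; the model only mimics that shape.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for the CPU's live vector registers (toy model only). */
struct vec_regs { unsigned char bytes[32]; };
struct vec_regs live_regs;

struct task_model {
    struct vec_regs saved;   /* user state parked here while the "kernel" uses the regs */
    bool user_state_saved;
    int saves;               /* how many times we paid for a save */
};

/* Model of entering a vector-using kernel section: park the user state. */
void model_fpu_begin(struct task_model *t)
{
    t->saved = live_regs;
    t->user_state_saved = true;
    t->saves++;
}

/* Model of leaving it: put the user state back before returning to userspace. */
void model_fpu_end(struct task_model *t)
{
    live_regs = t->saved;
    t->user_state_saved = false;
}

/* A syscall that never touches vector registers: nothing to save. */
void model_syscall_plain(struct task_model *t)
{
    (void)t;
}

/* A syscall that uses an "AVX-optimized routine": must bracket the use. */
void model_syscall_vector(struct task_model *t)
{
    model_fpu_begin(t);
    memset(&live_regs, 0xAA, sizeof live_regs);  /* kernel clobbers the registers */
    model_fpu_end(t);
}
```

The plain path costs nothing and leaves the user's registers alone, while the vector path pays for one save and one restore. If the compiler were free to emit vector instructions anywhere, every path would have to behave like the second one.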



                • #9
                  Originally posted by coder
                  IMO, the ultimate solution would be dynamic re-optimization of individual kernel subroutines. That would have all the benefits of -march=native, -mtune=native, PGO, and potentially even LTO.

                  If Linux doesn't eventually get there, some other OS will.

                  Compile your kernel in the initramfs and kexec the new build.

                  In a way, this is already done for 32-bit ARM, where integer division instructions are not mandatory: the kernel detects whether they are available and patches out any calls to the software fallback. This is still a lot worse than compiling with intdiv support, because the compiler expects a function call and can't optimize around it.



                  • #10
                    Originally posted by coder
                    IMO, the ultimate solution would be dynamic re-optimization of individual kernel subroutines. That would have all the benefits of -march=native, -mtune=native, PGO, and potentially even LTO.

                    If Linux doesn't eventually get there, some other OS will.

                    That sounds like compiling everything into compiler bitcode and only generating binaries when it is installed on your computer.
                    This could indeed be combined with `-march=native`, LTO and PGO to optimize the generated kernel.

                    However, such support (LTO) is currently experimental and immature, is only supported when building with LLVM, and there isn't any attempt to accomplish such a thing.

