Intel's Linux OS Shows The Importance Of Software Optimizations, Further Optimized Xeon "Ice Lake" In 2021


    Phoronix: Intel's Linux OS Shows The Importance Of Software Optimizations, Further Optimized Xeon "Ice Lake" In 2021

    As part of the various end-of-year Linux comparisons that I've made a habit of over the past 17 years, with the EOY 2021 benchmarking I was rather curious to see how Intel's Clear Linux distribution has evolved Xeon Scalable "Ice Lake" performance since that platform launched in Q2'2021. It turns out there have been some terrific optimizations squeezed out of that latest-generation Xeon Scalable platform on Intel's Clear Linux. In this article is a look at the Ubuntu and Clear Linux performance on the flagship Xeon Platinum 8380 2P reference server back around the time Ice Lake launched and then again using the latest software packages that closed out 2021.

    https://www.phoronix.com/vr.php?view=30837

  • #2
    At this point, with so many different processors and GPUs in use, it would be a good idea for distros to ship special packages in an intermediate representation (IR for LLVM, or RTL for GCC). That way, those of us who really want or need highly optimized packages could ask the package manager to generate optimized versions for specific package sets, and have it redo that automatically on updates.

    For openSUSE, for example, it would be:
    # zypper optimize kernel glibc

    It would then download kernel-(something).irrpm and glibc-(something).irrpm and run only the final stages of compilation, without having to parse the whole source (and the sources of in-tree dependencies) again.

    Comment


    • #3
      Thanks for this content. I'm definitely following the progress of Intel's Clear Linux. Nice to see they are continuing to find ways to improve the performance even further.

      Comment


      • #4
        Originally posted by acobar View Post
        At this point, with so many different processors and GPUs in use, it would be a good idea for distros to ship special packages in an intermediate representation (IR for LLVM, or RTL for GCC). That way, those of us who really want or need highly optimized packages could ask the package manager to generate optimized versions for specific package sets, and have it redo that automatically on updates.

        For openSUSE, for example, it would be:
        # zypper optimize kernel glibc

        It would then download kernel-(something).irrpm and glibc-(something).irrpm and run only the final stages of compilation, without having to parse the whole source (and the sources of in-tree dependencies) again.
        It's called Gentoo.

        Your idea might work, but I imagine it would be a nightmare for support. Perhaps a distro could compile all packages to IR, and then at runtime a cached loader could JIT-compile the binaries for the local CPU feature set. That way it would still be "portable", which is a major (and valid) complaint about Gentoo.

        Comment


        • #5
          Originally posted by s_j_newbury View Post

          It's called Gentoo.

          Your idea might work, but I imagine it would be a nightmare for support. Perhaps a distro could compile all packages to IR, and then at runtime a cached loader could JIT-compile the binaries for the local CPU feature set. That way it would still be "portable", which is a major (and valid) complaint about Gentoo.
          I don't think so. Once parsing, syntax analysis, semantic analysis, and AST and symbol generation are done, what is left is mostly optimization and linking, and that is all the processing of an irrpm would have to cope with.

          As far as I know, Gentoo redoes all of the above, which makes building specialized versions far more time- and compute-intensive, even more so when you take into account that the dependencies may also have to be optimized, which is common.

          Perhaps JIT-compiled binaries would be a good idea, but I don't see that as the best solution: out of thousands of packages, only a few really need to be optimized, so the effort saved by skipping the JIT step for everything else seems huge.

          Comment


          • #6
            That's amazing.


            But I have to wonder how much of that comes from the performance governor? That might be an issue if the system idles or handles really light loads frequently.


            IMO the next Clear Linux benchmark should include a section where both OSes are forced to use the same governor.

            Comment


            • #7
              I would like to see distros start offering multiple versions of some key packages (or using that new glibc hwcaps feature, or whatever it's called). I can see why they don't want to compile everything multiple times, and frankly most software doesn't need it (ls is not exactly performance-sensitive, for example).

              It seems to me that things like glibc, libstdc++* and other language runtimes would benefit from it the most (due to being used in so many places) and also packages with heavy computation (for example video codecs, blas/lapack, numpy).

              I'm not aware of any distro doing that though. All I have heard are all or nothing discussions, but maybe I missed something.

              * Libstdc++ might not get as much benefit though due to C++ templates effectively resulting in inlining.
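              For reference, the hwcaps mechanism mentioned above works by directory layout alone; a sketch of how glibc (2.33 and later) resolves a library, with illustrative paths and a made-up library name:

```shell
# The dynamic loader probes glibc-hwcaps subdirectories matching the running
# CPU before falling back to the plain directory, so a distro can ship one
# extra optimized build per library without renaming any package:
#
#   /usr/lib64/glibc-hwcaps/x86-64-v3/libfoo.so.1   # AVX2/FMA build
#   /usr/lib64/libfoo.so.1                          # baseline x86-64 build
#
# A program linked against libfoo.so.1 automatically gets the v3 build on a
# capable CPU and the baseline build everywhere else.
```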

              Comment


              • #8
                It's important to understand why Clear Linux is faster.

                One of the things they do is deliberately compile software with faster floating-point handling instead of IEEE-correct floating-point handling. While it's probably not too dangerous, it is a risk, and it's why these flags are disabled by default in compilers and on distributions. Care needs to be taken with their use. Software compiled with them enabled may exhibit undesired behaviour, and depending on the software that ranges anywhere from "whatever" to "fundamentally dangerous".

                All of the Phoronix coverage just tends to focus on the performance benchmarks, but never really looks at the why, and what trade-offs are being made. These kinds of details are extremely important.

                To take an example from another side of tech, MongoDB picked up a reputation as an extremely fast data store. It did this largely because its defaults gave a much lower level of data durability than most databases, and places found this out the hard way. MongoDB didn't care; they had benchmarks that looked awesome! Enable the same level of durability and performance dropped significantly, leaving it at best on par with more traditional databases.
                Garp
                Junior Member
                Last edited by Garp; 06 January 2022, 03:00 PM.

                Comment


                • #9
                  Originally posted by Vorpal View Post
                  ...and also packages with heavy computation (for example video codecs, blas/lapack, numpy).
                  Those apps tend to use assembly and/or other handmade optimizations in the most critical parts of the code, which compiler optimizations would not change.

                  But you are still not wrong! Sometimes the coverage is not very complete. And if you're running on, say, ARM, you might not have much hand optimized code to use.



                  Personally, I think language runtimes like Java, Python, or Go are the perfect use case for repo-optimized packages. They are way too big to comprehensively hand-optimize, and very frequently used.
                  brucethemoose
                  Phoronix Member
                  Last edited by brucethemoose; 06 January 2022, 03:13 PM.

                  Comment


                  • #10
                    Originally posted by Garp View Post
                    It's important to understand why Clear Linux is faster.

                    One of the things that they do is deliberately opt to compile software using faster floating point handling, vs IEEE standard correct floating point handling. While it's probably not too dangerous, it is a risk, and it's why these flags are disabled by default with a compiler, and on distributions. Care needs to be taken with their use. Software compiled with them enabled may expose undesired behaviour, and depending on the software that stretches everywhere from "whatever" to "fundamentally dangerous"
                    I couldn't find anything about that in their documentation. Could you please provide a source for this claim? Which of the flags they enable do you consider dangerous?

                    Comment
