OpenMandriva Lx 4.0 Alpha 1 Ships With RPM4, DNF, AMD Zen Optimizations

  • OpenMandriva Lx 4.0 Alpha 1 Ships With RPM4, DNF, AMD Zen Optimizations

    Phoronix: OpenMandriva Lx 4.0 Alpha 1 Ships With RPM4, DNF, AMD Zen Optimizations

    While a few months back there was what ended up being a test version of the OpenMandriva Lx 4.0 Alpha, for Christmas this distribution that tracks back to Mandriva/Mandrake is out with their first official alpha release...

  • #2
    This is so funny, I just saw this release on Distrowatch and was about to post about this:

    "We have also built a version specifically for current AMD processors (Ryzen, ThreadRipper, EPYC) that outperforms the generic version by taking advantage of new features in those processors."

    It's also funny because I was wondering what would happen if some Linux vendor decided to start building vendor-specific optimized versions, like one for AMD Zen based processors and one for Intel Coffee Lake based processors.

    I wonder if the apps that are included are also built with Zen in mind and if this continues as you install software from their repositories or upgrade the system.


    • #3
      I wonder why more distributions don't do such special builds for popular architectures to maximize performance. Even Solus or ClearLinux are not doing these. In many cases it is just as simple as changing their package build scripts with the appropriate build flags and having the resources to distribute these packages (and finding many bugs along the way might improve the situation in the long term for all). That should not be much different from maintaining different architectures. If they go this route, I guess maximizing performance is their goal? But I wonder if they employ more than just the architecture tuning which could be done there to extract even more performance from the compiler, e.g. PGO, LTO, GRAPHITE (or in the case of Clang: POLLY) or more tuning of the kernel config etc.


      • #4
        Originally posted by Spooktra:
        This is so funny, I just saw this release on Distrowatch and was about to post about this:
        I wonder if the apps that are included are also built with Zen in mind and if this continues as you install software from their repositories or upgrade the system.
        Yes to both -- znver1 packages live in their own repository and the znver1 image upgrades from that repository.
        Our build system (ABF) automatically builds any new/updated package for generic x86_64, optimized znver1, i686, aarch64 and armv7hnl (risc-v will soon be added to ABF - as soon as we have enough base packages ready).

        Originally posted by Spooktra:
        one for AMD Zen based processors and one for Intel Coffee Lake
        We may well start doing that soon -- we've decided to go with Zen first because that's what most of our developers are using. If it turns out to be worth it, we'll start another build with optimizations for current Intel processors. (Until then, generic x86_64 will work there).


        • #5
          Originally posted by ms178:
          I wonder why more distributions don't do such special builds for popular architectures to maximize performance.
          We've had a bit of a discussion there before starting the Zen builds -- Arguments against it were mostly extra QA workload, lack of hardware (not everyone who wants to help us with QA has access to a Zen box), build times (building an x86_64 package and a znver1 package obviously takes about twice as much build machine time as building just an x86_64 package) and "Is it really worth it?" (given the generic version works well and is already pretty fast).

          Originally posted by ms178:
          In many cases it is just as simple as changing their package build scripts with the appropriate build flags
          Indeed -- most packages don't have any CPU specific patches and just use different compiler flags. For a few packages, we've also added "%ifarch znver1" blocks in the spec file to use different configure options (e.g. disable fallback to non-SSE code, disable kernel options for Intel SOCs and other bits that don't exist in the AMD world to save some space).

          Originally posted by ms178:
          That should not be much different from maintaining different architectures.
          In fact, it is easier -- in particular getting started ("cp -a x86_64 znver1" is easier and far less time consuming than having to bootstrap all packages in the right order, and you can skip over special steps like "build library 1, build library 2, build library 1 again to enable library2 support"). And a "This bug appears on znver1 but not generic x86_64" can quickly be worked around by just disabling CPU specific optimizations for the package causing it (though of course that's not the proper fix and will be used only as a temporary fix).
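
As a rough illustration of the "just different compiler flags" approach and the per-package workaround described above, here is a minimal Python sketch. It is not OpenMandriva's actual build configuration: the `-O2` baseline, the `GENERIC_ONLY` set, the package name `example-broken-package`, and the helper `cflags_for` are all hypothetical. Only `-march=znver1`, `-mtune=...` and `-flto` are real GCC/Clang options.

```python
# Hypothetical flag table. The flags themselves are real GCC/Clang options,
# but the exact set OpenMandriva uses is not stated in this thread.
BASE_FLAGS = ["-O2", "-flto"]  # assumption: -O2 baseline, LTO almost everywhere
TARGET_FLAGS = {
    "x86_64": ["-march=x86-64", "-mtune=generic"],
    "znver1": ["-march=znver1", "-mtune=znver1"],
}

# Hypothetical list of packages that misbehave with CPU-specific flags.
GENERIC_ONLY = {"example-broken-package"}


def cflags_for(package: str, target: str) -> list[str]:
    """Return the compiler flags for one package on one build target."""
    if target not in TARGET_FLAGS:
        raise ValueError(f"unknown target: {target}")
    # Temporary workaround: fall back to generic code generation for
    # packages known to break with CPU-specific optimizations.
    effective = "x86_64" if package in GENERIC_ONLY else target
    return BASE_FLAGS + TARGET_FLAGS[effective]


if __name__ == "__main__":
    for pkg in ("bash", "example-broken-package"):
        print(pkg, " ".join(cflags_for(pkg, "znver1")))
```

The point of keeping the opt-out as a plain set is that removing a package from it restores the optimized build once the underlying bug is properly fixed.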

          Originally posted by ms178:
          If they go this route, I guess maximizing performance is their goal?
          Sure -- we want the best performance we can get while keeping the system of binary packages (need to be newbie-friendly and friendly to computers that don't really have the CPU power and memory recommended to build e.g. libreoffice too...)

          Originally posted by ms178:
          But I wonder if they employ more than just the architecture tuning which could be done there to extract even more performance from the compiler, e.g. PGO, LTO, GRAPHITE (or in the case of Clang: POLLY) or more tuning of the Kernel config etc.
          We build (almost) everything with LTO. We're enabling polly and graphite support in the compilers, but experiments with actually using them where performance matters most haven't shown much of a performance increase from them. Probably they're useful for some very special applications, but building e.g. mesa, Qt or bash with it doesn't seem to gain anything (in fact, we've seen some slowdowns there).
          For PGO, we're doing a bit of it where there are good ways to generate the profile, but for many things, it's hard to determine how to best generate the profiling information automatically (e.g. when trying to optimize mesa, do you run a replay of a game session? If so, what game/game engine? Or just do some desktop stuff that is coded using OpenGL? Or run a benchmark to get best performance there and just hope it applies to the real world as well? Is running a test suite a good way to get profiling information, or will that be counterproductive by optimizing for corner cases only a test suite would test?). There's definitely more that can be done there. (Hint: volunteers wanted -- please join us on #openmandriva-cooker on freenode!).
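
To make the PGO workflow discussed above concrete, here is a minimal Python sketch that only assembles the three command lines of a GCC profile-guided build (instrument, train, rebuild); it does not run them. The helper `pgo_commands`, the file names, the `pgo-data` directory and the `--benchmark` training command are hypothetical. `-fprofile-generate` and `-fprofile-use` are real GCC options. Choosing a representative training command is exactly the hard part the post describes, and this sketch does not solve it.

```python
import shlex


def pgo_commands(source, binary, training_cmd, profile_dir="pgo-data"):
    """Return the three shell commands of a GCC profile-guided build."""
    # Phase 1: build an instrumented binary that records execution counts.
    instrumented = ["gcc", "-O2", f"-fprofile-generate={profile_dir}",
                    source, "-o", binary]
    # Phase 2: run a (hopefully representative) workload to write the profile.
    training = list(training_cmd)
    # Phase 3: rebuild, letting the compiler use the recorded profile.
    optimized = ["gcc", "-O2", f"-fprofile-use={profile_dir}",
                 source, "-o", binary]
    return [shlex.join(cmd) for cmd in (instrumented, training, optimized)]


if __name__ == "__main__":
    # Hypothetical program and training workload, for illustration only.
    for command in pgo_commands("demo.c", "demo", ["./demo", "--benchmark"]):
        print(command)
```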


          • #6
            berolinux Thanks a lot for commenting in depth and the insights from your point of view, I very much appreciate it and your efforts! Having architecture specific builds (for the most popular currently used architectures or - in a perfect world - for all still relevant) is the next step to get the most performance out of our systems. And as soon as I get a Ryzen as my next build, I'll try it out!

            As a bit of background, I am only a hobby user who enjoys tinkering and wants to get the best experience out of his hardware (currently a dual-core Sandy Bridge notebook and a 6-core Westmere workstation). I am neither a software developer nor a compiler guru.

            A lot of my own Linux experiments during the last year centered on learning how to use aggressive compiler flags and how to configure and build my own Linux kernel. My findings contradict some old thinking that is still out there, such as "it wouldn't matter" or "-O3 would break the world". At least on my older, lower-power hardware it mattered, especially with a slow HDD: there was quite a noticeable difference in memory consumption and snappiness while opening and closing programs. The problem with building my own packages is the time and effort needed to maintain this level of optimization on a production system, so I see it as more efficient if that work is done at the distro level by people who build packages on a regular basis and have a proper build infrastructure in place.

            My findings also showed that compiler flags were only one part of the equation. Configuring the kernel for performance (1000 Hz, Performance, BFQ-IO, etc.) and stripping out debug symbols, unnecessary drivers, etc. was also important. That is a bit tricky, as distros typically don't want to limit themselves to a subset of users. I wonder if this could be part of automatic detection during installation: the OS could detect which hardware is in use, ask the user which preference to set the system up for (performance vs. power saving), offer such a "stripped-down" kernel as an option, and provide the architecture-specific packages from the repo. With the relevant information, even novice users could enjoy a better-performing system out of the box this way.
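
The detection step of that idea could look roughly like the following Python sketch, which picks a package repository from the text of `/proc/cpuinfo`. This is only an assumption about how an installer might do it, not something any distro in this thread is confirmed to do. The function `pick_repo` is hypothetical, and the rule "vendor `AuthenticAMD` with CPU family 23 means first-generation Zen or a close relative" is a simplification that ignores later Zen generations.

```python
def pick_repo(cpuinfo: str) -> str:
    """Pick 'znver1' or 'x86_64' from the text of /proc/cpuinfo."""
    fields = {}
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Keep the first occurrence, i.e. the first listed processor.
            fields.setdefault(key.strip(), value.strip())
    # Assumption: family 23 (17h) from AMD is treated as Zen; everything
    # else falls back to the generic x86_64 repository.
    is_zen = (fields.get("vendor_id") == "AuthenticAMD"
              and fields.get("cpu family") == "23")
    return "znver1" if is_zen else "x86_64"


if __name__ == "__main__":
    # Hand-written sample input; a real installer would read /proc/cpuinfo.
    sample = "vendor_id\t: AuthenticAMD\ncpu family\t: 23\n"
    print(pick_repo(sample))
```

Falling back to generic x86_64 whenever detection is unsure keeps the system bootable on any hardware, at the cost of missing some optimizations.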

            I guess you are aware of similar efforts by the Gentoo community over here: https://github.com/InBetweenNames/gentooLTO/ - I very much like the thinking behind it. As for PGO, it seems to be tough to use, as you said. At least for Firefox, this article by a GCC developer indicates that PGO and its training are automated and part of the build system: http://hubicka.blogspot.com/2018/12/...and-clang.html

            Maybe performance oriented projects could be lobbied to do more (testsuite for many common use cases?) to help distros to use PGO more effectively in the future? They should know their software and the needs of their users best after all.
