Solus Borrows From The Clear Linux Playbook For AVX2-Optimized Gaming

  • Solus Borrows From The Clear Linux Playbook For AVX2-Optimized Gaming

    Phoronix: Solus Borrows From The Clear Linux Playbook For AVX2-Optimized Gaming

    One of the approaches Intel's Clear Linux distribution uses to achieve greater performance is shipping AVX2 (and now even AVX-512) optimized libraries with the OS, which are then used automatically if the detected host CPU supports AVX. Solus is now using this approach in pursuit of better Linux gaming performance...

    http://www.phoronix.com/scan.php?pag...AVX2-Libraries
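
To make the "automatically used if the detected host CPU is AVX equipped" idea concrete, here is a minimal Python sketch of the selection logic. In practice this decision is made by the dynamic loader, not by a script, and the directory names below are hypothetical placeholders rather than the actual Clear Linux or Solus layout. The flag names `avx2` and `avx512f` are the ones Linux reports in `/proc/cpuinfo` on x86.

```python
# Hypothetical sketch: choose a library directory from CPU feature flags.
# The "/avx2" and "/avx512" subdirectory names are illustrative assumptions.

def cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line (for example on non-x86 systems)


def pick_library_dir(flags, base="/usr/lib64"):
    """Prefer the most specialised build the CPU supports, else the baseline."""
    if "avx512f" in flags:
        return base + "/avx512"
    if "avx2" in flags:
        return base + "/avx2"
    return base


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # fall back to the baseline directory
    print(pick_library_dir(cpu_flags(text)))
```

The point of the design is that the baseline libraries always remain available, so a machine without AVX2 simply never sees the optimized copies.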

  • #2
    Wow, after that Snap Steam move and now this?!

    Solus devs (Ikey and others) seem to be trying much harder at gaming than the SteamOS devs.

    Well, I can't play some of the Feral ports (Shadow of Mordor, Tomb Raider) on my laptop, which is easily strong enough to run them (GTX 1050), but other Feral ports (Shogun 2, Medieval 2) work without problems.

    I'm eager to move to Solus once I've got time.

    • #3
      The only problem I have with Solus right now is that they still don't ship an image with KDE, and somehow installing 3rd party applications with their store didn't work for me last time I tested it (but I don't know why and I didn't file a bug report. I know, ok, but I didn't file one.)

      Otherwise it was an excellent experience.

      • #4
        Profile first. From what I've read, AVX feels extremely inappropriate for desktop usage: https://blog.cloudflare.com/on-the-d...uency-scaling/

        But hey, I haven't profiled it myself so it's just a guess.

        • #5
          Originally posted by c117152
          Profile first. From what I've read, AVX feels extremely inappropriate for desktop usage: https://blog.cloudflare.com/on-the-d...uency-scaling/

          But hey, I haven't profiled it myself so it's just a guess.
          I've not been able to find reliable information on how AMD Ryzen's implementation of AVX2 performs in real desktop situations, to see whether AMD has a similar problem with AVX2 utilization. Ryzen has a quirky implementation of AVX2, so it may or may not help to use it for some tasks. This is relevant to me as I have a Ryzen 5.

          I read that blog from Cloudflare with great interest though. It's a good cautionary tale about the dangers of implementing the New Hotness without fully understanding its actual real-world impact on Old Busted, that is, your mundane stable base that is already proven to work for your intended use. Implementing $new_feature doesn't always equal a performance increase, as Cloudflare discovered. Sometimes the returns are counter-intuitive on the surface and only make sense once you dig into the guts, like frequency scaling killing an otherwise potentially useful feature.

          • #6
            Originally posted by stormcrow
            frequency scaling killing an otherwise potentially useful feature.
            Seems they had little choice in the matter. As x86 CISC instructions get decoded to RISC (micro)instructions just-in-time, their corresponding execution units are few and can usually be throttled to avoid thermal issues and even overclock a little. While clever and cool sounding, this "frequency scaling" technique is actually a hack around Intel's failure to deliver wider pipelines, and it depends on them not normally taking advantage of most of their silicon. So, by using a SIMD instruction that does actually use a lot of execution units, they've encountered thermal issues where they can no longer throttle frequencies. If all you're doing is AVX then it's going to pay off in throughput over time. But from what I can tell, if a desktop user does 20% decryption / decompression using AVX and concurrently 80% parsing and table missing while pulling web pages with their browser, not only will they lose in latency, they're also going to lose in throughput.

            Also, this is complete speculation but seeing how we're a few years into AVX, I'm guessing this problem isn't going anywhere anytime soon. My thinking is that since Intel optimizes their cores for single-core integer computation, it's reasonable to assume they can't just move around execution units to satisfy AVX needs without serious downsides.

            But again, it's just a guess without proper profiling.
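
The 20% / 80% argument above can be put into a back-of-the-envelope model. This is only a sketch under loud assumptions: the AVX speedup and the clock reduction are made-up illustrative numbers, not measurements, and the model assumes the lower clock applies to the whole workload while AVX code is mixed in.

```python
# Toy model (assumed, not measured): does an AVX build win on a mixed workload?

def relative_throughput(avx_fraction, avx_speedup, clock_ratio):
    """Throughput of the AVX build relative to the scalar build (1.0 = equal).

    avx_fraction: share of the scalar run time spent in code AVX can accelerate
    avx_speedup:  how much faster that share runs with AVX at an equal clock
    clock_ratio:  clock under AVX load divided by the normal clock (<= 1.0),
                  assumed here to slow down the entire workload
    """
    scalar_time = 1.0
    avx_time = (avx_fraction / avx_speedup + (1.0 - avx_fraction)) / clock_ratio
    return scalar_time / avx_time


def break_even_clock_ratio(avx_fraction, avx_speedup):
    """Clock ratio at which the AVX build exactly matches the scalar build."""
    return avx_fraction / avx_speedup + (1.0 - avx_fraction)


if __name__ == "__main__":
    # Illustrative inputs only: 20% AVX-friendly work, 2x speedup on that part.
    print(break_even_clock_ratio(0.2, 2.0))
    print(relative_throughput(0.2, 2.0, 0.85))
```

With these assumed inputs the break-even point is a clock ratio of about 0.9: if mixing in AVX code costs more than roughly 10% of clock speed across the board, the mixed workload ends up slower overall, which is the scenario described above. Real hardware behaves in more complicated ways, so this is no substitute for profiling.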

            • #7
              Originally posted by c117152
              Seems they had little choice in the matter. As x86 CISC instructions get decoded to RISC (micro)instructions just-in-time, their corresponding execution units are few and can usually be throttled to avoid thermal issues and even overclock a little. While clever and cool sounding, this "frequency scaling" technique is actually a hack around Intel's failure to deliver wider pipelines and it depends on them not normally taking advantage of most of their silicon. ...

              If all you're doing is AVX then it's going to pay off in throughput over time. But from what I can tell, if a desktop user does 20% decryption / decompression using AVX and concurrently 80% parsing and table missing while pulling web pages with their browser, not only will they lose in latency, they're also going to lose in throughput...
              I think we're on the same page. More information is needed, with profiling results based on Solus' average user base rather than just a few of the developers' own experiences. I wasn't down on Intel's implementation of AVX-512; they have reasons for doing it that way that fit within their hardware designs. Like you, I'm simply not convinced the feature is useful on desktops, hence my comment. Hard evidence is needed before implementing a feature distro wide.

              • #8
                Originally posted by stormcrow

                Hard evidence is needed before implementing a feature distro wide.
                Which is why it hasn't been implemented distro wide, it's only been turned on within the Snap runtime, which uses a modified glibc package, precisely so that we can test it against the stock glibc to see how it holds up. If it does actually prove itself, then we would merge that in Solus, otherwise we'd drop it from the runtime. It's taking the whole idea of sandboxing a little bit too literally :P

                • #9
                  It makes sense to compile some pieces of software that depend on CPU performance to take advantage of all of your CPU's features, e.g. compression/decompression, image/audio/video decoding/encoding, the kernel, text rendering software, the C runtime, etc.

                  • #10
                    Originally posted by ikey_solus

                    Which is why it hasn't been implemented distro wide, it's only been turned on within the Snap runtime, which uses a modified glibc package, precisely so that we can test it against the stock glibc to see how it holds up. If it does actually prove itself, then we would merge that in Solus, otherwise we'd drop it from the runtime. It's taking the whole idea of sandboxing a little bit too literally :P
                    My apologies. I didn't realize this wasn't a full rollout across the distro but rather a test. Good that you're dipping your toe in rather than diving in head first. I would be very interested in the results of AVX2 in a multitasking desktop environment (as in desktop-oriented tasks running at the same time: web browser, e-mail client, OpenGL/Vulkan game in a window, voice chat on Discord, etc., as opposed to a desktop primarily running a single task like a synthetic benchmark), and in whether there are any real-world performance regressions for either Intel or AMD in the vein of what Cloudflare discovered on their servers. The only reports I've seen don't tell the full story on the effects of various AVX* implementations in desktop use, only in specific non-interactive use cases. I've seen Blender benchmarks, SiSense benchmarks, and so on, but interactive responsiveness is difficult to quantify unless your desktop is taking seconds to foreground or move a window.
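
One hedged way to put a number on interactive responsiveness is to look at tail latency rather than averages, for example the median and 99th percentile of per-frame or per-event times recorded with the stock glibc and again with the AVX2 one. The sketch below assumes you already have such samples in milliseconds; how to collect them is outside its scope, and the function names are hypothetical.

```python
# Sketch: summarise latency samples by median and tail (nearest-rank percentiles).

def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the median and 99th percentile of a run's latency samples."""
    return {"p50": percentile(latencies_ms, 50), "p99": percentile(latencies_ms, 99)}


def compare(stock_ms, avx2_ms):
    """Per-metric difference (avx2 minus stock); positive means AVX2 is slower."""
    stock, avx2 = summarize(stock_ms), summarize(avx2_ms)
    return {key: avx2[key] - stock[key] for key in stock}
```

A run where the median improves but the 99th percentile gets worse would be the kind of regression a throughput-only benchmark hides.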
