AMD Zen Scheduler Model Lands In LLVM, Makes It For LLVM 5.0
  • AMD Zen Scheduler Model Lands In LLVM, Makes It For LLVM 5.0

    Phoronix: AMD Zen Scheduler Model Lands In LLVM, Makes It For LLVM 5.0

    It was coming down to the wire for the new AMD Zen scheduler model in LLVM 5.0 but now it's managed to land just hours before the LLVM 5.0 branching...


  • #2
    It will be interesting to see how much of a difference it can make (and which workloads it makes a difference in).



    • #3
      It'd be of academic interest to see Zen performance in LLVM compiled binaries before and after this commit.



      • #4
        Nice, let's just hope it is polished enough. It seems to me it was submitted at the last minute. But even if it's a little glitchy in some cases, that's better than having nothing. I'm sure there are plenty of applications that will work fine with it.

        To clarify, this affects how the compiled application itself performs, right? As someone who doesn't create x86 applications in C (the only C code I do is for Arduinos) I don't really understand the point of architecture-specific optimizations, since it basically requires you to use the CPU it was compiled for. So unless you made a Ryzen-specific application, wouldn't using this just hurt performance for the majority of people who aren't using Ryzen? Or is it possible to stack these, where for example you'd see performance enhancements for both Ryzen and Kaby Lake?



        • #5
          schmidtbag
          Gentoo



          • #6
            Originally posted by schmidtbag View Post
            Nice, let's just hope it is polished enough. It seems to me it was submitted at the last minute. But even if it's a little glitchy in some cases, that's better than having nothing. I'm sure there are plenty of applications that will work fine with it.

            To clarify, this affects how the compiled application itself performs, right? As someone who doesn't create x86 applications in C (the only C code I do is for Arduinos) I don't really understand the point of architecture-specific optimizations, since it basically requires you to use the CPU it was compiled for. So unless you made a Ryzen-specific application, wouldn't using this just hurt performance for the majority of people who aren't using Ryzen? Or is it possible to stack these, where for example you'd see performance enhancements for both Ryzen and Kaby Lake?
            Yes, this affects the compiled applications themselves. Of course, if you compile LLVM/Clang using this updated LLVM/Clang, you'll also get what may be a faster compiler as well.

            Both GCC and Clang (and their C++ equivalents, g++/clang++) support compiling binaries for a specific base architecture (-march=[generic|native|znver1|others]) as well as tuning for a specific architecture while still maintaining compatibility with other CPUs (using -march=the_base_architecture -mtune=what_you_want_to_optimize_for).

            In the case of something like a distro supporting i386 and onwards, they may decide that they want to keep running on a 386 but target most of their base optimizations at the Pentium Pro and higher (i686). You could do that with '-march=i386 -mtune=i686' or something similar.

            For my machines at home/work, I generally build LLVM, libclc (the OpenCL runtime library used by clover+radeonsi/r600g) and Mesa from source periodically. LLVM gets rebuilt every 2-3 days, and Mesa is pulled from git and built as soon as I sit down at my computer after work most days. Those binaries are only ever going to be used on the machines they were compiled on, so I just build with -march=native to get the best performance possible. Of course, most of those are debug builds (I'm usually working on enhancements/bugfixes to Mesa or libclc themselves), which somewhat defeats the purpose since I have to compile with -Og as well, but the use case is still valid for other people.



            • #7
              Originally posted by RavFX View Post
              schmidtbag
              Gentoo
              I actually wrote about Gentoo before submitting my post, but that was kind of beside the point; it is too much of an exception.

              Originally posted by Veerappan View Post
              Yes, this affects the compiled applications themselves. Of course, if you compile LLVM/Clang using this updated LLVM/Clang, you'll also get what may be a faster compiler as well.

              ... Those binaries are only ever going to be used on the machines they were compiled on, so I just build for -march=native so that I can get the best performance possible. Of course, most of those are debug builds (I'm usually working on enhancements/bugfixes to mesa or libclc themselves), which totally defeats the purpose since I have to compile with -Og as well, but the use case would still be valid for other people.
              Thanks for the clarification. But it still makes me wonder how worthwhile it all is in the end. The reason for compiling for a single architecture is to make the application run as efficiently as possible, both to reduce energy use and to save time when running it. But if you are compiling something for your hardware specifically, you are expending a lot of time and energy doing so. So to me, it is a case of diminishing returns (unless the application in question is used frequently and doesn't update often). Meanwhile, you could always distribute the binaries, but then anyone who doesn't have your hardware probably won't see an improvement, or may even experience a regression.
              Last edited by schmidtbag; 19 July 2017, 12:14 PM.



              • #8
                Originally posted by schmidtbag View Post
                I actually wrote about Gentoo before submitting my post, but that was kind of beside the point; it is too much of an exception.


                Thanks for the clarification. But it still makes me wonder how worthwhile it all is in the end. The reason for compiling for a single architecture is to make the application run as efficiently as possible, both to reduce energy use and to save time when running it. But if you are compiling something for your hardware specifically, you are expending a lot of time and energy doing so. So to me, it is a case of diminishing returns (unless the application in question is used frequently and doesn't update often). Meanwhile, you could always distribute the binaries, but then anyone who doesn't have your hardware probably won't see an improvement, or may even experience a regression.
                Remember that Zen is meant to be used in servers and enterprise situations, and it's quite valuable for some organizations to be able to compile their own code with optimizations for the server it's going to be running on. From an end-user desktop standpoint, you're right - you're either doing something like Gentoo or it probably doesn't matter.



                • #9
                  Originally posted by Veerappan View Post

                  Yes, this affects the compiled applications themselves. Of course, if you compile LLVM/Clang using this updated LLVM/Clang, you'll also get what may be a faster compiler as well.

                  Both GCC and Clang (and their C++ equivalents, g++/clang++) support compiling binaries for a specific base architecture (-march=[generic|native|znver1|others]) as well as tuning for a specific architecture while still maintaining compatibility with other CPUs (using -march=the_base_architecture -mtune=what_you_want_to_optimize_for).

                  In the case of something like a distro supporting i386 and onwards, they may decide that they want to keep running on a 386 but target most of their base optimizations at the Pentium Pro and higher (i686). You could do that with '-march=i386 -mtune=i686' or something similar.

                  For my machines at home/work, I generally build LLVM, libclc (the OpenCL runtime library used by clover+radeonsi/r600g) and Mesa from source periodically. LLVM gets rebuilt every 2-3 days, and Mesa is pulled from git and built as soon as I sit down at my computer after work most days. Those binaries are only ever going to be used on the machines they were compiled on, so I just build with -march=native to get the best performance possible. Of course, most of those are debug builds (I'm usually working on enhancements/bugfixes to Mesa or libclc themselves), which somewhat defeats the purpose since I have to compile with -Og as well, but the use case is still valid for other people.
                  Yeah, been there, done that, and one day you want to plug your storage into another box and everything segfaults with "illegal instruction". I got tired and need things to just work, so I build with generic optimizations now ;-) https://www.youtube.com/watch?v=457zniNGVfU I did not notice a real-life performance difference with Firefox and such, …



                  • #10
                    Originally posted by schmidtbag View Post
                    Nice, let's just hope it is polished enough. It seems to me it was submitted at the last minute. But even if it's a little glitchy in some cases, that's better than having nothing. I'm sure there are plenty of applications that will work fine with it.
                    Yep, it seems very last minute, but that doesn't necessarily indicate outstanding issues.
                    To clarify, this affects how the compiled application itself performs, right? As someone who doesn't create x86 applications in C (the only C code I do is for Arduinos) I don't really understand the point of architecture-specific optimizations, since it basically requires you to use the CPU it was compiled for. So unless you made a Ryzen-specific application, wouldn't using this just hurt performance for the majority of people who aren't using Ryzen? Or is it possible to stack these, where for example you'd see performance enhancements for both Ryzen and Kaby Lake?
                    The phrase here is "it depends". For the most part it makes good sense to optimize for recent-technology processors. That doesn't always deliver huge gains, but the extended functionality of a modern processor can often lead to surprisingly large performance improvements.

                    As for optimizing for a specific CPU, that is valuable to users of workstations who are generating custom applications that need high performance. This is probably less of an issue than in the past, when you often needed machine-specific optimization to get passable performance. These days the value you get out of machine-specific optimizations is highly variable. Hopefully Michael will run a broad spectrum of benchmarks with before-and-after numbers. If he does, we will learn where this scheduler enhancement has the biggest payoff.

