AMD Zen Scheduler Model Lands In LLVM, Makes It For LLVM 5.0


  • Xelix
    replied
    Originally posted by schmidtbag View Post
    I actually wrote about Gentoo before submitting my post, but that was kind of beside the point; it is too much of an exception.


    Thanks for the clarification. But it still makes me wonder how much it is all worth in the end. The reason for compiling for a single architecture is to make the program run as efficiently as possible, to both reduce energy use and save time when running the application. But if you are compiling something specifically for your own hardware, you are expending a lot of time and energy doing so. So to me, it is a diminishing return (unless the application in question is used frequently and doesn't update often). Meanwhile, you could always distribute the binaries, but then anyone who doesn't have your hardware probably won't see an improvement, or may even experience a regression.
    What you say is (mostly) true for most end-users, who typically use packages from their distribution, which are compiled for a common denominator so that it can run on most machines.

    I work for a company that writes performance-critical applications. We do not release these applications, and use them to compete against other companies doing the same thing. In our field, being a few nanoseconds faster or slower than our competitors can make the difference between making a lot of money and losing a lot of money. So we use every trick in the book to squeeze out as much performance as we can, including choosing very specific compiler flags.

    Another use case would be HPC applications. They are typically written to run on clusters, and the programmer will most likely compile with custom flags for that cluster's CPUs.

    Leave a comment:


  • wizard69
    replied
    Originally posted by schmidtbag View Post
    Nice, let's just hope it is polished enough. It seems to me it was submitted at the last minute. But even if it's a little glitchy in some cases, that's better than having nothing. I'm sure there are plenty of applications that will work fine with it.
    Yep, it seems very last-minute, but that doesn't necessarily indicate outstanding issues.
    To clarify, this affects how the compiled application itself performs, right? As someone who doesn't create x86 applications in C (the only C code I write is for Arduinos), I don't really understand the point of architecture-specific optimizations, since they basically require you to use the CPU the code was compiled for. So unless you made a Ryzen-specific application, wouldn't using this just hurt performance for the majority of people, who aren't using Ryzen? Or is it possible to stack these, where for example you'd see performance enhancements for both Ryzen and Kaby Lake?
    The phrase here is "it depends". For the most part it makes good sense to optimize for recent processors. That doesn't always deliver huge gains, but the extended functionality of a modern processor can often lead to surprisingly large performance gains.

    As for optimizing for a specific CPU, that is valuable to users of workstations who are building custom applications that need high performance. This is probably less of an issue than in the past, when you often needed machine-specific optimization to get passable performance. These days, the value you get out of machine-specific optimizations is highly variable. Hopefully Michael will run a broad spectrum of benchmarks with before-and-after numbers. If he does, we will learn where this scheduler enhancement has the biggest payoff.

    Leave a comment:


  • rene
    replied
    Originally posted by Veerappan View Post

    Yes, this affects the compiled applications themselves. Of course, if you compile LLVM/Clang using this updated LLVM/Clang, you'll also get what may be a faster compiler as well.

    Both GCC and Clang (and their C++ equivalents, g++/clang++) support compiling binaries for a specific base architecture (-march=[generic|native|znver1|others]) and tuning for a specific architecture while still maintaining compatibility with other CPUs (using -march=the_base_architecture -mtune=what_you_want_to_optimize_for).

    In the case of something like a distro supporting i386 and onwards, they may decide that they want to keep running on a 386, but target most of their base optimizations at Pentium Pro and higher (i686). You could do that with '-march=i386 -mtune=i686' or something similar.

    For my machines at home/work, I generally build LLVM, libclc (the OpenCL runtime library used by clover+radeonsi/r600g) and Mesa from source periodically. LLVM gets rebuilt every 2-3 days, and Mesa is pulled from git and built as soon as I sit down at my computer after work most days. Those binaries are only ever going to be used on the machines they were compiled on, so I just build with -march=native so that I can get the best performance possible. Of course, most of those are debug builds (I'm usually working on enhancements/bugfixes to Mesa or libclc themselves), which totally defeats the purpose since I have to compile with -Og as well, but the use case would still be valid for other people.
    Yeah, been there, done that, and then one day you want to plug your storage into another box and everything segfaults with "illegal instruction". I got tired of that and need things to just work, so I build with generic optimizations now ,-) https://www.youtube.com/watch?v=457zniNGVfU I did not notice a performance difference with Firefox and such in real life, …

    Leave a comment:


  • smitty3268
    replied
    Originally posted by schmidtbag View Post
    I actually wrote about Gentoo before submitting my post, but that was kind of beside the point; it is too much of an exception.


    Thanks for the clarification. But it still makes me wonder how much it is all worth in the end. The reason for compiling for a single architecture is to make the program run as efficiently as possible, to both reduce energy use and save time when running the application. But if you are compiling something specifically for your own hardware, you are expending a lot of time and energy doing so. So to me, it is a diminishing return (unless the application in question is used frequently and doesn't update often). Meanwhile, you could always distribute the binaries, but then anyone who doesn't have your hardware probably won't see an improvement, or may even experience a regression.
    Remember that Zen is meant to be used in servers and enterprise situations, and it's quite valuable for some organizations to be able to compile their own code with optimizations for the server it's going to be running on. From an end-user desktop standpoint, you're right - you're either doing something like Gentoo or it probably doesn't matter.

    Leave a comment:


  • schmidtbag
    replied
    Originally posted by RavFX View Post
    schmidtbag
    Gentoo
    I actually wrote about Gentoo before submitting my post, but that was kind of beside the point; it is too much of an exception.

    Originally posted by Veerappan View Post
    Yes, this affects the compiled applications themselves. Of course, if you compile LLVM/Clang using this updated LLVM/Clang, you'll also get what may be a faster compiler as well.

    ... Those binaries are only ever going to be used on the machines they were compiled on, so I just build with -march=native so that I can get the best performance possible. Of course, most of those are debug builds (I'm usually working on enhancements/bugfixes to Mesa or libclc themselves), which totally defeats the purpose since I have to compile with -Og as well, but the use case would still be valid for other people.
    Thanks for the clarification. But it still makes me wonder how much it is all worth in the end. The reason for compiling for a single architecture is to make the program run as efficiently as possible, to both reduce energy use and save time when running the application. But if you are compiling something specifically for your own hardware, you are expending a lot of time and energy doing so. So to me, it is a diminishing return (unless the application in question is used frequently and doesn't update often). Meanwhile, you could always distribute the binaries, but then anyone who doesn't have your hardware probably won't see an improvement, or may even experience a regression.
    Last edited by schmidtbag; 19 July 2017, 12:14 PM.

    Leave a comment:


  • Veerappan
    replied
    Originally posted by schmidtbag View Post
    Nice, let's just hope it is polished enough. It seems to me it was submitted at the last minute. But even if it's a little glitchy in some cases, that's better than having nothing. I'm sure there are plenty of applications that will work fine with it.

    To clarify, this affects how the compiled application itself performs, right? As someone who doesn't create x86 applications in C (the only C code I write is for Arduinos), I don't really understand the point of architecture-specific optimizations, since they basically require you to use the CPU the code was compiled for. So unless you made a Ryzen-specific application, wouldn't using this just hurt performance for the majority of people, who aren't using Ryzen? Or is it possible to stack these, where for example you'd see performance enhancements for both Ryzen and Kaby Lake?
    Yes, this affects the compiled applications themselves. Of course, if you compile LLVM/Clang using this updated LLVM/Clang, you'll also get what may be a faster compiler as well.

    Both GCC and Clang (and their C++ equivalents, g++/clang++) support compiling binaries for a specific base architecture (-march=[generic|native|znver1|others]) and tuning for a specific architecture while still maintaining compatibility with other CPUs (using -march=the_base_architecture -mtune=what_you_want_to_optimize_for).

    In the case of something like a distro supporting i386 and onwards, they may decide that they want to keep running on a 386, but target most of their base optimizations at Pentium Pro and higher (i686). You could do that with '-march=i386 -mtune=i686' or something similar.

    For my machines at home/work, I generally build LLVM, libclc (the OpenCL runtime library used by clover+radeonsi/r600g) and Mesa from source periodically. LLVM gets rebuilt every 2-3 days, and Mesa is pulled from git and built as soon as I sit down at my computer after work most days. Those binaries are only ever going to be used on the machines they were compiled on, so I just build with -march=native so that I can get the best performance possible. Of course, most of those are debug builds (I'm usually working on enhancements/bugfixes to Mesa or libclc themselves), which totally defeats the purpose since I have to compile with -Og as well, but the use case would still be valid for other people.

    Leave a comment:


  • RavFX
    replied
    schmidtbag
    Gentoo

    Leave a comment:


  • schmidtbag
    replied
    Nice, let's just hope it is polished enough. It seems to me it was submitted at the last minute. But even if it's a little glitchy in some cases, that's better than having nothing. I'm sure there are plenty of applications that will work fine with it.

    To clarify, this affects how the compiled application itself performs, right? As someone who doesn't create x86 applications in C (the only C code I write is for Arduinos), I don't really understand the point of architecture-specific optimizations, since they basically require you to use the CPU the code was compiled for. So unless you made a Ryzen-specific application, wouldn't using this just hurt performance for the majority of people, who aren't using Ryzen? Or is it possible to stack these, where for example you'd see performance enhancements for both Ryzen and Kaby Lake?

    Leave a comment:


  • sykobee
    replied
    It'd be of academic interest to see Zen performance in LLVM-compiled binaries before and after this commit.

    Leave a comment:


  • boxie
    replied
    It will be interesting to see how much of a difference it can make (and the workloads where it makes a difference).

    Leave a comment:
