LLVM Clang Shows Off Great Performance Advantage On NVIDIA GH200's Neoverse-V2 Cores


  • #11
    The 2x difference on sharpen really looks like an autovectorization issue. AFAIR an unsharp mask does a Gaussian blur and an image addition, exactly the kind of work SIMD extensions are designed for.

    But on the other hand, it would be strange if GCC failed at such a simple task.
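
    A minimal sketch (not the actual GraphicsMagick code, just an illustration) of the per-pixel loop an unsharp mask reduces to once the blurred copy is available; a loop like this is exactly what both compilers should be able to autovectorize:

    Code:
    #include <stddef.h>
    #include <stdint.h>

    /* sharpened = original + amount * (original - blurred); 'blurred' is
     * assumed to already hold the Gaussian-blurred copy of the image. */
    void unsharp_mask(uint8_t *dst, const uint8_t *src, const uint8_t *blurred,
                      size_t n, float amount)
    {
        for (size_t i = 0; i < n; i++) {
            float v = (float)src[i] + amount * ((float)src[i] - (float)blurred[i]);
            if (v < 0.0f)   v = 0.0f;
            if (v > 255.0f) v = 255.0f;
            dst[i] = (uint8_t)v;
        }
    }

    Building with -O3 -Rpass=loop-vectorize (Clang) or -O3 -fopt-info-vec (GCC) reports whether each compiler actually vectorizes the loop, which would be the first thing to check here.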
    Last edited by sobrus; 19 March 2024, 05:09 AM.



    • #12
      Originally posted by mlau View Post
      Or it's a gcc-13 issue in this case (missed vectorization perhaps). Maybe Michael has time to redo this test with clang-18 and gcc-14.
      If this were the case, wouldn't all benchmarks within a given piece of software show the same discrepancies?

      For instance, not all of the GraphicsMagick benchmarks show such a huge difference between the two compilers; in some the difference is negligible.

      This tells me that the developers missed some optimizations within that functionality.



      • #13
        Originally posted by coder View Post
        It could even be something like cache management, where you might not see much difference on single-threaded benchmarks, but which really matters in a 72-core config.
        If you look at the Liquid DSP benchmarks closely, you will note that even with only one thread there are times where Clang significantly outperforms GCC.

        I don't blame the compiler; the job of a compiler is to take the code it is given and convert it into something the processor can execute.

        It is the job of the programmer to write efficient code.

        Imagine if there were no other compiler, or if tests like this didn't exist, and you relied on a program like the ones tested. At some point you would decide that you needed a faster computer and go out and spend a large amount of cash just to get the performance you could have had if only the programmers knew what they were doing and actually cared.

        As I have mentioned, I do a lot of Python programming, as a hobby and as a way of increasing my knowledge and skill level, and I focus on scientific calculations: mostly chemical analysis, gravitational spheres of influence, orbital mechanics, that sort of thing.

        I always write the basic code in a single-threaded fashion, just to get a baseline and to make sure the algorithms are returning the correct values. I always include sanity-checking code to make sure the results make sense, and I include timing logic to see how long a particular algorithm takes to finish.

        Afterwards, I make different versions using different optimization techniques to see how much speedup I can get out of restructuring the code, and when I have reached the point where I don't think I can get any more speed out of it, I port the original to C, C#, FORTRAN, Rust and Go to compare the overall speed.
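
        A minimal sketch of that baseline pattern, in C rather than Python and using a made-up orbital-mechanics example (the vis-viva equation), just to illustrate the sanity-check and timing parts:

        Code:
        #include <assert.h>
        #include <math.h>
        #include <stdio.h>
        #include <time.h>

        /* Hypothetical baseline: vis-viva orbital speed, v = sqrt(mu * (2/r - 1/a)). */
        static double orbital_speed(double mu, double r, double a)
        {
            return sqrt(mu * (2.0 / r - 1.0 / a));
        }

        int main(void)
        {
            const double MU_EARTH = 3.986004418e14;  /* Earth's GM, m^3/s^2 */
            const double R = 6771e3;                 /* ~400 km circular orbit radius, m */

            /* Sanity check: circular LEO speed should come out around 7.5-7.8 km/s. */
            double v = orbital_speed(MU_EARTH, R, R);
            assert(v > 7000.0 && v < 8000.0);

            /* Timing logic: time a batch of calls with varying radii so the
             * compiler cannot fold the whole loop into a constant. */
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            double acc = 0.0;
            for (int i = 0; i < 1000000; i++)
                acc += orbital_speed(MU_EARTH, R + i, R);
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
            printf("v = %.1f m/s, acc = %.3e, elapsed = %.6f s\n", v, acc, elapsed);
            return 0;
        }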

        The reason I bring this up is that this process has shown me just how much performance a programmer can leave on the table if he just writes generic code that gets the job done and calls it a day.

        There are functions where I see a 30x speedup by spending the time to restructure the code to make it more efficient.

        It drives me up the wall thinking about all the people who will eventually become convinced they need newer, faster hardware when what they really needed was software written by better programmers.

        And for the record, this goes for closed-source and open-source programs alike, as I am sure there is closed-source software coded in the laziest manner possible.



        • #14
          Originally posted by drakonas777 View Post
          Not necessarily. Performance-optimized code is very often hard to read, sometimes borderline unreadable. Developers may have chosen to honor clarity instead of maximum performance. Not to mention that optimized code is also less portable.
          My first class in computer science was back in the early 80s, on Apple IIe computers using Apple BASIC. Then in high school we had IBMs, where I took BASIC, FORTRAN, Pascal and COBOL, and the last three classes were taught by the same guy.

          He drilled two things into our skulls. The first was: don't use GOTO statements; if you used them it was an automatic F for the assignment, because he hated "spaghetti" code.

          The other thing was to comment the hell out of your code. He would tell us it was not his job to figure out what our code was supposed to do.

          To this day, when I write something, every function has beginning and ending comments, such as // Start of function that does something and // End of function that does something, and within the function I will include comments as well.

          There is stuff I wrote years ago that I can look at and pick up where I left off.

          A good programmer can optimize code with handcrafted assembler that is still easy to understand.

          As for portability, I think getting a massive speedup is worth the tradeoff; they can write multiple code paths.



          • #15
            Originally posted by sophisticles View Post

            My first class in computer science was back in the early 80s, on Apple IIe computers using Apple BASIC. Then in high school we had IBMs, where I took BASIC, FORTRAN, Pascal and COBOL, and the last three classes were taught by the same guy.

            He drilled two things into our skulls. The first was: don't use GOTO statements; if you used them it was an automatic F for the assignment, because he hated "spaghetti" code.

            The other thing was to comment the hell out of your code. He would tell us it was not his job to figure out what our code was supposed to do.

            To this day, when I write something, every function has beginning and ending comments, such as // Start of function that does something and // End of function that does something, and within the function I will include comments as well.

            There is stuff I wrote years ago that I can look at and pick up where I left off.

            A good programmer can optimize code with handcrafted assembler that is still easy to understand.

            As for portability, I think getting a massive speedup is worth the tradeoff; they can write multiple code paths.
            Your subjective life experience is a non-argument. It also has nothing to do with performance optimization, and it is even sort of incorrect:
            - the goto statement is used regularly in low-level C code such as firmware, bootloaders, OS modules, system components, etc., mostly for handling several different cleanup paths in the same function (see the sketch after this list). Using goto as an 'if' replacement is discouraged, and rightly so, but it does have a valid usage pattern in some niche cases;
            - comments should answer 'why', not 'what'. If the 'what' is not obvious from the code itself, then it's shit code.
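
            For reference, a minimal sketch of that cleanup idiom (hypothetical function and resource names), showing why goto is the tidy option when several acquisitions in a row can fail:

            Code:
            #include <stdio.h>
            #include <stdlib.h>

            /* Hypothetical init routine: each acquired resource gets its own cleanup
             * label, and every error path jumps to exactly the right amount of undo. */
            int device_init(const char *path)
            {
                int ret = -1;
                FILE *f = NULL;
                char *buf = NULL;

                f = fopen(path, "rb");
                if (!f)
                    goto out;

                buf = malloc(4096);
                if (!buf)
                    goto out_close;

                if (fread(buf, 1, 4096, f) == 0)
                    goto out_free;

                ret = 0;  /* success: fall through the cleanups in reverse order */

            out_free:
                free(buf);
            out_close:
                fclose(f);
            out:
                return ret;
            }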

            Assembly is a lot harder to understand, because we are not living in the 80s anymore: nowadays instruction sets are huge, with lots of uarch-specific nuances. Your implied expectation that a "good programmer" should be able to handcraft better assembly than a modern compiler is unrealistic, impractical nonsense. There is one exception, though: when we are talking specifically about a performance engineer, compiler engineer or OS engineer who specializes in this area. This whole notion that "a good C/C++ coder just must, must know every detail of the compiler and assembly" is some elitist bullshit.

            As for the clarity/perf tradeoff - it depends on the speedup. I partly agree when we are talking 2x, but as I said, that's an outlier. Usually it's far less than that.
            Last edited by drakonas777; 20 March 2024, 08:47 AM.

