Intel's Assembler Changes For JCC Erratum Are Not Hurting AMD

  • #11
    I'm not sure if I'm analyzing the benchmarks correctly, but looking at the very end of https://openbenchmarking.org/result/...ar+Linux+31480, there's a 3% difference in the geometric mean. Given that the performance hit for Intel was said to be around 4%, this seems to be almost the same hit for AMD. I don't want to see this applied by default anywhere. Perhaps it's time to separate distros into AMD and Intel variants, a bit like there used to be i386 and x86_64. Mostly this would affect only binaries. Unfortunately, users have almost no power over this kind of BS on Windows.

    Here's what I have to say to Intel: I don't want your kludgy patches and workarounds and workarounds for workarounds. Pull your $h1t together and start making products that are not a security hazard to use without performance-crippling software hacks, please! Maybe move some budget over from OEM bribing and marketing to R&D and validation?



    • #12
      Originally posted by fintux View Post
      I'm not sure if I'm analyzing the benchmarks correctly, but looking at the very end of https://openbenchmarking.org/result/...ar+Linux+31480, there's 3% difference in geometric mean.
      1.003 translates to 0.3%
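
      For anyone double-checking the arithmetic: OpenBenchmarking's geometric mean is the n-th root of the product of per-test ratios, so an overall value of 1.003 is a 0.3% difference, not 3%. A quick sketch with made-up ratios (these are NOT the numbers from the linked result page):

```python
from math import prod

# Hypothetical per-test performance ratios (patched / unpatched);
# illustrative only, not taken from the article's result table.
ratios = [1.002, 0.999, 1.005, 1.001, 1.008]

# Geometric mean = n-th root of the product of the ratios.
geomean = prod(ratios) ** (1 / len(ratios))

# A geomean of ~1.003 corresponds to a ~0.3% difference.
diff_pct = (geomean - 1) * 100
print(f"geomean={geomean:.3f}, difference={diff_pct:.1f}%")
```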



      • #13
        Originally posted by rene View Post

        No, it is not an optimization; it is a workaround for a really silly bug. If all CPUs were this buggy and every vendor wanted such crappy workarounds (32-byte alignment, seriously) in the assembler, gcc/binutils would barely be able to generate code at all. IMHO such a crippling workaround should not be added to the toolchain at all; vendors should fix these bugs in the CPUs and microcode themselves. Maybe better ramp up their QA efforts, and btw, users should demand fixed silicon.
        Technically, it is an optimization. The bug itself is (I trust) squashed by the microcode patch, so there is no need to work around the bug. There is, however, a benefit to working around the fix.

        Interestingly, if you don't have the microcode patch, this optimization should avoid the bug. But it would be very difficult to be sure that the workaround was applied to every bit of code on your machine.
        Last edited by Hugh; 14 November 2019, 12:31 PM.



        • #14
          This testing was done with Clear Linux (from Intel itself). It would be interesting to replicate it with a more neutral distro. The obvious one is Arch because it expects you to twiddle compiler options. For other distros it isn't easy to run this experiment.



          • #15
            It would be interesting to know how much the .text size of programs is inflated by the workaround.
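
            One way to bound that: the real assembler prefers to lengthen earlier instructions with redundant prefixes rather than insert plain NOPs, and only touches jumps that would actually cross a boundary, so a worst-case toy model is to charge each affected jump the full NOP padding needed to reach the next 32-byte boundary. A sketch with hypothetical jump offsets (not the actual binutils algorithm):

```python
# Toy upper-bound model of .text growth from the JCC-erratum alignment:
# assume each affected jump is pushed to the next 32-byte boundary with NOPs.

def padding_to_boundary(offset: int, align: int = 32) -> int:
    """Bytes of padding needed to move `offset` up to the next boundary."""
    return (-offset) % align

def estimated_growth(jump_offsets) -> int:
    """Sum worst-case padding over a list of (hypothetical) jump offsets."""
    return sum(padding_to_boundary(off) for off in jump_offsets)

# Hypothetical offsets of jumps that would cross a 32-byte boundary:
print(estimated_growth([30, 61, 95]))  # 2 + 3 + 1 = 6 bytes
```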



            • #16
              Originally posted by atomsymbol

              Conclusion made in the article: "... the updated assembler didn't introduce any real changes for the AMD Zen 2 system."

              The conclusion in the article is invalid because in the result table about 75% of green values are on the left side.
              That table lights green/red simply based upon the outright winner, regardless of whether it's a 0.1% difference... Though it probably should build in some sort of buffer so that results within X% of each other remain a neutral color.
              Michael Larabel
              https://www.michaellarabel.com/



              • #17
                Perhaps there's a statistical test you can do to say "well, no individual test is statistically worse, but if you take all of them together, is it significant?" That would give greater confidence in such assertions. It seems unlikely that it would have *zero* impact since after all the code is padded, and will likely take up more room in the cache and require more loads - but it could be a very small impact.
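
                One simple option is a sign test over the paired results: count how many benchmarks got slower versus faster and ask how likely that split is under the null hypothesis of no effect. A minimal sketch with a hypothetical win/loss record (NOT data from the article):

```python
from math import comb

# Hypothetical record: suppose the patched toolchain was slower in 14 of
# 20 paired benchmarks (ties dropped, as usual for a sign test).
n, losses = 20, 14

# Two-sided sign test: under H0 (padding has no effect), slower/faster are
# equally likely, so the loss count follows Binomial(n, 0.5).
k = max(losses, n - losses)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n)
print(f"sign test p-value: {p_value:.3f}")  # ~0.115: suggestive, not conclusive
```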



                • #18
                  Originally posted by Hugh View Post

                  Technically, it is an optimization. The bug itself is (I trust) squashed by the microcode patch. So there is no need to workaround the bug. There is a benefit to working around the fix.

                  Interestingly, if you don't have the microcode patch, this optimization should avoid the bug. But it would be very difficult to be sure that the workaround was applied to every bit of code on your machine.
                  Nope, technically this is part of the workaround. If I understand this phrase correctly:
                  The issue at hand comes down to jump instructions that cross cache lines where on Skylake through Cascadelake there is the potential for "unpredictable behavior" ...
                  then aligning the jump instructions to avoid 32-byte boundaries (which is what the GCC patch does) avoids the issue.
                  So this GCC patch aligns jump instructions that would cross a 32-byte boundary onto the next 32-byte boundary, avoiding both the JCC erratum and the performance penalty caused by the microcode update.

                  Example: if you compile your software with a patched GCC and run it on a defective processor (i.e. one without the updated microcode), you don't get the "unpredictable behaviour", but you don't get better performance either. How could this be an optimization?
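
                  The boundary condition described above can be written as a small predicate: an instruction is affected when its bytes cross a 32-byte boundary or its last byte sits exactly at one. A minimal Python sketch (addresses and lengths are hypothetical):

```python
# Sketch of the erratum condition: a jump (or macro-fused cmp/jcc pair) is
# affected when its bytes cross a 32-byte boundary or end exactly on one.

def hits_jcc_erratum(start: int, length: int, align: int = 32) -> bool:
    """True if [start, start+length) crosses or ends on a 32-byte boundary."""
    end = start + length  # address of the first byte after the instruction
    return (start // align) != (end // align) or end % align == 0

assert hits_jcc_erratum(30, 4)      # bytes 30..33 cross the boundary at 32
assert hits_jcc_erratum(28, 4)      # last byte is 31: ends exactly on 32
assert not hits_jcc_erratum(0, 6)   # fits inside one 32-byte chunk
```

The assembler's padding moves such an instruction so its bytes fall entirely within one 32-byte chunk, which sidesteps both the erratum and the microcode slowdown.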



                  • #19
                    Originally posted by atomsymbol

                    I believe you are highly mistaken in this case.

                    Due to the nature of the changes introduced by the assembler patches, it is natural from a theoretical viewpoint to expect the difference to be about 0.1%.

                    If the actual measurement (i.e. the table in the published article) yields a difference of 0.1%, in accordance with the theoretical expectation, then it is proof that the assembler changes have a negative impact on performance.
                    Well, if you're going to argue semantics, how's this instead: the changes introduce NEGLIGIBLE performance changes on the AMD CPUs tested. The only people who would even notice are those with very long-running workloads, and the padding can be switched off for those who do care. So your argument is practically moot. Who really cares if their build takes a few seconds longer, or if their web browser takes a microsecond longer to render their Facebook wall, except the most anal benchmarkists?

                    The people running HPC clusters and massive data centers where this would matter are already tuning their production systems with internal benchmarks for their particular needs to begin with.

                    The build chain and even program code are already littered with hundreds, if not thousands, of workarounds for CPU and other hardware bugs accumulated over the years of computing history, all of which impact performance to varying degrees on CPUs that may not require them at all, be it x86, ARM, SPARC, POWER, or whatever. Most of them are configurable via a switch; others are simply automatic conditionals. For example, there's an entire series of ARM CPUs that do not do speculative execution and are therefore immune to all Spectre-class exploits. This is why generic distributions have to settle on a common-use determination and hopefully make safe assumptions about security and performance trade-offs. Individual circumstances vary, which is why you have SWITCHES to turn things on and off as needed and circumstances warrant.



                    • #20
                      Originally posted by pyler View Post

                      The majority of software is compiled WITHOUT -march.

                      They must patch generic codegen too, to protect against this bug (we don't know where the resulting binary will run): perf loss for all.
                      Please provide benchmarks showing your claimed perf loss for all.

