Targeted Intel oneAPI DPC++ Compiler Optimization Rules Out 2k+ SPEC CPU Submissions


  • #11
    Originally posted by sophisticles View Post

    I think it's a shame that a guy with a Master's degree in Computer Science, who leads development of a project with such a wide reach and makes millions a year, thinks that floating point is such a special use case that no one cares about it.

    I guess mathematicians, scientists, analysts, economists and business people don't count.
    Let's put a bit of context around it: Linus said it at a time when AVX-512 frequently caused the CPU to overheat and trigger thermal throttling, cutting system performance by a big chunk. Intel's implementation was somewhat flawed, but it seems to be better in the new, and quite expensive, special parts now.

    Linus's criticisms were merited then, but things change as time goes by.

    Comment


    • #12
      Originally posted by acobar View Post

      Let's put a bit of context around it: Linus said it at a time when AVX-512 frequently caused the CPU to overheat and trigger thermal throttling, cutting system performance by a big chunk. Intel's implementation was somewhat flawed, but it seems to be better in the new, and quite expensive, special parts now.

      Linus's criticisms were merited then, but things change as time goes by.
      Other, more generally useful improvements could have been made instead of optimizing the design for AVX-512; there's no free lunch. There is a widely held view among many "programmers" that FP is how maths is done, which is wrong: FP is how maths is approximated.
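
      To make that last point concrete, here's a tiny standard-C illustration (my own example, not from the thread): binary floating point cannot represent most decimal fractions exactly, so the familiar decimal values are only approximated.

      Code:
      #include <stdio.h>

      int main(void)
      {
          /* 0.1 and 0.2 have no exact binary representation, so their sum
           * is only close to 0.3, not equal to it. */
          double a = 0.1 + 0.2;
          printf("%.17g\n", a);      /* prints 0.30000000000000004 */
          printf("%d\n", a == 0.3);  /* prints 0: the sum is not exactly 0.3 */
          return 0;
      }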

      Comment


      • #13
        Originally posted by pong View Post
        I would not be surprised to see NVIDIA pull an Apple and just stick some general ARM/RISC-V cores, or user-equivalent execution capability, into their GPU dies in a generation or two and just call that a GPGPU computer: forget x86, forget traditional motherboard form factors, and forget the GPU being a "peripheral" to some lame CPU.
        They did just that; see a recent Phoronix article on NVIDIA's GH200 (Grace Hopper).

        Comment


        • #14
          Originally posted by osw89 View Post

          Leave talking about hardware to actual EEs, since you obviously don't even know the basics. All modern CPUs have SIMD-capable integer and FP units, not magic SIMD units that do math and somehow replace the FPU. You implement SIMD by having multiple instances of data-processing blocks like adders and multipliers in your execution units. You see those blocks labeled "floating point"? Those make up the FPU that you claim doesn't exist anymore, and it doesn't matter whether it's a separate chip or on the same die; it's still an FPU. There are multiple MUL/ADD/ALUs to enable SIMD. SIMD is a feature of FP/integer units; calling an FPU a SIMD unit is like calling a GPU an H.264 unit since it can decode and encode H.264.
          [Attached image: AMD Zen 2 vs Zen 3 block diagram]
          So you're an electrical engineer?

          May want to take a refresher course:

          https://en.wikipedia.org/wiki/Floati...urrent%20architectures%2C%20the,newer%20Intel%20and%20AMD%20processors.

          In some current architectures, the FPU functionality is combined with SIMD units to perform SIMD computation; an example of this is the augmentation of the x87 instructions set with SSE instruction set in the x86-64 architecture used in newer Intel and AMD processors.




          Intel has actually created three separate generations of floating-point hardware for the x86. They started with a rather hideous stack-oriented FPU modeled after a pocket calculator in the early 1980's, started over again with a register-based version called SSE in the 1990's, and have just recently created a three-input extension of SSE called AVX.
          I know for a fact that Intel, since the introduction of the Pentium III, has combined the FPU with the SSE unit, and that is one of the reasons why Intel used to tell developers that x87 floating point operations were deprecated and SIMD should be used whenever possible.

          I could have sworn that AMD had followed suit decades ago, but maybe they didn't, and maybe that's why many people used to avoid AMD processors for scientific workloads.

          It could also be why there were applications in the late 90's that would only run on Intel processors.

          Comment


          • #15
            Originally posted by sophisticles View Post

            You haven't ever taken a computer architecture class, have you?

            I take it, Mr. "Developer12" (were "Developer1-11" already taken?), that you are unaware that modern x86 CPUs do not have floating point units.

            All of AMD's and Intel's current processors use the SIMD units for x87, aka floating point, math.

            You may also want to recalculate your "100% of the population uses integer" conclusion, but in an ironic twist that only people with functioning brains could have seen coming, you need floating point math to do so.

            Thanks for playing.
            Right off to the rude comments, are you?

            You have no idea how modern processors implement floating point, and even less idea what CPU designers (like myself) are referring to when they say "integer." Go have a look at my previous comment for a brief list of example instruction classes. Damn near every instruction a CPU executes relies on the integer pipelines, and FP-heavy code is no exception. Even in the best case a significant number of instructions are not FP.

            Have a look at the block diagram for how a modern, superscalar x86 CPU works. Zen is a great example. You'll see the usual instruction ingest and decoding, followed by a large scheduling and reorder buffer that feeds into a series of integer, floating point, and sometimes branch pipelines. Instructions are issued to these units in parallel as the units become available and as the instructions' dependencies are resolved. When instructions complete they pass out of the end of these parallel units and are retired, which means the results are committed back to memory or software-visible registers as needed.

            Yes, you can overdose on tons of parallel FPU pipeline units if you want really amazing FP performance (and extensions like AVX-512 demand it), but you're not going to see much improvement on most workloads. Every single workload under the sun, meanwhile, has to pass through the integer pipelines at some point or another. You can't write code, even FP-heavy code, without loop indexing (which requires integer addition and comparison) and branches (which may or may not be handled by the integer pipelines, depending on the microarchitecture). You also can't go without various load/store instructions, since you need to actually have the numbers your FP is operating on. These in turn often rely on integer math for indexing/address calculation (especially on x86!).
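
            To illustrate (a generic C example of my own, not anything from the original post): even a purely FP kernel keeps the integer side of the core busy, because the loop counter, the bounds check, the branch, and the address arithmetic for each element are all integer work.

            Code:
            #include <stddef.h>

            float dot(const float *a, const float *b, size_t n)
            {
                float sum = 0.0f;
                for (size_t i = 0; i < n; i++) {  /* integer increment, compare, branch */
                    sum += a[i] * b[i];           /* address = base + 4*i is integer math;
                                                     the multiply-add is the only FP work */
                }
                return sum;
            }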

            There's also a pretty quick drop-off in additional performance as you keep adding FP pipelines, because there's only so much ILP (instruction-level parallelism) that you can extract. Extensions like AVX-512 can help with this to some degree by making it easier to express big, wide FP operations (yay "SIMD"), but at the end of the day there's precious little in the way of FP workloads that can benefit from wider and wider FP instruction sets that wouldn't see MUCH better performance on a GPU. Ultimately, it makes more sense to run such heavy FP code on a hardware device (a GPU) specialized for that task. You'll see much wider throughput and much lower latency, without impacting everyone else, including yourself.
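
            For a sense of what those "big, wide FP operations" look like, here is a minimal sketch using AVX-512F intrinsics (my own illustration, assuming a CPU and compiler with AVX-512 support, e.g. built with -mavx512f): a single instruction multiplies sixteen floats at once.

            Code:
            #include <immintrin.h>
            #include <stdio.h>

            int main(void)
            {
                float a[16], b[16], out[16];
                for (int i = 0; i < 16; i++) { a[i] = (float)i; b[i] = 2.0f; }

                __m512 va = _mm512_loadu_ps(a);    /* load 16 packed floats */
                __m512 vb = _mm512_loadu_ps(b);
                __m512 vc = _mm512_mul_ps(va, vb); /* 16 multiplies in one instruction */
                _mm512_storeu_ps(out, vc);

                printf("%f\n", out[15]);           /* 15 * 2 = 30.000000 */
                return 0;
            }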

            Do some research and then maybe you can find valid criticisms. I spent a hell of a lot of years studying and implementing this stuff when I got my (multiple!) degrees in this area.
            Last edited by Developer12; 10 February 2024, 04:22 PM.

            Comment


            • #16
              Originally posted by Kabbone View Post

              He might be trolling, but I don't think you know what you are talking about. For engineering you need massive FP performance, and not all code is optimized for or suited to a GPU.
              Even in very FP-heavy code you still encounter a lot of integer instructions, and improving those makes *everyone* faster. These days a modern CPU can do FP fast enough with only a modest number of FP units that it becomes a moot point for most software, even engineering stuff like MATLAB.

              The whole reason Intel brought in AVX-512 is that they want to market their CPUs (having really no GPUs to speak of yet) as being suitable for AI workloads in some way, even if it's just inference. The alternative is to cede market share to AMD/NVIDIA, as well as lose out on potential stock value gains by not having an answer to the AI boom.

              Comment


              • #17
                Originally posted by sophisticles View Post

                So you're an electrical engineer?

                May want to take a refresher course:

                https://en.wikipedia.org/wiki/Floati...urrent%20architectures%2C%20the,newer%20Intel%20and%20AMD%20processors.

                I know for a fact that Intel, since the introduction of the Pentium III, has combined the FPU with the SSE unit, and that is one of the reasons why Intel used to tell developers that x87 floating point operations were deprecated and SIMD should be used whenever possible.

                I could have sworn that AMD had followed suit decades ago, but maybe they didn't, and maybe that's why many people used to avoid AMD processors for scientific workloads.

                It could also be why there were applications in the late 90's that would only run on Intel processors.
                Are you joking? You're basing all of this on a Wikipedia article on computer components from three decades ago? That minor blurb about modern hardware in an article focused on hardware from the 80's and 90's could be called a gross approximation/simplification *at best.*

                You even left the highlight fragment in the URL from when you hurriedly googled that link.
                Last edited by Developer12; 10 February 2024, 04:19 PM.

                Comment


                • #18
                  Originally posted by Developer12 View Post
                  You have no idea how modern processors implement floating point, and even less idea what CPU designers (like myself) are referring to when they say "integer."

                  Do some research and then maybe you can find valid criticisms. I spent a hell of a lot of years studying and implementing this stuff when I got my (multiple!) degrees in this area.
                  There is no way that you have a single degree in any field related to computers, much less multiple degrees, and there is no way that you have anything to do with designing any CPU.

                  How do I know?

                  Because you strung together a bunch of gibberish buzzwords that you clearly do not understand, and best of all you contradicted yourself a number of times, including within a single sentence.

                  I can tell there are a few people on this forum who do have a technical background and have written code, but I don't think you could write BASIC, much less anything of substance.

                  There's no way that you have designed anything, not even lunch.

                  I do need a good laugh however so feel free to tell me what your multiple degrees are in.

                  Comment


                  • #19
                    Originally posted by Developer12 View Post

                    Are you joking? You're basing all of this on a Wikipedia article on computer components from three decades ago? That minor blurb about modern hardware in an article focused on hardware from the 80's and 90's could be called a gross approximation/simplification *at best.*

                    You even left the highlight fragment in the URL from when you hurriedly googled that link.
                    I am basing it on the fact that as far back as 20+ years ago colleges were teaching students that the FPU no longer existed as a separate unit; the FP registers had been incorporated into the SIMD unit:



                    Intel's current main floating point unit is called SSE. This includes:
                    • A brand new set of registers, xmm0 through xmm15. xmm0 is used to return values from functions, and as the first function argument. They're *all* scratch registers; a perfectly anarchist setup. Save things to memory before calling any functions, because everything can get trashed!
                    • A brand new set of instructions, like "movss" and "addss".


                    This is not disputable; it is a fact that on Intel CPUs the FP registers are integrated into the SIMD unit, and if you had ever written any assembler you would know this.
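
                    For illustration, a minimal sketch using the SSE intrinsics that map onto the instructions named above (my own example, not part of the quoted notes): scalar float math on x86-64 goes through the xmm registers via movss/addss rather than the old x87 stack.

                    Code:
                    #include <xmmintrin.h>
                    #include <stdio.h>

                    int main(void)
                    {
                        __m128 x = _mm_set_ss(1.5f);       /* scalar float lives in an xmm register (movss) */
                        __m128 y = _mm_set_ss(2.25f);
                        __m128 z = _mm_add_ss(x, y);       /* scalar add in the SIMD unit (addss) */

                        printf("%f\n", _mm_cvtss_f32(z));  /* prints 3.750000 */
                        return 0;
                    }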

                    Comment


                    • #20
                      SPEC CPU2017 Run and Reporting Rules

                      2.2.3: Feedback directed optimization is allowed in peak.


                      2.3.6: Feedback directed optimization must not be used in base.

                      If you tune the optimizations for the benchmark, then you should expect an improvement on the benchmark.
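
                      As a rough sketch of what feedback directed optimization means in practice (a generic GCC/Clang-style PGO workflow of my own, not the SPEC tooling itself): the program is built twice, and the second build is optimized using the profile gathered from a training run of the first.

                      Code:
                      /* pass 1:  cc -O2 -fprofile-generate pgo_demo.c -o pgo_demo   (instrumented build)
                       *          ./pgo_demo                                          (training run writes a profile)
                       * pass 2:  cc -O2 -fprofile-use pgo_demo.c -o pgo_demo        (rebuild using that profile)
                       *
                       * The second build can lay out branches, inline and unroll according to what
                       * the training run actually did, which is why the rules above allow it only
                       * in peak and forbid it in base. */
                      #include <stdio.h>

                      static int hot(int x) { return (x % 7 == 0) ? x * 2 : x + 1; }

                      int main(void)
                      {
                          long sum = 0;
                          for (int i = 0; i < 1000000; i++)
                              sum += hot(i);       /* the branch inside hot() dominates the profile */
                          printf("%ld\n", sum);
                          return 0;
                      }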


                      1.4. A SPEC CPU 2017 Result is a Claim About Maturity of Performance Methods

                      "SPEC is aware of the importance of optimizations in producing the best performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks, versus optimizations that exclusively target the SPEC benchmarks. However, with the list above, SPEC wants to increase awareness of implementers and end users to issues of unwanted benchmark-specific optimizations that would be incompatible with SPEC's goal of fair benchmarking.".

                      It seems like it was more a case of a small portion of the benchmark reflecting little (or only a tiny fraction of) real-world code, and of optimizing for a better result on those sections, than a case of outright cheating. If all of the benchmark had been chosen to reflect real projects, then no optimization could be benchmark-specific. Put another way, the attitude of "why work to improve the compiler for all code when we only need to compile the benchmarks better?" isn't a good direction.

                      A better (more real-world) benchmark would be bootstrapping a distribution from source: you would do a profiled build of everything and time how long each part took (and the total time). The profiling would ensure that the fastest version of each package was built, and the total compile time would show the fastest compiler (and the most applicable optimization switches); that could then go on to be used to build SPEC, with each compiler using fairly chosen optimizations.

                      Comment
