Targeted Intel oneAPI DPC++ Compiler Optimization Rules Out 2k+ SPEC CPU Submissions


  • CmdrShepard
    replied
    Can someone get us some actual details on what exactly the Intel compiler did wrong here?

    Has it:

    1. Detected a common code pattern and applied a common optimization for said code pattern?

    OR has it:

    2. Detected a specific benchmark code pattern and applied a special "only for this benchmark code pattern" optimization?

    Inquiring minds want to know, and apparently the press nowadays can't be arsed with such "irrelevant" details when it's much easier to post a sensationalist headline.



  • muncrief
    replied
    I started working in Silicon Valley in the late 1970s, and Intel was rotten then and is rotten now. They consistently crushed other companies and engineers with bullying tactics and frivolous lawsuits, instead of competing fairly with superior products.

    So anyone who is surprised by this shouldn't be, as Intel will never change. And the only thing that saved consumers from them is AMD.

    And the only way to defeat Intel, and perhaps one day scrape out the rot, is to never purchase any Intel products again and to build systems exclusively with AMD products.



  • coder
    replied
    Originally posted by Developer12 View Post
    The whole reason intel brought in AVX512 is because they want to market their CPUs (having really no GPUs to speak of yet) as being suitable for AI workloads in some way, even if it's just inference.
    No, the roots of AVX-512 date back to their Xeon Phi product line, which was more about HPC and really only started to embrace deep learning when it was already on its way to the graveyard of discontinued Intel product lines. I'm pretty sure even the decision to add AVX-512 to Skylake-X wasn't primarily motivated by AI.

    AVX-512 has a few ISA-level improvements, besides mere vector width, that make it more desirable than AVX/AVX2 even with 128-bit or 256-bit operands:
    • opmask registers
    • separate source(s) and destination operands
    • twice the number of vector registers
    • new scatter and permutation operations

    These are among the reasons Intel is bothering with AVX10/256. If vector width were the only real benefit AVX-512 offered over AVX2, then AVX10/256 would make no sense.
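    To make the opmask point concrete, here is a minimal sketch (my example, not from any Intel doc), assuming a compiler and CPU with AVX512F+AVX512VL (e.g. gcc -O2 -mavx512f -mavx512vl):

    #include <immintrin.h>

    // Merge-masked add on 256-bit vectors: lanes whose mask bit is 0 keep
    // the value from 'src'. Plain AVX2 would need a separate compare, a
    // blend, and an extra register to get the same effect.
    __m256 masked_add(__m256 src, __m256 b, __mmask8 k)
    {
        return _mm256_mask_add_ps(src, k, src, b);
    }

    This per-lane masking at 256-bit width is exactly the kind of feature AVX10/256 keeps without requiring 512-bit datapaths.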
    Last edited by coder; 12 February 2024, 10:05 AM.



  • coder
    replied
    Originally posted by sophisticles View Post


    This is not disputable; it is a fact that on Intel CPUs the fp registers are integrated into the SIMD unit, and if you ever wrote any assembler you would know this.
    You're such a bonehead that you didn't even bother to read your own references! If you check page 20-21, that one even says:

    "x87 is still around!
    Scientifc applications that beneft from 80 bits of FP
    precision sometimes still use it​"

    Not only does x87 support up to 80-bit precision, it also handles denormals in hardware, which SSE and AVX don't. Enabling denormal support in SSE comes at a huge performance cost if an operation actually involves one.
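    As an illustration, here's a sketch of the usual SSE-era workaround (standard intrinsics, my example): set the FTZ/DAZ bits in MXCSR so denormals are flushed to zero, trading IEEE conformance for speed. x87 needs no such trick.

    #include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE (SSE)
    #include <pmmintrin.h>   // _MM_SET_DENORMALS_ZERO_MODE (SSE3)

    // After this call, denormal results are flushed to zero (FTZ) and
    // denormal inputs are treated as zero (DAZ), avoiding the big
    // microcode-assist penalty on subnormal operands.
    void flush_denormals_to_zero(void)
    {
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    }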

    Originally posted by Developer12 View Post
    Are you joking? You're basing all of this on a Wikipedia article on computer components from 3 decades ago?
    sophisticles has such a high opinion of himself that he believes his stale knowledge plus a couple of hasty Google searches are superior to most people on this forum. I think he's possibly suffering from severe narcissistic personality disorder, but perhaps there's something else wrong with him.
    Last edited by coder; 12 February 2024, 10:09 AM.



  • coder
    replied
    Originally posted by s_j_newbury View Post
    Other more generally useful improvements could have potentially been made instead of optimizing design for AVX512. There's no free lunch.
    Yes, but also no. Skylake client cores didn't have AVX-512, so the scope for improvements made in lieu of AVX-512 was somewhat limited.

    Originally posted by s_j_newbury View Post
    There is a generally held view amongst many "programmers" that FP is how maths is done, which is wrong. FP is how maths is approximated.
    For most engineering and scientific applications you're already dealing with approximate data anyhow, and fp64 is usually good enough for those purposes. I'll bet even a surprising amount of financial software just uses fp64, especially if it's for modelling purposes rather than actual transaction processing.
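    A trivial sketch of the "approximation" point (my example): fp64 is not bit-exact, but its error sits far below the noise floor of most measured data.

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double sum = 0.1 + 0.2;                   // neither term is exactly representable in binary
        printf("%d\n", sum == 0.3);               // 0: not bit-exact
        printf("%d\n", fabs(sum - 0.3) < 1e-12);  // 1: far closer than any instrument can measure
        return 0;
    }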



  • coder
    replied
    Originally posted by Developer12 View Post
    See tovalds' thoughts on the matter: https://www.realworldtech.com/forum/...rpostid=193190
    I think it's risky to view Linus' comments out-of-context.

    One clue is this point he made, in that post:

    "I want my power limits to be reached with regular integer code, not with some AVX512 power virus that takes away top frequency (because people ended up using it for memcpy!)"


    A significant downside of AVX-512 in Intel's 14nm CPUs (Skylake-X and Cascade Lake) was the sometimes severe clock-throttling it could trigger, even from modest use. It had been shown to hurt clockspeeds enough that you were often better off avoiding AVX-512 entirely if you weren't using it heavily.

    When you also consider the Balkanization of AVX-512 subsets, it seems clear that Intel jumped the gun on implementing it. Deploying it on 14nm technology just came with too many caveats: power/clockspeed and die space.

    However, Michael's tests of Zen 4 and Sapphire Rapids have shown virtually no clock-throttling or excessive power utilization, and Zen 4 even managed to implement it in a way that seems quite area-efficient. Not only that, but Zen 4 came right out of the gate with virtually all subsets implemented (except for a couple of the most recent ones).

    That said, Zen 5 is rumored to have gone for native 512-bit execution paths, so we'll have to see how that pans out, but I think the current generation of AVX-512 is relatively harmless. We can perhaps turn our criticism to other die-space hogs, like Intel's AMX, which also bloats thread context by a whopping 8 kB and so far isn't useful for anything but AI inferencing.
    Last edited by coder; 12 February 2024, 07:50 AM.



  • JustRob
    replied
    SPEC CPU2017 Run and Reporting Rules

    2.2.3: Feedback directed optimization is allowed in peak.


    2.3.6: Feedback directed optimization must not be used in base.

    If you tune optimizations for the benchmark, then you should expect an improvement on the benchmark.
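    For anyone unfamiliar with the term: feedback directed optimization means compiling twice with a profiling run in between. A minimal sketch using GCC's PGO flags (my illustration, not SPEC's actual tooling):

    /* Build steps:
     *   gcc -O2 -fprofile-generate fdo_demo.c -o fdo_demo
     *   ./fdo_demo                  # training run; writes *.gcda profile data
     *   gcc -O2 -fprofile-use fdo_demo.c -o fdo_demo
     */
    #include <stdio.h>

    int main(void)
    {
        long long sum = 0;
        // The profile tells the compiler this branch is taken ~99% of the
        // time, so it can lay out and optimize the hot path accordingly.
        for (long long i = 0; i < 100000000; i++)
            if (i % 97 != 0)
                sum += i;
        printf("%lld\n", sum);
        return 0;
    }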


    1.4. A SPEC CPU 2017 Result is a Claim About Maturity of Performance Methods

    "SPEC is aware of the importance of optimizations in producing the best performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks, versus optimizations that exclusively target the SPEC benchmarks. However, with the list above, SPEC wants to increase awareness of implementers and end users to issues of unwanted benchmark-specific optimizations that would be incompatible with SPEC's goal of fair benchmarking.".

    It seems like this was less a case of outright cheating and more a case of a small portion of the benchmark not reflecting real-world code (or only a tiny fraction of it), with the compiler optimizing for a better result on those sections. If all of the benchmark had been chosen to reflect various real projects, then no optimization for it would be benchmark-specific. Put another way, the mindset of "why work to fix the compiler for all code when we only need to compile the benchmarks better?" isn't a good direction.

    A better (more real-world) benchmark would be bootstrapping a distribution from source: you would do a profiled build of everything and time how long each part took (and the total time). The profiling would ensure that the fastest version was built, and the total compile time would show the fastest compiler (and the most applicable optimization switches); that could then go on to be used to build SPEC, with each compiler using fairly chosen optimizations.



  • sophisticles
    replied
    Originally posted by Developer12 View Post

    Are you joking? You're basing all of this on a Wikipedia article on computer components from 3 decades ago? That minor blurb about modern hardware, in an article focused on hardware from the '80s and '90s, could be called a gross approximation/simplification *at best.*

    You even left on the highlight option in the URL from when you hurriedly googled that link.
    I am basing it on the fact that, as far back as 20+ years ago, colleges were teaching students that the FPU no longer existed and that the fp registers had been incorporated into the SIMD unit:

    Intel's current main floating point unit is called SSE. This includes:
    • A brand new set of registers, xmm0 through xmm15. xmm0 is used to return values from functions, and as the first function argument. They're *all* scratch registers; a perfectly anarchist setup. Save things to memory before calling any functions, because everything can get trashed!
    • A brand new set of instructions, like "movss" and "addss".


    This is not disputable; it is a fact that on Intel CPUs the fp registers are integrated into the SIMD unit, and if you ever wrote any assembler you would know this.
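    That claim is easy to verify yourself; a minimal sketch, assuming an x86-64 System V target:

    // Compile with: gcc -O2 -S scalar.c
    // The generated assembly is roughly "addss %xmm1, %xmm0; ret" --
    // scalar float math goes through the SSE xmm registers rather than
    // the x87 stack, and the result is returned in xmm0.
    float add_floats(float a, float b)
    {
        return a + b;
    }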



  • sophisticles
    replied
    Originally posted by Developer12 View Post
    You have no idea how modern processors implement floating point, and even less idea what CPU designers (like myself) are referring to when they say "integer."

    Do some research and then maybe you can find valid criticisms. I spent a hell of a lot of years studying and implementing this stuff when I got my (multiple!) degrees in this area.
    There is no way that you have a single degree in any field related to computers, much less multiple degrees, and there is no way that you have anything to do with designing any CPU.

    How do I know?

    Because you strung together a bunch of gibberish buzzwords that you clearly do not understand, and best of all you contradicted yourself a number of times, including within a single sentence.

    I can tell there are a few people on this forum who do have a technical background and have written code, but I don't think you could write BASIC, much less anything of substance.

    There's no way that you have designed anything, not even lunch.

    I do need a good laugh however so feel free to tell me what your multiple degrees are in.



  • Developer12
    replied
    Originally posted by sophisticles View Post

    So you're an electrical engineer?

    May want to take a refresher course:

    https://en.wikipedia.org/wiki/Floati...urrent%20architectures%2C%20the,newer%20Intel%20and%20AMD%20processors.

    I know for a fact that Intel, ever since the introduction of the Pentium III, has combined the FPU with the SSE unit, and that is one of the reasons why Intel used to tell developers that floating point operations were deprecated and SIMD should be used whenever possible.

    I could have sworn that AMD had followed suit decades ago, but maybe they didn't, and maybe that's why many people used to avoid AMD processors for scientific workloads.

    It could also be why there were applications in the late '90s that would only run on Intel processors.
    Are you joking? You're basing all of this on a Wikipedia article on computer components from 3 decades ago? That minor blurb about modern hardware, in an article focused on hardware from the '80s and '90s, could be called a gross approximation/simplification *at best.*

    You even left on the highlight option in the URL from when you hurriedly googled that link.
    Last edited by Developer12; 10 February 2024, 04:19 PM.

