Intel 5th Gen Xeon "Emerald Rapids" AVX-512 Performance


  • #11
    Originally posted by sophisticles View Post
    I had promised myself that I wouldn't engage you in conversation because you are obviously mentally unbalanced, but damn you, your idiocy reeled me in.
You just don't know me. If you did, you would know that I am mentally very strong and resilient,
even if it does not look that way to you. Nothing is as it seems.

    Originally posted by sophisticles View Post
    The Intel Xeon Platinum 8592+ doesn't have any E-cores, which makes everything you just said really stupid:

Now go throw a temper tantrum and claim that I am using Intel's website to infect your Fedora box with a trojan.
Right, if the Intel Xeon Platinum 8592+ does not have E-cores, then of course my argument does not apply to it.
But tell me: how many people do you know who buy an Intel Xeon Platinum 8592+?
Most people in this forum who buy Intel CPUs buy parts with E-cores.

Also, some parts of my argument are not about E-cores at all.

Future server CPU designs from AMD will use a Zen 5 + Zen 5c layout: the Zen 5 cores will have a full-width AVX-512 implementation (not the double-pumped 256-bit version), while the Zen 5c cores will keep the double-pumped 256-bit variant to reduce core size and transistor count.

With AMD you support a single ISA, and it fits all of their cores, from big to little.

And AMD has already won the asymmetric CPU scheduler war, because detecting cache misses and moving the thread to a core with more cache is trivial,
and moving code that uses AVX-512 heavily to the Zen 5 cores with the full-width AVX-512 implementation will be trivial too.
    Phantom circuit Sequence Reducer Dyslexia



    • #12
      Originally posted by sophisticles View Post
      BTW, i was not wrong, in fact you eventually conceded that my theory may have merit.
      First, that's a mischaracterization of what I wrote.

      Second, you're an idiot, if you think some bolt-on Python JIT module proves anything about GCC or Clang. All you proved was that you don't know how to read the docs, or even do a couple web searches.

      Originally posted by sophisticles View Post
      Now if you want to show me how much of a bad ass programmer you are, maybe you can crack an egg of knowledge on this noob's head and show me how an "expert" would go about reconstructing the code so that I can analyze FLT for values of A,B,C and N up to 1000.
      I wouldn't dare deprive you of the joy of figuring it out for yourself. You seem to have plenty of time on your hands and it should be no problem for you, if you're such a smart guy.
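Since the FLT exercise keeps coming up, here is a minimal sketch of what such a brute-force search might look like. This is a hypothetical illustration, not anyone's actual code; the bound of 1000 from the post is replaced with a small one so it finishes quickly, and a precomputed table of n-th powers keeps each check to a dictionary lookup.

```python
# Hypothetical brute-force search for counterexamples to Fermat's Last
# Theorem: a^n + b^n == c^n with n > 2. Python's arbitrary-precision
# integers sidestep the overflow a fixed-width implementation would hit.
def flt_counterexamples(limit, max_n):
    hits = []
    for n in range(3, max_n + 1):
        # Precompute n-th powers up to the bound for O(1) membership tests.
        powers = {c ** n: c for c in range(1, limit + 1)}
        for a in range(1, limit + 1):
            for b in range(a, limit + 1):
                c = powers.get(a ** n + b ** n)
                if c is not None:
                    hits.append((a, b, c, n))
    return hits

# FLT says this list is always empty; the search only confirms it
# for the small bounds used here.
print(flt_counterexamples(50, 6))
```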



      • #13
        Originally posted by sophisticles View Post
        Impressive.

        With AVX-512 these Xeons were in some cases 10x faster than without and were able to use less power and stay cooler.

        Just amazing the improvement Intel has made to AVX-512 since it was first introduced.

Of course, when you have all the money in the world to hire the best engineers, I guess this is to be expected.
        Hello everybody,

I would like to apologize for my posts under the nicknames "sophisticles" and "hel88".

The thing is, I am a very sick person: schizophrenia with bipolar disorder. When I'm on my medication, like now, I feel ashamed of the things that I do when not on medication.

For example, when I'm not following my therapy properly, I get this crazy tendency to troll on Linux forums. For that devious purpose I use the nicknames "sophisticles" and "hel88", and under those nicknames I write crazy, insane things. When I am on regular therapy, like now, I cannot believe the crap that I wrote under those two nicknames.

Overall, I would like all of you to know that I don't really mean what I write under those two nicknames. Also, I love Linux, open source, and the GPL. And yes, Microsoft sucks.



        • #14
          Originally posted by SophTherapy View Post
          Hello everybody,
I would like to apologize for my posts under the nicknames "sophisticles" and "hel88".
The thing is, I am a very sick person: schizophrenia with bipolar disorder.
When I'm on my medication, like now, I feel ashamed of the things that I do when not on medication.
For example, when I'm not following my therapy properly, I get this crazy tendency to troll on Linux forums. For that devious purpose I use the nicknames "sophisticles" and "hel88", and under those nicknames I write crazy, insane things. When I am on regular therapy, like now, I cannot believe the crap that I wrote under those two nicknames.
Overall, I would like all of you to know that I don't really mean what I write under those two nicknames. Also, I love Linux, open source, and the GPL. And yes, Microsoft sucks.
Right... sophisticles smears people who really do have a mental disorder like schizophrenia, but as everyone can see, he puts his complete energy into damaging the reputation of the Linux/open-source community, and people who come into contact with him want to quit the forum because of this destructive force.



          • #15
            Originally posted by qarium View Post

Right... sophisticles smears people who really do have a mental disorder like schizophrenia, but as everyone can see, he puts his complete energy into damaging the reputation of the Linux/open-source community, and people who come into contact with him want to quit the forum because of this destructive force.
And Michael is not doing anything about it.



            • #16
              Originally posted by coder View Post
              First, that's a mischaracterization of what I wrote.

              Second, you're an idiot, if you think some bolt-on Python JIT module proves anything about GCC or Clang. All you proved was that you don't know how to read the docs, or even do a couple web searches.

              I wouldn't dare deprive you of the joy of figuring it out for yourself. You seem to have plenty of time on your hands and it should be no problem for you, if you're such a smart guy.
It is not a mischaracterization: first you scoffed at my theory, then you offered the exact same one.

I used Numba as an example of a compiler that may produce faster binaries, but does so at the expense of erroneous results and the need to verify that the binaries created by each compiler produce the same results.

Problem description: As reported in #1150, compilation with default optimization options for the Intel compiler suite icx/icpx results in incorrect results. The behavior can be reproduced for gcc wi...

For input 0xffffffff, the following C code works fine with no optimization, but produces wrong results when compiled with -O1. Other compilation options are -g -m32 -Wall. The code is tested with c...

clang version: 15.0.4 aarch64

    #include <stdio.h>
    unsigned long xL0; // 8 byte
    unsigned long xL1;
    int main() {
        xL0 = 0x2;
        xL1 = 0x2;
        xL0 >>= (xL1 + 0xffffffff);
        printf("%lx\n", xL0); // 1
    }

Result: clang -O0: 1; clang -O1: 491e88 (and with a different version of clang, the -O1 result differs). gcc -O0 and -O1: 1.
As you can see, if you follow your own advice and search online, there are numerous cases of one compiler or another producing incorrect results.

Which brings us full circle to this article: saying that Clang produces faster binaries tells us nothing about the quality of the binaries.

In fact, I intend to test the quality and post my results here.



              • #17
                Originally posted by sophisticles View Post
                It is not a mischaracterization,
                Yes, it is, because you're confusing discussion of GCC/Clang vs. a bolt-on Python module.

                Originally posted by sophisticles View Post
                first you scoffed at my theory,
About GCC/Clang doing optimizations, at merely -O3, that affected numerical accuracy? Yes, I did scoff at that and I still do. As I explained, the optimizations which change results are hidden behind options other than the ones he used.
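For what it's worth, the reason those options are fenced off behind explicit flags is that floating-point addition is not associative, so any optimization that reassociates a sum is allowed to change the result. A quick sketch (plain Python, but these are the same IEEE doubles a C compiler would reassociate under something like -ffast-math):

```python
# Floating-point addition is not associative: reassociating a sum, as
# fast-math-style optimizations are permitted to do, can change the result.
a, b, c = 1e16, -1e16, 1.0

left_to_right = (a + b) + c   # (0.0) + 1.0
reassociated = a + (b + c)    # the 1.0 is absorbed: -1e16 + 1.0 rounds to -1e16

print(left_to_right)  # 1.0
print(reassociated)   # 0.0
```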

                Originally posted by sophisticles View Post
                then you offered the exact same one.
About Numba? I never said it wasn't making substitutions; the docs say it is. If you had bothered to read them, or even done a web search to understand what was happening, you would've known that.

                Originally posted by sophisticles View Post
                I used Numba as an example of a compiler
                It's not the same thing. This is the crux of the issue.

                Originally posted by sophisticles View Post
                I took the time to explain exactly this point, but I see it fell on deaf ears because you're only interested in arguing and not understanding.

                As the answers explain, this & your other links are depending on undefined behavior. Once you veer into UB, optimizers will take liberties with your code that can affect the results. It's a GIGO situation (garbage in, garbage out). The solution is to fix your code, not blame the tool.
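To make the quoted shift example concrete: C leaves a shift by a count at or beyond the type's width undefined, while typical hardware shift instructions (x86 SHR, AArch64 LSR) use only the count modulo the width, which is why the unoptimized build happened to print 1. A sketch in Python, where the shift itself is well defined, of what the hardware did versus one possible defined-behavior fix:

```python
# The quoted C snippet computes xL0 >>= (xL1 + 0xffffffff) with xL0 = xL1 = 2.
# In C, a shift count >= the type's width (here 0x100000001 >= 64) is
# undefined behavior, so the optimizer may emit anything -- hence the
# differing -O0/-O1 results.
x = 0x2
count = 0x2 + 0xFFFFFFFF           # 0x100000001: far beyond a 64-bit width

# What the unoptimized machine code effectively did: the shift instruction
# uses only the low 6 bits of the count, so 0x100000001 & 63 == 1.
hardware_view = x >> (count & 63)
print(hex(hardware_view))          # 0x1

# A defined-behavior fix: decide explicitly what an oversized shift means.
fixed = 0 if count >= 64 else x >> count
print(hex(fixed))                  # 0x0
```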

                Originally posted by sophisticles View Post
                As you can see there are numerous cases if you followed your own advice and searched online of one compiler or another producing the incorrect results.
                No, you didn't follow my advice. My advice was to read the docs and use tools correctly. Writing code with undefined behavior is a misuse of the tool. Your dependence on infinite integers also boiled down to a misuse of the acceleration frameworks you tried to use, because they don't support that functionality as explained in their docs.

                Using these tools without knowing what you're doing can hurt you.

                Originally posted by sophisticles View Post
                Which brings us full circle to this article and saying that Clang produces faster binaries tells us nothing about the quality of the binaries.
                Nope, but that's a different thread.



                • #18
I believe AMD only added AVX-512 in Genoa, yet they were previously able to gain market share in the segment of the market that apparently didn't need AVX-512... I assume because they could squeeze more small, low-power cores into a package.

In 2024 Intel splits the server market, with Granite Rapids targeting the high-performance half and Sierra Forest targeting the low-power, high-core-count half. Both will use the same platform, with MCR DIMM support and full CXL 2.0 spec I/O, at least according to their PR. One major advantage is that AMD has no answer to the per-core AMX acceleration in Granite Rapids.

                  Ironically, AMD may not be able to continue their single architecture decision if they are forced by AI performance lag to add an AMX equivalent in their high performance cores.

Intel will also be increasing the number of AVX-512 acceleration operations, beginning in 2025 with their AVX10 program, so AMD will face a similar dilemma: either expand AVX-512 operations on both high-performance and efficient cores, or follow Intel's design of supporting only AVX2 on the coming AVX10 E-cores.



                  • #19
                    Originally posted by jayN View Post
                    One major advantage is that AMD has no answer to the per core AMX acceleration in Granite Rapids.

                    Ironically, AMD may not be able to continue their single architecture decision if they are forced by AI performance lag to add an AMX equivalent in their high performance cores.
                    Perhaps the MI300X sort of tips their hand at the direction they're headed.

                    Originally posted by jayN View Post
                    Intel will also be increasing the number of AVX512 acceleration operations, beginning in 2025 with their AVX10 program, so AMD will be faced with the similar dilemma of either expanding avx512 operations on both hp and efficient cores or following Intel's design of supporting only avx2 on the coming avx10 e-cores.
                    I think 512-bit becomes less of an issue at smaller process nodes. Plus, AMD seems to have found a way to implement it that's compact enough.

                    On the server platforms, if they'd just double the number of their FMA-capable ports to match Golden Cove, I'll bet they could nearly zero-out the per-core AVX-512 advantage Intel currently enjoys. They could do this in just their non-C cores, if it markedly impacted area efficiency.

