More Intel AVX10.2 Enablement Lands In The GCC 15 Compiler

  • More Intel AVX10.2 Enablement Lands In The GCC 15 Compiler

    Phoronix: More Intel AVX10.2 Enablement Lands In The GCC 15 Compiler

    Earlier this month Intel compiler engineers began adding AVX10.2 support into the GCC 15 open-source compiler. Now as we approach the end of August, another big batch of AVX10.2 enablement has landed for this next GNU Compiler Collection release...
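
    For anyone wanting to experiment once the enablement ships, usage should look roughly like the sketch below. The -mavx10.2-256 option name is an assumption modeled on the -mavx10.1-256/-mavx10.1-512 switches GCC 14 already exposes, so check the GCC 15 documentation; the intrinsic itself is plain AVX-512F+VL, which falls inside the converged AVX10 vector ISA.

    /* Minimal sketch: a masked 256-bit add via an existing AVX-512F+VL
     * intrinsic that is part of the converged AVX10 vector ISA.
     * Build with, e.g.:  gcc -O2 -mavx512f -mavx512vl -c avx10_sketch.c
     * (or presumably -mavx10.2-256 once GCC 15's enablement lands;
     * that option name is assumed, not confirmed). */
    #include <immintrin.h>

    __m256i masked_add(__m256i a, __m256i b, __mmask8 keep)
    {
        /* Lanes cleared in 'keep' are zeroed rather than left undefined. */
        return _mm256_maskz_add_epi32(keep, a, b);
    }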


  • #2
    What a positive spin on what amounts to intel pissing on everyone's chips because they engineered and planned poorly.



    • #3
      The new old features.



      • #4
        I have to admit, in the past I had many objections to AVX: it cost too much power and produced too much heat. But the recent implementation in AMD's Zen 5 chips convinced me, and I want to use it in my packages and port as much code as possible to it. But AVX seems horribly fragmented, thanks to Intel. It would be nice if Intel finally stopped this silly cat-and-mouse game with AMD and established a solid standard for real this time. Changing the previous names from AVX-512 to AVX10+ is hopefully a first step toward a clean-up and proper consolidation.



        • #5
          Originally posted by M.Bahr View Post
          I have to admit, in the past I had many objections to AVX: it cost too much power and produced too much heat. But the recent implementation in AMD's Zen 5 chips convinced me, and I want to use it in my packages and port as much code as possible to it. But AVX seems horribly fragmented, thanks to Intel. It would be nice if Intel finally stopped this silly cat-and-mouse game with AMD and established a solid standard for real this time. Changing the previous names from AVX-512 to AVX10+ is hopefully a first step toward a clean-up and proper consolidation.
          The fragmentation problem basically solved itself. The common targets are: generic (getting phased out); SSE4.2 and lower (99.9% of the hardware that might run a binary built today can do this, so it's the new generic target); AVX2 (the majority of x86 hardware can do this); and AVX-512. AVX-512 has a mess of optional extensions, but enough time has passed that the ones actually implemented are known, and that is what most people target. AVX10 actually introduces more fragmentation; it's not a consolidation attempt, it's the opposite. In addition to the existing targets above, which you still need in order to make the best use of existing hardware without AVX10 support, you now have to support AVX10, which itself has optional components (another awful decision for a spec). You'd probably need to support both AVX10.256 and AVX10.512: AVX10.512 is probably as simple as compiling your existing AVX-512 code as AVX10 because reasons, while AVX10.256 is another full target.

          The way they should have consolidated is simple: officially deprecate the AVX-512 ops they've abandoned, promote the optional AVX-512 ops they support to mandated, commit to never getting rid of mandated ops, come to a consensus with AMD over what is mandated, and implement AVX-512 on the weak cores the same way AMD did (execute 512-bit ops by breaking them up into smaller-width ops done in sequence). Hideously simple, no new wheels to be invented or anything.
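
          For what it's worth, the usual way those target tiers get collapsed into a single binary is GCC's function multiversioning; here's a minimal sketch (the function name and clone list are illustrative only, and it relies on glibc ifunc support):

          /* One source function, cloned for the baseline, AVX2 and AVX-512F;
           * GCC emits a resolver that picks the best clone at load time. */
          #include <stddef.h>

          __attribute__((target_clones("default", "avx2", "avx512f")))
          void scale(float *dst, const float *src, float k, size_t n)
          {
              /* Same loop everywhere; each clone just auto-vectorizes differently. */
              for (size_t i = 0; i < n; i++)
                  dst[i] = src[i] * k;
          }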



          • #6
            Originally posted by geerge View Post

            The fragmentation problem basically solved itself. The common targets are: generic (getting phased out); SSE4.2 and lower (99.9% of the hardware that might run a binary built today can do this, so it's the new generic target); AVX2 (the majority of x86 hardware can do this); and AVX-512. AVX-512 has a mess of optional extensions, but enough time has passed that the ones actually implemented are known, and that is what most people target. AVX10 actually introduces more fragmentation; it's not a consolidation attempt, it's the opposite. In addition to the existing targets above, which you still need in order to make the best use of existing hardware without AVX10 support, you now have to support AVX10, which itself has optional components (another awful decision for a spec). You'd probably need to support both AVX10.256 and AVX10.512: AVX10.512 is probably as simple as compiling your existing AVX-512 code as AVX10 because reasons, while AVX10.256 is another full target.

            The way they should have consolidated is simple: officially deprecate the AVX-512 ops they've abandoned, promote the optional AVX-512 ops they support to mandated, commit to never getting rid of mandated ops, come to a consensus with AMD over what is mandated, and implement AVX-512 on the weak cores the same way AMD did (execute 512-bit ops by breaking them up into smaller-width ops done in sequence). Hideously simple, no new wheels to be invented or anything.
            In case you missed it, this is why I wrote:
            "It would be nice from intel, if they finally stopped this silly cat-and-mouse-game with AMD and established a solid standard for real this time." ...

            As for the consolidation, I wrote:
            "Changing the previous names from AVX512 to AVX10+ is hopefully a first step for a clean up and proper consolidation." ...
            And by this I literally meant only the naming scheme, which, by the way, hints at the chaos behind the scenes. The question remains: when does the proper technical consolidation follow? Here I stick to my aforementioned comment: it depends on cooperation between Intel and AMD.

            Now that I think about this again, maybe we will never have a really harmonious and reliable standard in the x86 world. There has always been tension between the two, with one trying to outpace the other through exclusively patented extensions, to the detriment of consumers. I remember the awful time when Intel introduced MMX and AMD had to parry with 3DNow!. Shortly after MMX, Intel abandoned it and introduced SSE and then several versions after it, similar to what we are witnessing now with AVX. It was only after their cross-licensing agreement that x86 users got some peace of mind. Before that, there was no real guarantee that a program that ran on Intel CPUs would also run flawlessly on AMD ones, and vice versa.

            I think the patent-trolling competition in the x86 world is a fundamentally intrinsic issue; its cat-and-mouse game seems to start all over again every now and then. Both Intel and AMD need a central and independent authority that consolidates and defines the standard, similar to what the Khronos Group does for Vulkan in the GPU sector or what RISC-V International does for RISC-V chips. Even a license model like Arm's would be better than the x86 duopoly. But according to the news, Arm seems to be moving away from its original license model and plans to produce its own chips in the future, which has already resulted in conflicts with Qualcomm and others.
            Last edited by M.Bahr; 28 August 2024, 09:30 AM. Reason: some typos



            • #7
              Originally posted by user556 View Post
              The new old features.
              AVX10.1 was basically nothing more than a renaming of AVX-512 plus slight instruction-encoding changes. AVX10.2 truly adds new functionality, which the article enumerated.

              Originally posted by M.Bahr View Post
              Changing the previous names from AVX-512 to AVX10+ is hopefully a first step toward a clean-up and proper consolidation.
              It was more than a name change. They also tweaked the instruction encoding just enough to break binary compatibility with AVX-512 code, thus screwing AMD. They didn't have to do that just to create the new versioning scheme and introduce the 256/512 schism, but Intel being Intel...
              Last edited by coder; 01 September 2024, 09:00 PM.



              • #8
                Originally posted by geerge View Post
                AVX-512 has a mess of optional extensions, but enough time has passed that the ones actually implemented are known, and that is what most people target.
                I call BS on this. x86-64-v4 includes just what was in Skylake-X, and that's what I'm sure most people are still targeting. Between that and Zen 4, the following extensions were added: VPOPCNTDQ, IFMA, VBMI, VNNI, BF16, VBMI2, BITALG, VPCLMULQDQ, GFNI, VAES. Ice Lake and Rocket Lake have all of those except BF16. Tiger Lake still lacks BF16, but added VP2INTERSECT. Sapphire Rapids lacks VP2INTERSECT, but has BF16 and FP16.

                So it was still a mess, even if we ignore the old Xeon Phi stuff. I guess the natural target for x86-64-v5 would have been Ice Lake, though it's too bad it lacked BF16 (good for AI training).
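
                If you want to see that spread for yourself, one quick probe (my own sketch, nothing official) is to build something like the following with different -march= values, e.g. x86-64-v4, icelake-server, sapphirerapids or znver4, and compare what gets printed; the macros are GCC's standard per-extension defines.

                #include <stdio.h>

                int main(void)
                {
                /* Each macro is defined only when the compile target enables that subset. */
                #ifdef __AVX512VNNI__
                    puts("AVX512VNNI");
                #endif
                #ifdef __AVX512BF16__
                    puts("AVX512BF16");
                #endif
                #ifdef __AVX512FP16__
                    puts("AVX512FP16");
                #endif
                #ifdef __AVX512VP2INTERSECT__
                    puts("AVX512VP2INTERSECT");
                #endif
                    return 0;
                }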

                Originally posted by geerge View Post
                AVX10 actually introduces more fragmentation; it's not a consolidation attempt, it's the opposite.
                It's too simplistic to say they're only fragmenting. What they've promised is to avoid a situation like Tiger Lake vs. Sapphire Rapids, where one has VP2INTERSECT but the other has FP16. The versioning scheme takes us back to the era when newer CPUs would be a complete superset of the ones which came before.

                The only way it increases fragmentation is by introducing the 256-bit vs. 512-bit schism. In actual fact, Intel refers to AVX10/512 as "legacy", and I therefore expect them to eventually disadvantage it, perhaps similarly to what Zen 4 did, so that it doesn't offer a major advantage over AVX10/256, even on their CPUs that support it. This will push people to write mostly AVX10/256 code and nullify the advantage Zen 5 introduced of having full 512-bit-wide pipelines. Instead, Intel will use a larger number of 256-bit pipelines (they already increased this from 3 to 4 in Lion Cove).

                Originally posted by geerge View Post
                You'd probably need to support both AVX10.256 and AVX10.512: AVX10.512 is probably as simple as compiling your existing AVX-512 code as AVX10 because reasons, while AVX10.256 is another full target.
                The nomenclature they use is AVX10.x/w, where x is the version number and w is the width. So, x defines which instructions are supported and w defines the maximum operand size.
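
                For completeness, here's roughly how that x and w get enumerated in software. This is a sketch based on my reading of Intel's AVX10 architecture spec (the AVX10 feature flag in CPUID leaf 7, subleaf 1, EDX bit 19; the version and supported widths in leaf 0x24), so verify the leaf/bit positions against the current spec before relying on them.

                #include <cpuid.h>
                #include <stdio.h>

                int main(void)
                {
                    unsigned eax, ebx, ecx, edx;

                    /* AVX10 feature flag (bit position per the spec as I read it). */
                    if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx) ||
                        !(edx & (1u << 19))) {
                        puts("AVX10 not supported");
                        return 0;
                    }

                    /* Converged vector ISA leaf 0x24: EBX[7:0] is the version ('x'),
                     * EBX bits 16/17/18 flag 128/256/512-bit support ('w'). */
                    if (__get_cpuid_count(0x24, 0, &eax, &ebx, &ecx, &edx)) {
                        printf("AVX10 version: %u\n", ebx & 0xffu);
                        printf("max width: %s\n",
                               (ebx & (1u << 18)) ? "512-bit" :
                               (ebx & (1u << 17)) ? "256-bit" : "128-bit");
                    }
                    return 0;
                }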

                Originally posted by geerge View Post
                The way they should have consolidated is simple: officially deprecate the AVX-512 ops they've abandoned, promote the optional AVX-512 ops they support to mandated, commit to never getting rid of mandated ops,
                Agreed, so far. This is basically AVX10, but without the instruction encoding change.

                Originally posted by geerge View Post
                implement AVX-512 on the weak cores the same way AMD did (execute 512-bit ops by breaking them up into smaller-width ops done in sequence).
                They didn't have to do this part. They still could have had a limit on the max operand width. AVX-512 supports 3 different operand widths: 128, 256, and 512. They could've added a CPUID flag specifying the max width supported by an implementation, and if you just limited yourself to 256-bit operands, you wouldn't even need to check it.

                I'm pretty sure the reason they didn't follow the Zen 4 approach is that Gracemont followed the Zen 1 approach of implementing 256-bit AVX instructions by breaking them into two 128-bit operations. Perhaps they felt it was too complicated to extend this to 512-bit operations, and that Zen 4's approach would occupy too much die area. Keep in mind that Intel made the E-cores on a "7 nm"-class process node, while Zen 4 used a "5 nm"-class node, and also that Gracemont cores are only about 28% as large as the Golden Cove P-cores. I think they're about 1.2 mm^2, whereas a Zen 3 core was 3.1 mm^2; Zen 4 is about 3.6 mm^2 and made on a smaller node.

                Originally posted by geerge View Post
                Hideously simple, no new wheels to be invented or anything.
                If you make the E-cores too big, it breaks Intel's hybrid strategy. They dropped AVX-512 from client processors because client workloads had never really adopted it and didn't stand to gain as much from it as server workloads did. Die area costs money. So, for Intel to burn die area on something AMD didn't support (at the time Alder Lake launched) and that most client software didn't use would've been foolish.

                I think they probably made the right call, in spite of how unpopular it's been with enthusiasts. Where I fault Intel is in doing AVX10 in a way that breaks binary compatibility with AVX-512. There wasn't a good technical argument for that. It seems mostly done to screw AMD.

