Intel Xe2 Brings Native 64-bit Integer Arithmetic

  • #21
    Originally posted by coder View Post
    You're too cynical. They're not as parsimonious as you suggest.
    Every GPU maker is trying to find the right balance between features + performance vs. cost. As you started out saying, there's no free lunch. Features and performance come at the expense of die area, and that costs money. If a GPU product line is not profitable, then kiss it goodbye. Intel's dGPU effort seems to be teetering on the brink of survival, depending on whose rumors you follow.
    I think Intel has already lost around $6.8 billion on the Arc dGPUs.

    But there is also a positive side for Intel: their iGPUs improved a lot, and the Intel Core Ultra 7 155H notebook chip really shines in performance per watt...

    It also looks like Intel has passed the low point in driver development, which means their next-gen Arc chips should have a much better launch.

    Their dGPUs are on the brink of survival, but what I do not understand is: why do they need so many variants?

    My opinion is that as long as they are not competitive, they would be better off with one single chip like the Arc A770, or even starting with only the Arc A380... That would have helped them a lot in cutting costs and focusing driver development on that single chip.

    Phantom circuit Sequence Reducer Dyslexia



    • #22
      Originally posted by coder View Post
      FP64 uses way more die area to implement. The size of a FP multiplier is proportional to the square of the mantissa. Fp32 has a 24-bit mantissa (if you count the implicit 1), whereas fp64 has a 53-bit mantissa. That's nearly a 4.9x ratio in silicon area, and you probably expect them just to make bigger GPUs but eat the cost difference and sell them at the same price?
      After repeatedly getting beat by Nvidia in both client and server GPUs, AMD seems to have concluded they could no longer afford to make the compromises necessary to target the same silicon at both markets. That's why they split the big HPC/server GPUs into CDNA and the client/gaming GPUs into RDNA, with each architecture more specialized towards the needs of each market.
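      As a quick sanity check of the ~4.9x figure quoted above, here is a back-of-the-envelope sketch. It assumes the simplified "multiplier area scales with the square of the mantissa width" model from the quote, not a real silicon area model:

      # Rough check of the multiplier-area argument quoted above.
      # Assumption (from the quote): multiplier area ~ (mantissa width)^2.
      fp32_mantissa_bits = 24   # 23 stored bits + implicit leading 1
      fp64_mantissa_bits = 53   # 52 stored bits + implicit leading 1
      area_ratio = (fp64_mantissa_bits / fp32_mantissa_bits) ** 2
      print(f"approximate FP64/FP32 multiplier area ratio: {area_ratio:.2f}x")  # ~4.87x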
      I discussed a split between CDNA and RDNA with bridgman many years before the split even happened.
      And believe it or not, it was more or less my idea. Of course, other people may have had the same or a similar idea at the same time, so I do not expect anyone to give me exclusive credit for it.
      But the fact is that I discussed it with bridgman and others here in the phoronix.com forum.

      And I can say for sure that your fabricated version here: "After repeatedly getting beat by Nvidia in both client and server GPUs, AMD seems to have concluded they could no longer afford to make the compromises necessary to target the same silicon at both markets"

      was clearly never the case. It is not that AMD could not afford the necessary compromises; rather, it is technically impossible to make them at the absolute high end.

      Keep in mind that a Vega 64 has 4096 shader units, while the first CDNA chip after the split had 7680 shader units, and that was more or less the technical maximum that the reticle (mask) limit of TSMC's N7 FinFET process could handle.

      "The first generation of CDNA was announced on March 5th, 2020, and was featured in the AMD Instinct MI100, launched November 16th, 2020. This is CDNA 1's only produced product, manufactured on TSMC's N7 FinFET process."

      On the RDNA side it was the same effect: without the CDNA/RDNA split, a 6900 XT or 7900 XTX would have had less performance for the gaming market.

      "they could no longer afford to make the compromises necessary to target the same silicon at both markets."

      So your sentence is wrong: of course they could easily afford the compromise, but then both the gaming GPU and the HPC compute GPU would be slower.

      This means there is nothing negative about it; AMD simply made sure HPC customers get a faster compute card and gamers get a faster gaming card.

      Also keep in mind that the CDNA/RDNA split had a bigger effect on CDNA and only a small effect on RDNA...

      The reason is that RDNA could only swap out some FP64 units for FP32 units to speed up games,

      but the CDNA design could be stripped down so much that the shader unit count nearly doubled.

      This means RDNA gave you something like a 20% faster result, while CDNA gave you a nearly 100% better result.
      Phantom circuit Sequence Reducer Dyslexia



      • #23
        Right... compute operations were generally large enough to fill a great number of 64-element waves, while graphics workloads have increasingly been moving to waves with less than 32 elements.

        As a consequence the GFX9 architecture (with most of the silicon used for ALUs/registers and only a small amount for control and local cache) was a cleaner fit than RDNA, which had relatively more silicon dedicated to caches and control units leaving less for ALUs and registers.

        The RDNA architecture is more efficient at dealing with short waves, partly because of the ability to launch 32-entry rather than just 64, but even more so because of the ability to execute a 32-entry wave in a single clock cycle. That makes "C depends on B which depends on A" kind of processing much more performant but does nothing for compute.
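        To make the wave-size point above a bit more concrete, here is a toy cycle-count sketch (my own simplification with assumed issue rates, not AMD's actual scheduling model): it treats a GCN-style SIMD as issuing one wave64 instruction over 4 cycles and an RDNA-style SIMD as issuing one wave32 instruction per cycle, and counts cycles for a chain where each instruction depends on the previous one.

        # Toy model only (assumed issue rates; ignores ALU latency, caches, etc.):
        # GCN-style 16-wide SIMD issues a wave64 instruction over 4 cycles,
        # RDNA-style 32-wide SIMD issues a wave32 instruction in 1 cycle.
        def dependent_chain_cycles(num_instructions, cycles_per_instruction):
            # Each instruction needs the previous result, so issues cannot overlap.
            return num_instructions * cycles_per_instruction

        chain_length = 100  # "C depends on B which depends on A", 100 steps
        print("wave64, GCN-style :", dependent_chain_cycles(chain_length, 4), "cycles")
        print("wave32, RDNA-style:", dependent_chain_cycles(chain_length, 1), "cycles")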
        Test signature



        • #24
          Originally posted by bridgman View Post
          Right... compute operations were generally large enough to fill a great number of 64-element waves, while graphics workloads have increasingly been moving to waves with less than 32 elements.
          As a consequence the GFX9 architecture (with most of the silicon used for ALUs/registers and only a small amount for control and local cache) was a cleaner fit than RDNA, which had relatively more silicon dedicated to caches and control units leaving less for ALUs and registers.
          The RDNA architecture is more efficient at dealing with short waves, partly because of the ability to launch 32-entry rather than just 64, but even more so because of the ability to execute a 32-entry wave in a single clock cycle. That makes "C depends on B which depends on A" kind of processing much more performant but does nothing for compute.
          Thank you for your explanation. Do you have any numbers on how much performance RDNA gained over Vega 20/Radeon VII on the 7nm node?
          I really do not understand why people have this conspiracy theory that there was an evil intention behind the split between RDNA and CDNA...
          With 2nm or 1.6nm TSMC wants to develop a much larger mask, and then a single chip can become much bigger.
          But in the past, the mask (reticle) size of the node was always the factor that limited the maximum size of a chip.

          If you have to stay inside this mask size, then the split results in roughly 20% higher performance for RDNA and 100% higher performance for CDNA...

          Also, to the people who claim CDNA hardware is so expensive that no one can buy it: that is old history by now, and older CDNA hardware can be bought for prices similar to an AMD Radeon PRO W7900...

          [Price-comparison link: AMD Radeon Instinct MI100 – 32GB HBM2 with ECC mode, 4096-bit, 2.4Gbps, 1200MHz, 1229GB/s]

          An MI100, for example, is only 3024€,

          while an AMD Radeon PRO W7900 is 3416€.

          This means the MI100 is not more expensive than a W7900...

          People who really need FP64 performance can buy that MI100 card.

          And to the people who say a 7900 XTX is only 944€: well, then you only get 24GB of VRAM, so in the end you get what you pay for.

          If people go to eBay, they can buy a used MI100 for much less than 3000€.
          Phantom circuit Sequence Reducer Dyslexia



          • #25
            Originally posted by qarium View Post
            thank you for your explanation. do you have any numbers how much performance did give RDNA over VEGA20/radeon 7 on the 7nm node?
            Eyeballing the initial 5700XT reviews suggests around a 50% boost per CU - if you discount performance from Vega's HBM the 5700XT comes in around or a bit above Vega20 performance with 1/3 fewer CUs - 2560 vs 3840. Boost clock was a bit higher on the 5700XT so you could subtract ~9% for the clock difference, but I think we made that back and more over time as driver and game optimizations for gfx10 started to catch up with gfx9. I suspect we are well over 50% today.
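            For anyone following the arithmetic, here is roughly how that estimate works out, written as a sketch (the "roughly equal overall performance" figure and the ~9% clock delta are the eyeballed assumptions from the post above, not measured data):

            # Back-of-the-envelope version of the estimate above (assumed inputs).
            shaders_5700xt = 2560      # Navi 10 (RX 5700 XT)
            shaders_vega20 = 3840      # Vega 20 (Radeon VII)
            relative_perf  = 1.0       # assume roughly equal overall performance

            per_shader_gain = relative_perf * shaders_vega20 / shaders_5700xt - 1
            print(f"per-CU/per-shader uplift: ~{per_shader_gain:.0%}")        # ~50%

            clock_delta = 0.09         # ~9% higher boost clock on the 5700 XT (assumed)
            iso_clock_gain = (1 + per_shader_gain) / (1 + clock_delta) - 1
            print(f"uplift adjusted for clock speed: ~{iso_clock_gain:.0%}")  # ~38%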
            Test signature



            • #26
              Originally posted by bridgman View Post
              Eyeballing the initial 5700XT reviews suggests around a 50% boost per CU - if you discount performance from Vega's HBM the 5700XT comes in around or a bit above Vega20 performance with 1/3 fewer CUs - 2560 vs 3840. Boost clock was a bit higher on the 5700XT so you could subtract ~9% for the clock difference, but I think we made that back and more over time as driver and game optimizations for gfx10 started to catch up with gfx9. I suspect we are well over 50% today.
              That is really impressive as a result of the split between RDNA and CDNA... (and higher than the 20% I thought it was).
              I really don't get why people spread conspiracy theories that AMD did it only for market segmentation and to cripple compute/FP64 performance on RDNA cards.
              But if AMD is not competitive with Nvidia, those same people do not buy AMD hardware at all.

              Just imagine if AMD had not done the split: AMD GPUs would not be competitive with Nvidia at all.

              The same thing happened again with Radeon 6000 and Radeon 7000: people disliked the high idle power draw of the chiplet design (later fixed by a driver update) and claimed they would have preferred monolithic designs; some even bought the Radeon RX 7600 XT instead because it is a monolithic chip. But again, if they had to pay 300-400€ more for a monolithic design, they of course would not buy it.

              I think many people do not realize how high the pressure for innovation is in the chip industry.

              Intel has already lost $6.8 billion by entering the GPU market, and people are still not happy with the state of the drivers for Intel hardware.
              Phantom circuit Sequence Reducer Dyslexia
