AMD Radeon "Aldebaran" GPU Support Published For Next-Gen CDNA
Originally posted by pal666
Data providing is not computation, it's memory footprint (i.e. you need the same amount of memory to keep 1/x as many values that are x times wider). To do something useful with this data you have to do some computation. Computation often has non-linear complexity with respect to bit width, and one double-width operation often requires a different circuit than two single-width ones concatenated.
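A back-of-the-envelope sketch of pal666's point, under a loud assumption: multiplier cost is approximated by the number of partial-product bits in a naive array multiplier (real FPUs use Booth encoding, Wallace trees and so on, so treat the numbers as illustrative only). The significand widths are the IEEE 754 ones: 24 bits for FP32 and 53 bits for FP64, hidden bit included.

```python
# Toy cost model (assumption): an n x n array multiplier has n*n partial-product bits.
FP32_SIGNIFICAND_BITS = 24  # 23 stored bits + 1 implicit bit (binary32)
FP64_SIGNIFICAND_BITS = 53  # 52 stored bits + 1 implicit bit (binary64)


def partial_product_bits(width: int) -> int:
    """Partial-product bits in a naive width x width array multiplier."""
    return width * width


def storage_bits(value_bits: int, count: int) -> int:
    """Storage grows linearly with bit width: count values of value_bits each."""
    return value_bits * count


two_fp32 = 2 * partial_product_bits(FP32_SIGNIFICAND_BITS)
one_fp64 = partial_product_bits(FP64_SIGNIFICAND_BITS)

print(f"storage:    2 x FP32 = {storage_bits(32, 2)} bits, 1 x FP64 = {storage_bits(64, 1)} bits")
print(f"multiplier: 2 x FP32 = {two_fp32}, 1 x FP64 = {one_fp64} partial-product bits")
print(f"FP64 / (2 x FP32) multiplier ratio: {one_fp64 / two_fp32:.2f}")
```

In this model the storage is identical (64 bits either way), while one FP64 multiplier needs roughly 2.4x the partial-product array of two FP32 multipliers, which is why a double-width unit is not simply two single-width units side by side.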
Originally posted by AlB80
SIMD width is 512-bit. It can provide data for 16xFP32 or for 8xFP64. Thus a fully equipped SIMD always has 1/2 rate for FP64 and 2x for FP16. That's the way it works.
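AlB80's arithmetic as a tiny sketch: divide the 512-bit datapath width by the element width to get elements per pass. Like the post, this models only operand width; it assumes every element also gets its own ALU, which is exactly the assumption the other replies question.

```python
SIMD_WIDTH_BITS = 512  # datapath width from the post

ELEMENT_BITS = {"FP16": 16, "FP32": 32, "FP64": 64}


def elements_per_clock(element_bits: int, width_bits: int = SIMD_WIDTH_BITS) -> int:
    """How many elements of a given width fit in one pass of the datapath."""
    return width_bits // element_bits


fp32_lanes = elements_per_clock(ELEMENT_BITS["FP32"])
for name, bits in ELEMENT_BITS.items():
    lanes = elements_per_clock(bits)
    # Rate relative to FP32, assuming (as the post does) one ALU per element.
    print(f"{name}: {lanes} per clock, {lanes / fp32_lanes:g}x FP32 rate")
```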
GFX9 and CDNA SIMDs are physically 512-bit wide and process 64 FP32 operations in a 4-clock cycle, or are logically 2048-bit wide at 1/4 speed.
We do have packed instructions that allow 2xFP16 operations per lane rather than 1xFP32 operation per lane, but it's just a different instruction and we still execute 64 instances of that instruction in a 4-clock cycle. FP64 instructions are still processed 64-wide, they just take (a lot) longer than 4 cycles.
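A semantic sketch of what "2xFP16 per lane" means, using Python's `struct` half-precision format (`e`). The layout (low half in bits 0-15) and the helper names are illustrative assumptions; this models the idea behind packed instructions such as GFX9's `v_pk_add_f16`, not AMD's hardware or encoding. Note that the wave still has 64 lanes: packing doubles the work per lane, not the number of lanes.

```python
import struct


def pack_half2(lo: float, hi: float) -> int:
    """Pack two FP16 values into one 32-bit word (lo in bits 0-15, hi in 16-31)."""
    return struct.unpack("<I", struct.pack("<ee", lo, hi))[0]


def unpack_half2(word: int) -> tuple[float, float]:
    """Split a 32-bit word back into its two FP16 halves."""
    return struct.unpack("<ee", struct.pack("<I", word))


def packed_add(a: int, b: int) -> int:
    """One packed instruction per lane: two independent FP16 adds."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    # Python adds in double precision; repacking rounds each result to FP16.
    return pack_half2(a_lo + b_lo, a_hi + b_hi)


WAVE_SIZE = 64  # the packed instruction still runs for all 64 lanes of the wave
a_regs = [pack_half2(1.5, 2.0)] * WAVE_SIZE
b_regs = [pack_half2(0.25, -1.0)] * WAVE_SIZE
results = [packed_add(a, b) for a, b in zip(a_regs, b_regs)]
print(len(results), unpack_half2(results[0]))  # 64 (1.75, 1.0)
```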
RDNA SIMDs are 1024-bit wide and process 32 FP32 operations in a single clock.
Last edited by bridgman; 26 February 2021, 07:58 PM.
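The numbers in bridgman's post can be cross-checked with a small model: physical width / 32 bits gives FP32 lanes, wave size / lanes gives clocks per instruction, and wave size x 32 bits gives the logical width. The `Simd` class is a hypothetical helper, and RDNA is modeled in its native wave32 mode only.

```python
from dataclasses import dataclass

FP32_BITS = 32


@dataclass(frozen=True)
class Simd:
    name: str
    physical_bits: int  # datapath width per clock
    wave_size: int      # threads per wave executed on this SIMD

    @property
    def lanes(self) -> int:
        """FP32 operations the datapath performs per clock."""
        return self.physical_bits // FP32_BITS

    @property
    def clocks_per_instruction(self) -> int:
        """Clocks to run one full-rate FP32 instruction for every thread in the wave."""
        return self.wave_size // self.lanes

    @property
    def logical_bits(self) -> int:
        """Width of one wave-wide FP32 operand as the program sees it."""
        return self.wave_size * FP32_BITS


SIMDS = [Simd("GFX9/CDNA", 512, 64), Simd("RDNA", 1024, 32)]
for s in SIMDS:
    print(f"{s.name}: {s.lanes} lanes, wave{s.wave_size} takes "
          f"{s.clocks_per_instruction} clock(s), logical width {s.logical_bits} bits")
```

For GFX9/CDNA this gives 16 lanes, 4 clocks and 2048 logical bits; for RDNA, 32 lanes, 1 clock and 1024 bits, matching the post.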
Originally posted by Qaridarium
The chiplet design of the second version of CDNA will lower the cost of an AMD Radeon Pro VII from 2045€ to maybe 1700€ https://geizhals.de/amd-radeon-pro-v...loc=at&hloc=de
The AMD CPUs 3950X and 5950X did the same: if you look at what native monolithic 16-core CPUs cost, they were at least 300€ more.
The chiplet design also helps to build low-end versions, because they can put only 1 chiplet on the GPU instead of 2-4.
So if you buy the low-end version with only 1 die instead of 2-4, the card may cost 700-1000€.
This means you can be sure AMD is working hard to keep costs down.
See my post above: the REAL market difference between the 6900XT and the 3090 is 1100€ right now.
Originally posted by pal666
Memory provides data, the GPU performs calculations on it. Only very simple calculations work the way you assume.
It's not hard to put tens of thousands of ALUs on a chip and issue one instruction per clock for 64-thread waves. The big problem is feeding them all with data. That's why GPU architectures are data-driven: an optimized uArch (each SIMD selects one ready wave out of 8-10), an optimized ISA (special scalar instructions to mark and stall waves that are not ready), an optimized memory hierarchy (megabytes of vector registers, LDSs and caches), and optimized code.
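A toy model of why a SIMD wants several waves to choose from. Everything here is an assumption for illustration: each wave repeats a fixed number of ALU instructions followed by one memory load, the load stalls that wave for a fixed latency, and every cycle the SIMD issues from the lowest-numbered ready wave. The parameter values are made up, not measured GCN/CDNA figures.

```python
def simd_utilization(num_waves: int, alu_per_load: int, latency: int,
                     cycles: int = 10_000) -> float:
    """Fraction of cycles in which the SIMD found a ready wave to issue from."""
    ready_at = [0] * num_waves  # cycle at which each wave may issue again
    progress = [0] * num_waves  # instructions issued in the current ALU+load group
    busy = 0
    for cycle in range(cycles):
        for w in range(num_waves):  # pick the lowest-numbered ready wave
            if ready_at[w] <= cycle:
                busy += 1
                progress[w] += 1
                if progress[w] > alu_per_load:  # this instruction was the load
                    progress[w] = 0
                    ready_at[w] = cycle + latency  # wave waits for memory
                else:
                    ready_at[w] = cycle + 1  # ALU op: ready again next cycle
                break
    return busy / cycles


for waves in (1, 2, 4, 8, 10):
    print(waves, round(simd_utilization(waves, alu_per_load=20, latency=100), 2))
```

With one wave the ALUs sit idle most of the time waiting on memory; with enough waves in flight there is almost always one ready to issue.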
Each CU can generate thousands of memory requests in a few clocks. The MCU processes them and packs the data back into a 2048-bit wave. And pal666 sums up this whole process of data gathering as "memory provides data". Ok.
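A sketch of what one wave-wide load has to gather: 64 lanes each supply their own address, and the 64 FP32 results form one 2048-bit vector register. The flat buffer, the 64-byte cache line size and the helper names are illustrative assumptions; the point is that a scattered access pattern can touch up to 64 separate lines for a single instruction, while a coalesced one touches only 4.

```python
import random
import struct

WAVE_SIZE = 64
BYTES_PER_ELEMENT = 4   # FP32
CACHE_LINE_BYTES = 64   # assumed line size, for illustration only


def gather(buffer: list[float], indices: list[int]) -> list[float]:
    """One vector load: every lane reads its own, arbitrary element."""
    return [buffer[i] for i in indices]


def cache_lines_touched(indices: list[int]) -> int:
    """Distinct cache lines a single wave-wide load has to fetch."""
    return len({i * BYTES_PER_ELEMENT // CACHE_LINE_BYTES for i in indices})


rng = random.Random(0)
memory = [float(i) for i in range(1 << 16)]  # hypothetical FP32 buffer
coalesced = list(range(WAVE_SIZE))           # lane i reads element i
scattered = [rng.randrange(len(memory)) for _ in range(WAVE_SIZE)]

for label, idx in (("coalesced", coalesced), ("scattered", scattered)):
    vgpr = gather(memory, idx)
    bits = len(struct.pack(f"<{WAVE_SIZE}f", *vgpr)) * 8
    print(f"{label}: {bits}-bit result, {cache_lines_touched(idx)} cache lines")
```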
Originally posted by vegabook
My sense though is that a nice little "prosumer" card sitting somewhere around 1000 dollars (maybe even a bit more), with decent if not groundbreaking FP64, would be a nice little earner and certainly a cred-booster for AMD.
Should you be interested, you can still buy one as a Radeon Pro VII, for a street price of $1900 (if you can find it in stock). For the extra $$$, you get PCIe 4.0 (instead of 3.0) and full 2:1 32-bit to 64-bit ratio. The card is still limited to 60 CUs, however, as I guess Apple is consuming too many of the chips able to use all 64 CUs.
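What the 60 CUs and the 2:1 ratio mean in peak throughput, as a rough sketch. The formula is the usual CUs x 64 lanes x 2 FLOPs per FMA x clock; the 1.7 GHz clock is an assumption (roughly the card's advertised peak engine clock, worth checking against AMD's spec sheet), and sustained throughput will be lower.

```python
def peak_tflops(compute_units: int, clock_ghz: float, rate: float = 1.0) -> float:
    """Peak TFLOPS = CUs x 64 lanes x 2 FLOPs per FMA x clock (GHz) x rate / 1000."""
    lanes_per_cu = 64         # 4 SIMDs x 16 lanes per GCN/Vega CU
    flops_per_lane_clock = 2  # one fused multiply-add counts as 2 FLOPs
    return compute_units * lanes_per_cu * flops_per_lane_clock * clock_ghz * rate / 1000


CUS = 60          # from the post
CLOCK_GHZ = 1.7   # assumption: approximate peak engine clock
fp32 = peak_tflops(CUS, CLOCK_GHZ)
fp64 = peak_tflops(CUS, CLOCK_GHZ, rate=0.5)  # 2:1 FP32:FP64 ratio from the post
print(f"FP32 ~{fp32:.1f} TFLOPS, FP64 ~{fp64:.1f} TFLOPS")
```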
Unfortunately, it lacks the newer iteration of Rapid Packed Math primitives that even RDNA cards have, and the Matrix Cores + BFloat16 support (IMO, only good for AI) that the CDNA cards now pack. But it will do dual duty as a decent 4k gaming card and a strong fp64 compute card, something neither RDNA nor CDNA can claim.
There is a brisk market for Radeon VII on eBay, I think mainly driven by professional graphics artists wanting them to accelerate production rendering in some of their main apps. Last I checked, new ones were going for about $1500, over 2x their original selling price. Towards the end of 2019, I even saw some on Newegg for < $600!
Originally posted by vegabook
This seems to be a gap in the market that Nvidia has neglected.
Originally posted by AlB80
Data providing is the main goal for GPU developers. It's a big issue. Everything else is local and secondary.
So, it's not as if fp64 support is only (or even primarily) constrained by register size. And if there's enough demand to build the extra multipliers, then there's certainly enough to justify widening registers, without necessarily worrying about trying to pack in more fp32, as well.
Originally posted by bridgman
That may be true for CPUs but it's not how our HW operates. CPU SIMDs execute out of a single instruction stream where variable-width instructions can be accommodated, but GPU SIMDs are actually executing multiple threads in lock step, so processing 16 data elements sometimes and 8 data elements other times is not an option.
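A minimal model of the lockstep point: every vector instruction steps all 64 lanes together, and per-thread control flow is handled by an execution mask (GCN calls it EXEC) that switches lanes off rather than by changing how many elements the SIMD processes. The helper `run_masked` and the example branch are hypothetical, for illustration only.

```python
from typing import Callable

WAVE_SIZE = 64


def run_masked(values: list[float], exec_mask: list[bool],
               op: Callable[[float], float]) -> list[float]:
    """One vector instruction: all lanes step together; masked-off lanes keep their value."""
    return [op(v) if active else v for v, active in zip(values, exec_mask)]


x = [float(i) for i in range(WAVE_SIZE)]
cond = [v % 2 == 0 for v in x]  # per-thread branch condition

# if (x is even) x = x * 2; else x = x + 1; executed as two full-wave passes.
instructions_issued = 0
x = run_masked(x, cond, lambda v: v * 2)  # "then" side: odd lanes idle
instructions_issued += 1
x = run_masked(x, [not c for c in cond], lambda v: v + 1)  # "else" side: even lanes idle
instructions_issued += 1

print(x[:4], instructions_issued)  # [0.0, 2.0, 4.0, 4.0] 2
```

Both passes occupy the whole wave, so a divergent branch costs the sum of both paths.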
BTW, low-tier GCN parts have 1/16-rate FP64: a single FP64 ALU per SIMD, i.e. a 64-clock cycle per wave instruction.
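The arithmetic behind the 64-clock figure: a GCN SIMD has 16 FP32 lanes, an FP64 rate of r means 16 x r FP64 ALUs, and a 64-thread wave then needs 64 / (16 x r) clocks per FP64 instruction. The sketch covers only the 1/2 and 1/16 rates mentioned in this thread.

```python
from fractions import Fraction

LANES_PER_SIMD = 16  # FP32 lanes per GCN/CDNA SIMD
WAVE_SIZE = 64


def fp64_clocks_per_wave(rate: Fraction) -> int:
    """Clocks to run one FP64 instruction for a full wave64 at the given FP64:FP32 rate."""
    fp64_alus = LANES_PER_SIMD * rate  # FP64 ALUs per SIMD
    return int(WAVE_SIZE / fp64_alus)


for label, rate in (("1/2 rate", Fraction(1, 2)), ("1/16 rate", Fraction(1, 16))):
    print(f"{label}: {LANES_PER_SIMD * rate} FP64 ALU(s), "
          f"{fp64_clocks_per_wave(rate)} clocks per wave64 instruction")
```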