Radeon "GFX90A" Added To LLVM As Next-Gen CDNA With Full-Rate FP64

coder replied

20 February 2021, 04:55 AM
Originally posted by ms178

I wonder how such an updated GFX9 card with 64 CUs and 8GB of HBM2e would perform on TSMC's 7nm process. Such a Vega 64 v3 could still be a decent gaming and prosumer card.

We know the answer: often worse than Radeon VII.

What you describe is a fully enabled Vega 20 with 2 stacks of 1200 MHz HBM2e memory instead of its 4 stacks of 1000 MHz HBM2 memory. If your goal is a 4K card, then you want that extra bandwidth and capacity. If you're running it at 1440p, then maybe you can do well enough with ~600 GB/sec of bandwidth, but the extra 4 CUs are just going to make it more likely that you end up BW-constrained.
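The ~600 GB/sec figure follows from standard HBM arithmetic. A minimal sketch, assuming (this is not stated in the thread) a 1024-bit interface per stack and double-data-rate signalling, and using Radeon VII's 60 enabled CUs; `hbm_bandwidth_gbs` is a hypothetical helper name:

```python
# Peak HBM bandwidth from stack count and memory clock.
# Assumptions: each HBM2/HBM2e stack has a 1024-bit interface and
# transfers data on both clock edges (DDR).
HBM_BUS_WIDTH_BITS = 1024


def hbm_bandwidth_gbs(stacks: int, clock_mhz: float) -> float:
    """Return peak bandwidth in GB/s (10^9 bytes per second)."""
    data_rate_gbps = 2 * clock_mhz / 1000  # Gbit/s per pin (DDR)
    return stacks * HBM_BUS_WIDTH_BITS * data_rate_gbps / 8


# Radeon VII: 4 stacks of 1000 MHz HBM2, 60 CUs enabled
radeon_vii = hbm_bandwidth_gbs(4, 1000)
# Hypothetical "Vega 64 v3": 2 stacks of 1200 MHz HBM2e, 64 CUs
vega_v3 = hbm_bandwidth_gbs(2, 1200)

print(f"Radeon VII: {radeon_vii:.1f} GB/s, {radeon_vii / 60:.1f} GB/s per CU")
print(f"Vega 64 v3: {vega_v3:.1f} GB/s, {vega_v3 / 64:.1f} GB/s per CU")
```

This prints 1024.0 GB/s (17.1 GB/s per CU) for Radeon VII and 614.4 GB/s (9.6 GB/s per CU) for the hypothetical card: 40% less total bandwidth, spread over more CUs.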
zboszor replied

20 February 2021, 03:00 AM
Originally posted by Laughing1

Just get OpenCL 3.0 into Clover (Mesa).
bridgman

To support this hardware in Mesa's OpenCL, the new targets need to be exposed in the libclc bitcode files in LLVM, which, as far as I have seen, the changes don't touch.
Heck, even my new Renoir laptop is unsupported by libclc + mesa-libOpenCL in Fedora 33; clinfo complained about a missing .bc file/symlink.
Notably, gf907, gfx908 and gfx909 support symlinks are missing from libclc. This won't help with Fedora since it uses an outdated libclc, though.
After adding the missing symlinks though (gfx909 in my case), clinfo suddenly recognized my Renoir iGPU.
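A sketch of the symlink workaround described above, assuming libclc's bitcode files follow the `<gpu>-amdgcn-mesa-mesa3d.bc` naming and that the missing name can safely alias an already installed AMDGCN file (upstream libclc keeps such an alias list, e.g. pointing at `tahiti`; verify this for your version). The helper `add_libclc_alias` and the install directory used below are hypothetical; check what your distribution's libclc package actually ships.

```python
from pathlib import Path

# Assumed naming scheme for libclc bitcode files: <gpu>-<triple>.bc
LIBCLC_TRIPLE = "amdgcn-mesa-mesa3d"


def add_libclc_alias(clc_dir: str, missing_gpu: str, existing_gpu: str,
                     triple: str = LIBCLC_TRIPLE) -> Path:
    """Hypothetical helper: make <missing_gpu>-<triple>.bc a symlink to an
    already installed <existing_gpu>-<triple>.bc in clc_dir."""
    base = Path(clc_dir)
    target = base / f"{existing_gpu}-{triple}.bc"
    link = base / f"{missing_gpu}-{triple}.bc"
    if not target.is_file():
        raise FileNotFoundError(f"no bitcode file to alias: {target}")
    # Leave any existing file or link alone (including a dangling one).
    if not link.exists() and not link.is_symlink():
        # Relative link: resolved against clc_dir, so it survives relocation.
        link.symlink_to(target.name)
    return link
```

Run as root, e.g. `add_libclc_alias("/usr/lib64/clc", "gfx909", "tahiti")` (directory assumed), then re-run clinfo.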

bridgman, mareko: please officially add the missing gfx generation support to the libclc subproject in LLVM.

Last edited by zboszor; 20 February 2021, 03:30 AM.
Laughing1 replied

20 February 2021, 12:56 AM
Originally posted by Qaridarium

Too late, because OpenCL is already dead (they are already working on porting OpenCL kernels to Vulkan).

Soon you will have Vulkan Compute for everything...

Does SYCL 2020 work with Vulkan Compute?
qarium replied

19 February 2021, 10:54 PM
Originally posted by Laughing1

Just get OpenCL 3.0 into Clover (Mesa).
bridgman

Too late, because OpenCL is already dead (they are already working on porting OpenCL kernels to Vulkan).

Soon you will have Vulkan Compute for everything...
Laughing1 replied

19 February 2021, 10:24 PM
Just get OpenCL 3.0 into Clover (Mesa).
bridgman

Last edited by Laughing1; 19 February 2021, 10:26 PM.
qarium replied

19 February 2021, 09:42 PM
Originally posted by wizard69

I'm still dreaming of a dual-socket Threadripper implementation where the second socket is dedicated to a CDNA chip. Even here, I would think that AMD would likely want to push significant power through the chip to keep customers interested. I'm talking 150 to 200 watts, which would be a lot on a motherboard with NUMA access to memory.

Believe it or not, they are working on this right now... and it's even better than you think.

Zen 4 on 5nm + RDNA3 + Xilinx FPGA + HBM3 + Infinity Cache + SSD, all connected with xGMI.

But don't count on "dual socket": this will not have a socket at all; it will be soldered directly to the board.

Also, 200 watts?... Wrong: such a mainboard will draw ~600 watts and be fully water cooled.

But no water will be used; 3M Novec liquid will be used instead.
wizard69 replied

19 February 2021, 09:12 PM
Originally posted by cb88

...

Anything you can run on CDNA is going to also run on RDNA... just not quite as fast, as long as it is portable code. ...

That statement is true today but likely will not remain so in the future. Since CDNA is a GPU/acceleration processor, AMD can easily add just about any sort of specialized instruction and hardware to accelerate advanced math computation. We are basically on the first release of CDNA, and it hasn't moved far from its GPU roots. I can see the day when a CDNA card is so far removed from the GPU world that you wouldn't even consider running your code on a mainstream GPU.
wizard69 replied

19 February 2021, 09:05 PM
Originally posted by Spacefish

I guess, sadly, the days when gaming cards and compute cards shared the same architecture are over..

It really makes no sense to share architectures when you can optimize a design for a specific use case.

It would be cool if they could release a ~$300-500 7nm CDNA card for "casual" academic users or people at home doing some ML / mining / compute-intensive tasks.

Who knows, it might happen. However, the market for these cards is very hungry for performance, so that is what AMD has to go after first.

Otherwise we are stuck with RDNA / RDNA2 cards with second-class-citizen ROCm support..
At least they are now committed to bringing RDNA support to ROCm, as far as I can tell from the GitHub issue tracker.

Just remember that AMD is hiring talent right now and has been since they cleaned up their financial mess. It takes a while to do things right. Being an RDNA card owner, I can say that I'm a bit disappointed that ROCm support never seems to come. On the other hand, I understand what they are trying to do with the resources they have.

RDNA won't have all the cool compute features (as they are not required for gaming). This means the hand-written CDNA assembly kernels will never run on them, and there is probably not much interest on AMD's side in spending a lot of time optimizing kernels for RDNA..

I wish they would just take some of the otherwise defective CDNA chips, disable three quarters of the cores and release them as an entry-level CDNA card.
A small 7nm CDNA chip (with, say, 20 CUs) is probably too expensive to develop for such a small market segment.

In order to have enough defective chips for a cut-down CDNA implementation, they will need to achieve a certain volume with the mainstream chip. I'm not yet convinced that CDNA is moving in high enough volume to produce a marketable quantity of lower-performance chips.

Maybe even let all working CUs run but clock them down, or nerf the cards in some other way so that enterprise customers don't want to buy them.. like restricting the use of multiple of these cards in one system or the like..

The common practice with computational chips like this is to partition off defective areas of the "big" chip for use in lower-end cards. The problem here is volume: they need enough silicon to make such cards viable as a product. If the rumors of high yields at TSMC are true, that makes it even harder to come up with enough defective chips.
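The yield argument can be made concrete with the classic Poisson yield model, in which a die of area A on a process with defect density D0 has λ = A·D0 expected random defects. A minimal sketch; the die area, the two D0 values and the "3 or more defects" cut-off for a heavily cut-down part are illustrative assumptions, not AMD or TSMC figures, and `p_at_least_k_defects` is a hypothetical helper:

```python
import math


def p_at_least_k_defects(area_cm2: float, d0_per_cm2: float, k: int) -> float:
    """Poisson yield model: probability that a die has k or more random defects."""
    lam = area_cm2 * d0_per_cm2  # expected number of defects per die
    p_fewer = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - p_fewer


# Illustrative inputs only (assumptions, not vendor data):
AREA_CM2 = 7.5   # a large ~750 mm^2 compute die
MIN_DEFECTS = 3  # dies this damaged would only suit a heavily cut-down SKU

for d0 in (0.2, 0.09):  # defects per cm^2: less mature vs. more mature process
    share = p_at_least_k_defects(AREA_CM2, d0, MIN_DEFECTS)
    print(f"D0={d0}/cm^2: {share:.1%} of dies have >= {MIN_DEFECTS} defects")
```

With these inputs, about 19% of dies have three or more defects at D0 = 0.2, versus about 3% at D0 = 0.09: better yields leave far fewer heavily damaged dies to salvage. Poisson is the simplest choice; real fabs often use clustered-defect models such as the negative binomial.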

I understand what you want, and it might be in AMD's long-term plans. It will only happen, though, when the economic conditions are right.

I'm still dreaming of a dual-socket Threadripper implementation where the second socket is dedicated to a CDNA chip. Even here, I would think that AMD would likely want to push significant power through the chip to keep customers interested. I'm talking 150 to 200 watts, which would be a lot on a motherboard with NUMA access to memory.
cb88 replied

19 February 2021, 06:08 PM
Originally posted by ms178

That's exactly along the lines of my thinking, a card you can work with during the daytime and play with at night. And if the compute side becomes more important for general purpose tasks again (the writing is on the wall with CXL incoming), such divergence in architecture could hurt them because it would be more effort to optimize for two architectures instead of one, but maybe we will see them converging again at that point in time.

There is absolutely no point in that, when RDNA can run Vulkan and OpenCL compute, and HIP on Linux.

The sole purpose of CDNA is to go after HPC density and TCO. And contrary to what the article implies, CDNA is no more GCN than RDNA is.

Anything you can run on CDNA is going to also run on RDNA... just not quite as fast, as long as it is portable code. Nobody should be writing assembly for GPUs at this point... unless you really need an HPC application to scale, in which case you already have access to that.
ms178 replied

19 February 2021, 02:04 PM
Originally posted by Spacefish

I guess, sadly, the days when gaming cards and compute cards shared the same architecture are over..

It would be cool if they could release a ~$300-500 7nm CDNA card for "casual" academic users or people at home doing some ML / mining / compute-intensive tasks.

Otherwise we are stuck with RDNA / RDNA2 cards with second-class-citizen ROCm support..
At least they are now committed to bringing RDNA support to ROCm, as far as I can tell from the GitHub issue tracker.

That's exactly along the lines of my thinking, a card you can work with during the daytime and play with at night. And if the compute side becomes more important for general purpose tasks again (the writing is on the wall with CXL incoming), such divergence in architecture could hurt them because it would be more effort to optimize for two architectures instead of one, but maybe we will see them converging again at that point in time.