Originally posted by oleid
View Post
Radeon ROCm 4.1 Released - Still Without RDNA GPU Support
-
I got OpenCL to work for the first time on my Navi10 5700 today! Yay ... OK, Blender renders the default cube in ~43s on the GPU and <10s on my 3700X.
But for the BMW scene: CPU 3m25, GPU 1m22. So increase the tile size -> default cube <7s on the GPU. Nice. It looks like the difference is more noticeable for bigger scenes.
I think one of the biggest problems for AMD/ROCm/OpenCL is the installation and its image.
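To put the BMW numbers above in perspective, here is a minimal sketch that turns the two quoted timings into a speedup factor. The helper name `to_seconds` is made up for illustration; only the two timings come from the post.

```python
def to_seconds(stamp: str) -> int:
    """Convert a 'XmYY' timing such as '3m25' into seconds."""
    minutes, seconds = stamp.split("m")
    return int(minutes) * 60 + int(seconds)


# BMW scene timings exactly as quoted in the post.
cpu_s = to_seconds("3m25")
gpu_s = to_seconds("1m22")

# How many times faster the GPU render finished than the CPU render.
speedup = cpu_s / gpu_s
print(f"BMW: CPU {cpu_s}s, GPU {gpu_s}s, GPU speedup {speedup:.2f}x")
```

The default-cube figures are only given as bounds (~43s, <10s, <7s), so they are left out of the calculation.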
Comment
-
Originally posted by Spacefish View Post
IO Die
The IO die was probably one of the first chips developed for the platform surrounding the AM4 socket / Zen in general. AMD was tied to Global Foundries back then and did not have a large R&D budget, as they had very uncompetitive CPU offerings.
It can be used as a PCH (X570 chipset) and/or as the IO die to link the chiplets and offer all the integrated USB/Ethernet/SATA/PCIe connectivity.
This chip was developed for GF 12nm and is still produced there. The design probably did not change much from Zen to Zen 3, as there is no reason for that: R&D is expensive for little gain.
Last edited by smitty3268; 24 March 2021, 04:46 PM.
Comment
-
Originally posted by bridgman View Post
Got it - thanks.
Just curious, which packaged driver are you using? I'm asking because 20.45 and up use ROCm paths for OpenCL even on the packaged drivers.
Looks to be building the 4.1 branch.
On testing, my VII is not playing with rocm-opencl. Even rocminfo is not showing it!
Comment
-
Originally posted by smitty3268 View Post
AMD still has contracts with Global Foundries dating back to their split. They have to purchase a bunch of silicon from Global Foundries each year, or else they end up paying a large financial penalty - essentially paying GF for nothing, in that case. So it's not as simple as just redoing the I/O die on a better TSMC node, which would be beneficial in terms of power usage - they have to send a bunch of money GF's way one way or another.
Comment
-
Originally posted by bridgman View Post
There is a problem specific to Vega20 consumer cards in the 4.1 release that is interfering with running over the upstream kernel. Digging into it now.
Code:
rocminfo
ROCk module is loaded
HSA Error: Incompatible kernel and userspace, Vega 20 [Radeon VII] disabled. Upgrade amdgpu.
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
=========
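If you want to check for this condition in a script, a rough sketch is to scan already-captured rocminfo text for lines starting with "HSA Error:". The function name `hsa_errors` is hypothetical, and the sample text is just the relevant lines from the output above. The post does not say whether rocminfo writes that line to stdout or stderr, so this only works on text you have already collected.

```python
def hsa_errors(rocminfo_output: str) -> list:
    """Return every line of the captured output that starts with 'HSA Error:'."""
    return [
        line.strip()
        for line in rocminfo_output.splitlines()
        if line.strip().startswith("HSA Error:")
    ]


# Sample lines copied from the rocminfo output quoted in this thread.
sample = """\
ROCk module is loaded
HSA Error: Incompatible kernel and userspace, Vega 20 [Radeon VII] disabled. Upgrade amdgpu.
Runtime Version: 1.1
"""

errors = hsa_errors(sample)
for err in errors:
    print(err)
```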
Comment
-
Originally posted by pete910 View Post
I seem to recall reading that it was terminated last year, IIRC, but I may be completely wrong.
Comment
-
Originally posted by pete910 View Post
Looks to be that issue, as according to rocminfo it's disabling it and I need to upgrade AMDGPU
The required kernel driver changes are on their way upstream but not there yet AFAIK, so just going to a newer kernel isn't going to help. I was going to say "if you can go back to 4.0 userspace in the short term that would probably be best" but you already said that you had problems with 4.0. Do you happen to remember what problems you saw?
Test signature
Comment
-
Originally posted by skeevy420 View Post
For all intents and purposes, CDNA is basically headless RDNA; both are the successor to GCN.
https://www.amd.com/system/files/doc...whitepaper.pdf
The classic GCN compute cores contain a variety of pipelines optimized for scalar and vector instructions. In particular, each CU contains a scalar register file, a scalar execution unit, and a scalar data cache to handle instructions that are shared across the wavefront, such as common control logic or address calculations. Similarly, the CUs also contain four large vector register files, four vector execution units that are optimized for FP32, and a vector data cache. Generally, the vector pipelines are 16-wide and each 64-wide wavefront is executed over four cycles.
The AMD CDNA architecture builds on GCN’s foundation of scalars and vectors and adds matrices as a first class citizen while simultaneously adding support for new numerical formats for machine learning and preserving backwards compatibility for any software written for the GCN architecture. These Matrix Core Engines add a new family of wavefront-level instructions, the Matrix Fused Multiply-Add or MFMA. The MFMA family performs mixed-precision arithmetic and operates on KxN matrices using four different types of input data: 8-bit integers (INT8), 16-bit half-precision FP (FP16), 16-bit brain FP (bf16), and 32-bit single-precision (FP32). All MFMA instructions produce either 32-bit integer (INT32) or FP32 output, which reduces the likelihood of overflowing during the final accumulation stages of a matrix multiplication.
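Two of the numbers in that excerpt are easy to check by hand, so here is a small sketch. The first part is just the 64-wide wavefront over a 16-wide pipeline. The second part simulates an INT8 dot product with a deliberately narrow 16-bit accumulator versus a 32-bit one, to show why a wide output helps. The 16-bit case, the vector length of 4 and the helper names `wrap_signed` and `dot` are assumptions for illustration only; this is plain Python, not how the hardware or any ROCm API works.

```python
WAVEFRONT_WIDTH = 64  # work-items per wavefront, from the whitepaper text
VECTOR_LANES = 16     # lanes per vector pipeline, from the whitepaper text

# A 64-wide wavefront on a 16-wide pipeline needs 64 / 16 cycles.
cycles = WAVEFRONT_WIDTH // VECTOR_LANES


def wrap_signed(value: int, bits: int) -> int:
    """Reduce value to a two's-complement signed integer of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def dot(a, b, acc_bits: int) -> int:
    """Dot product of two INT8 vectors using an accumulator of acc_bits bits."""
    acc = 0
    for x, y in zip(a, b):
        acc = wrap_signed(acc + x * y, acc_bits)
    return acc


# Worst-case positive INT8 inputs; length 4 is an arbitrary illustrative choice.
a = [127] * 4
b = [127] * 4

exact = sum(x * y for x, y in zip(a, b))
narrow = dot(a, b, 16)  # hypothetical 16-bit accumulator wraps around
wide = dot(a, b, 32)    # INT32 accumulator, as MFMA uses for INT8 input

print(f"cycles per wavefront: {cycles}")
print(f"exact={exact} int16_acc={narrow} int32_acc={wide}")
```

Even with only four products the 16-bit accumulator has already wrapped, while the 32-bit one still matches the exact sum.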
Comment
-
Originally posted by bridgman View Post
Yep... the message is a bit misleading though... what it is supposed to be saying is "you can't run 4.1 userspace without the 4.1 kernel driver".
The required kernel driver changes are on their way upstream but not there yet AFAIK, so just going to a newer kernel isn't going to help. I was going to say "if you can go back to 4.0 userspace in the short term that would probably be best" but you already said that you had problems with 4.0. Do you happen to remember what problems you saw?
CPU worked fine.
Comment