**perpetually high** · 04 August 2021, 02:33 AM

Originally posted by zboszor

Does it use upstream LLVM yet instead of unstable forks of LLVM? Last time I looked, the shader compiler used branchpoint versions of LLVM patched with AMD's own code, and the patch didn't apply to the final version of LLVM from the same branch.

It updated the following package, not sure how much it helps you:

llvm-amdgpu/Ubuntu,now 13.0.0.21295.40300

**sweetsuicide** · 04 August 2021, 03:35 AM

Wonderful software, it still doesn't run on most AMD products. I still have to figure out the reason why. OpenCL seems to be much less adopted than CUDA, but I can't see why no one cares about having an alternative to it that at least runs.

**Lemonzest** · 04 August 2021, 05:24 AM

Originally posted by perpetually high

If anyone runs into the following problem when running apt update:

Code:

Err:11 http://repo.radeon.com/rocm/apt/debian xenial InRelease
The following signatures were invalid: EXPKEYSIG 9386B48A1A693C5C James Adrian Edwards (ROCm Release Manager) <[email protected]>
Error: GDBus.Error:org.freedesktop.systemd1.UnitMasked: Unit packagekit.service is masked.
Reading package lists... Done
W: GPG error: http://repo.radeon.com/rocm/apt/debian xenial InRelease: The following signatures were invalid: EXPKEYSIG 9386B48A1A693C5C James Adrian Edwards (ROCm Release Manager) <[email protected]>
E: The repository 'http://repo.radeon.com/rocm/apt/debian xenial InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

I had to do the following to get it going:

wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -

Then just apt update and apt upgrade and you should be good to go.

rocm-smi still isn't included. They stopped shipping it as of 4.0.0 and I haven't figured out why.

I had to create the following symlink to get it going again a few versions ago:

$ ls -al /usr/bin/rocm-smi
lrwxrwxrwx 1 root root 42 Mar 26 07:11 /usr/bin/rocm-smi -> /opt/rocm-4.0.0/bin/rocm_smi_deprecated.py

I use rocm-smi all the time to set the mem clocks, so if this has been moved elsewhere and someone could tell me where, I'd appreciate it.

For me, rocm-smi is in rocm-smi-lib and works perfectly fine. As far as I know it's the latest version.

**Imroy** · 04 August 2021, 09:55 AM

Originally posted by perpetually high

wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -

apt-key is deprecated. Instead, use:

Code:

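# The write to /etc/apt must happen as root, so pipe through "sudo tee";
# with "sudo gpg ... > file" the redirect would run as the calling user and fail.
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/rocm.gpg > /dev/null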

Or something like that.

**perpetually high** · 04 August 2021, 10:07 AM

Originally posted by Lemonzest

For me, rocm-smi is in rocm-smi-lib and works perfectly fine. As far as I know it's the latest version.

Awesome, thanks man, that was it.

**perpetually high** · 04 August 2021, 10:15 AM

Well that's annoying, the new version has a bug if you build amdgpu into the kernel (since it then won't show up in lsmod or in the list of loaded modules).

Code:

$ rocm-smi
ERROR:root:Driver not initialized (amdgpu not found in modules)

I looked at the code:

Code:

def driverInitialized():
    """ Returns true if amdgpu is found in the list of initialized modules
    """
    driverInitialized = ''
    try:
        driverInitialized = str(subprocess.check_output("cat /proc/modules|grep amdgpu", shell=True))
    except subprocess.CalledProcessError:
        pass
    if len(driverInitialized) > 0:
        return True
    return False

Seems like a pretty half-ass way of checking (but I get it). You can work around it by adding return True at the top of the function, but that's obviously not a long-term solution. Hey bridgman, sorry to tag, but could you file this bug or put it on someone's radar? I wouldn't know where to begin. Appreciate it, thanks.
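
For what it's worth, a less brittle check could look in sysfs before falling back to /proc/modules. This is only a rough sketch, not the actual rocm-smi fix: the function name and both path parameters are made up here for illustration, and it assumes a built-in amdgpu still gets a /sys/module/amdgpu directory (built-in drivers that define module parameters normally do).

```python
import os


def driver_initialized(sysfs_root="/sys", proc_modules="/proc/modules"):
    """Hypothetical replacement check: True if amdgpu appears built in or loaded."""
    # Assumption: a built-in amdgpu is still listed under /sys/module because
    # it defines module parameters, even though /proc/modules omits it.
    if os.path.isdir(os.path.join(sysfs_root, "module", "amdgpu")):
        return True
    # Fall back to the loadable-module list, matching the exact module name
    # in the first column instead of grepping for a substring.
    try:
        with open(proc_modules) as modules:
            return any(line.split(" ", 1)[0] == "amdgpu" for line in modules)
    except OSError:
        return False
```

The path arguments exist only so the check can be pointed at a fake directory tree for testing; on a real system you would call driver_initialized() with the defaults.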

**extremesquared** · 04 August 2021, 07:58 PM

I'm starting to read ROCm release articles in the same light as "Linux now runs on some old gaming console" articles. Not really useful, but still fun and quirky that someone put the effort into supporting that old hardware.

**hwertz** · 05 August 2021, 12:50 AM

I'd like to note that what ROCm "supports" and what it actually supports are pretty different. The .deb files are built to support a few models, but the source has support for virtually every model, which works to some extent. I've got TensorFlow up on a gfx902 by building ROCm and TensorFlow from source.

With this rocm-build, you can build ROCm for many cards the ROCm .deb files do not support: set AMDGPU_TARGETS=(list of cards you want your build to support) and run a bunch of shell scripts in order, which build the .deb files and install them (putting everything in /opt/rocm). The creator says they initially wanted to re-enable the disabled support for their gfx803 card; I built mine for gfx902.

Also look in the gfx803 and navixxx directories. The important patch is for MIOpen: MIOpen (at least up to rocm-4.2) does not honor AMDGPU_TARGETS, so one of the patches modifies the MIOpen build file to add the wanted GPUs to its list. I did a similar patch to add gfx902 to the MIOpen list.

Warning: have a ton of RAM+swap handy. LLVM or something is a colossal RAM hog and the scripts run like 8 jobs at a time. I didn't think I'd need to turn on swap on a 32GB system, but for one component (I think MIOpen) I got Out Of Memory, so I turned on like 40GB of swap (on the HDD, I'd rather not wear out the SSD for some bull....) and was shocked to see it use close to 20GB of it (for about 10-15 seconds, then the build dropped from its roughly 48GB peak to more like 20GB of RAM usage).

Edit: Also skip the steps for building amdgpu, recent kernels have that stuff built in.

**StillStuckOnSI** · 05 August 2021, 12:18 PM

Wow, just found this head-scratcher of a comment: https://github.com/RadeonOpenCompute...ment-893584143

bridgman sorry for the ping, but is this really the policy around all the open source compute code? I assumed the lack of third party/community contributions was because of the steep learning curve, but flat out not accepting them is not a good look...

**bridgman** · 05 August 2021, 01:48 PM

Originally posted by StillStuckOnSI

Wow, just found this head-scratcher of a comment: https://github.com/RadeonOpenCompute...ment-893584143

bridgman sorry for the ping, but is this really the policy around all the open source compute code? I assumed the lack of third party/community contributions was because of the steep learning curve, but flat out not accepting them is not a good look...

I saw this a bit earlier and have already asked for clarification.

I think the poster is trying to say "these repos are only for publishing - we won't be accepting pull requests directly into these trees but will be integrating the change into our internal trees so that it flows through to subsequent releases".

At least I hope so.

Even my interpretation does not fix the current release though, so we might need to do both. Anyways, I have the discussion going, not sure where it will end up yet.

EDIT - I heard back from Vlad and he is going to edit his comment. TL;DR is that we do accept third party contributions, we just don't do it by accepting pull requests directly into our "publishing" trees.

Hopefully over time the need for this distinction will go away as we get more of the upper level component teams working directly in public trees for core functionality and keeping secret stuff like support for unreleased products in overlay branches, but that is necessarily a slow process because (a) so many people need to buy in and (b) it needs to be done very conservatively because any slip of "secret" information sets the whole open source effort back a long way.

MORE EDIT - new text:

"Thank you for bringing this issue up. We are currently testing this change internally. We are also planning to remove the bundled OpenCL ICD from the tree. I cannot give an ETA for this change, but most likely it will be publicly available with ROCm 5.0."

Radeon ROCm 4.3 Released With HMM Allocations, Many Other Improvements