Radeon ROCm 4.3 Released With HMM Allocations, Many Other Improvements


  • #21
    Originally posted by zboszor:
    Does it use upstream LLVM yet instead of unstable forks of LLVM? Last time I looked, the shader compiler used branchpoint versions of LLVM patched with AMD's own code, and the patch didn't apply to the final version of LLVM from the same branch.
    It updated the following package, not sure how much it helps you:

    llvm-amdgpu/Ubuntu,now 13.0.0.21295.40300



    • #22
      Wonderful software, but it still doesn't run on most AMD products, and I have yet to figure out why. OpenCL seems to be much less widely adopted than CUDA, but I can't see why no one cares about having an alternative to it that at least runs.



      • #23
        Originally posted by perpetually high:
        If anyone runs into the following problem when running apt update:
        Code:
        Err:11 http://repo.radeon.com/rocm/apt/debian xenial InRelease
        The following signatures were invalid: EXPKEYSIG 9386B48A1A693C5C James Adrian Edwards (ROCm Release Manager) <[email protected]>
        Error: GDBus.Error:org.freedesktop.systemd1.UnitMasked: Unit packagekit.service is masked.
        Reading package lists... Done
        W: GPG error: http://repo.radeon.com/rocm/apt/debian xenial InRelease: The following signatures were invalid: EXPKEYSIG 9386B48A1A693C5C James Adrian Edwards (ROCm Release Manager) <[email protected]>
        E: The repository 'http://repo.radeon.com/rocm/apt/debian xenial InRelease' is not signed.
        N: Updating from such a repository can't be done securely, and is therefore disabled by default.
        N: See apt-secure(8) manpage for repository creation and user configuration details.
        I had to do the following to get it going:

        wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -

        Then just apt update and apt upgrade and you should be good to go.

        Still no rocm-smi included anymore. They've stopped since 4.0.0 and I haven't figured out why.

        I had to create the following symlink to get it going again a few versions ago:

        $ ls -al /usr/bin/rocm-smi
        lrwxrwxrwx 1 root root 42 Mar 26 07:11 /usr/bin/rocm-smi -> /opt/rocm-4.0.0/bin/rocm_smi_deprecated.py

        I use rocm-smi all the time to set the mem clocks, so if this has been moved elsewhere and someone could tell me where, I'd appreciate it.
        For me, rocm-smi is in rocm-smi-lib and works perfectly fine. As far as I know, it's the latest version.



        • #24
          Originally posted by perpetually high:
          wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
          apt-key is deprecated. Instead use,
          Code:
          wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/rocm.gpg > /dev/null
          Or something like that.



          • #25
            Originally posted by Lemonzest:

            For me, rocm-smi is in rocm-smi-lib and works perfectly fine. As far as I know, it's the latest version.
            Awesome, thanks man, that was it.



            • #26
              Well that's annoying: the new version has a bug if you build amdgpu into the kernel (since it then won't show up in lsmod or in the list of loaded modules).

              Code:
              $ rocm-smi
              ERROR:root:Driver not initialized (amdgpu not found in modules)
              I looked at the code
              Code:
              def driverInitialized():
                  """ Returns true if amdgpu is found in the list of initialized modules
                  """
                  driverInitialized = ''
                  try:
                      driverInitialized = str(subprocess.check_output("cat /proc/modules|grep amdgpu", shell=True))
                  except subprocess.CalledProcessError:
                      pass
                  if len(driverInitialized) > 0:
                      return True
                  return False
              Seems like a pretty half-ass way of checking (but I get it). Can fix this by adding return True to the top of the function but not a long-term solution obviously. Hey bridgman - sorry to tag, but if you can file this bug or throw this on someone's radar. Wouldn't know where to begin. Appreciate it, thanks
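One possible alternative check, sketched here and not taken from rocm-smi: look for the /sys/module/amdgpu directory instead of grepping /proc/modules. The kernel normally creates /sys/module/<name> for built-in drivers that expose module parameters as well as for loaded modules, so this should cover both cases, but treat that as an assumption to verify on your own kernel. The function name and the sysfs_root parameter are made up for illustration.

```python
import os


def amdgpu_present(sysfs_root="/sys"):
    """Return True if <sysfs_root>/module/amdgpu exists.

    Hypothetical helper, not part of rocm-smi. Assumes the kernel exposes
    /sys/module/amdgpu for both a loaded module and a built-in driver.
    """
    return os.path.isdir(os.path.join(sysfs_root, "module", "amdgpu"))


if __name__ == "__main__":
    print("amdgpu present:", amdgpu_present())
```

The sysfs_root argument only exists so the check can be pointed at a scratch directory when testing; on a real system the default is what you want.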



              • #27
                I'm starting to read ROCm release articles in the same light as "Linux now runs on some old gaming console" articles. Not really useful, but still fun and quirky that someone put the effort into supporting that old hardware.



                • #28
                  I'd like to note that what ROCm "supports" and what it actually supports are pretty different. The .deb files are built to support a few models, while the source has support for virtually every model, which works to some extent. I've got TensorFlow up on a gfx902 by building ROCm and TensorFlow from source.

                  With this rocm-build, you can build ROCm for many cards the ROCm .deb files do not support: set AMDGPU_TARGETS=(list of cards you want your build to support) and run a bunch of shell scripts in order, which build the .deb files and install them (putting everything in /opt/rocm). The creator says they initially wanted to re-enable the disabled support for their gfx803 card; I built mine for gfx902.

                  Also look in the gfx803 and navixxx directories. The important patch is for miopen: miopen (at least up to rocm-4.2) does not honor AMDGPU_TARGETS, so one of the patches modifies the miopen build file to add the wanted GPUs to the list. I did a similar patch to add gfx902 to the miopen list.

                  Warning: have a ton of RAM+swap handy; llvm or something is a colossal RAM hog and the scripts run like 8 jobs at a time. I didn't think I'd need to turn on swap on a 32GB system, but for one component (I think miopen) I got Out Of Memory, turned on like 40GB of swap (on the HDD, I'd rather not wear out the SSD for some bull....) and was shocked to see it use close to 20GB of it (for about 10-15 seconds, then the build dropped from its roughly 48GB peak to more like 20GB of RAM usage).
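A back-of-the-envelope sketch of that sizing, not part of rocm-build: a peak of about 48GB across 8 jobs works out to roughly 6GB per job, so the number of jobs that fit is about (RAM + swap) / 6GB. The function name and the 6GB default are assumptions derived only from the numbers above; other components and other ROCm versions may behave quite differently.

```python
def max_parallel_jobs(ram_gb, swap_gb=0, per_job_gb=6.0):
    """Rough cap on parallel build jobs that fit in RAM plus swap.

    per_job_gb=6.0 is an assumption: the ~48GB peak divided by 8 jobs.
    Always returns at least 1 so a build can still run serially.
    """
    return max(1, int((ram_gb + swap_gb) // per_job_gb))


if __name__ == "__main__":
    print("32GB RAM, no swap:", max_parallel_jobs(32))
    print("32GB RAM + 40GB swap:", max_parallel_jobs(32, swap_gb=40))
```

By this estimate a 32GB machine without swap fits fewer than the 8 jobs the scripts run, which is consistent with the Out Of Memory failure described above.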

                  Edit: Also skip the steps for building amdgpu, recent kernels have that stuff built in.
                  rocm-build (build scripts for ROCm): xuhuisheng/rocm-build on GitHub.
                  Last edited by hwertz; 05 August 2021, 01:10 AM.



                  • #29
                    Wow, just found this head-scratcher of a comment: https://github.com/RadeonOpenCompute...ment-893584143

                    bridgman sorry for the ping, but is this really the policy around all the open source compute code? I assumed the lack of third party/community contributions was because of the steep learning curve, but flat out not accepting them is not a good look...



                    • #30
                      Originally posted by StillStuckOnSI:
                      Wow, just found this head-scratcher of a comment: https://github.com/RadeonOpenCompute...ment-893584143

                      bridgman sorry for the ping, but is this really the policy around all the open source compute code? I assumed the lack of third party/community contributions was because of the steep learning curve, but flat out not accepting them is not a good look...
                      I saw this a bit earlier and have already asked for clarification.

                      I think the poster is trying to say "these repos are only for publishing - we won't be accepting pull requests directly into these trees but will be integrating the change into our internal trees so that it flows through to subsequent releases".

                      At least I hope so.

                      Even my interpretation does not fix the current release though, so we might need to do both. Anyways, I have the discussion going, not sure where it will end up yet.

                      EDIT - I heard back from Vlad and he is going to edit his comment. TL;DR is that we do accept third party contributions, we just don't do it by accepting pull requests directly into our "publishing" trees.

                      Hopefully over time the need for this distinction will go away as we get more of the upper level component teams working directly in public trees for core functionality and keeping secret stuff like support for unreleased products in overlay branches, but that is necessarily a slow process because (a) so many people need to buy in and (b) it needs to be done very conservatively because any slip of "secret" information sets the whole open source effort back a long way.

                      MORE EDIT - new text:

                      "Thank you for bringing this issue up. We are currently testing this change internally. We are also planning to remove the bundled OpenCL ICD from the tree. I cannot give an ETA for this change, but most likely it will be publicly available with ROCm 5.0."
                      Last edited by bridgman; 05 August 2021, 02:37 PM.

