GCC 14 Adds "GFX90C" For OpenMP Offloading To APUs With GFX9/Vega Graphics


  • Eirikr1848
    replied
    Originally posted by Jumbotron View Post

    My bad. I had OpenCL in mind, since Bristol Ridge is OpenCL 2.1 capable, and typed Vulkan instead. For Vulkan, Bristol Ridge is good for 1.2.

    And yes all the Fusion era APUs are EOLed.

    The now retired and brilliant ATI/AMD GPU veteran Bridgman conferred with me on several occasions about my love for these engineering marvels and the promise of HSA, and how technically at first ROCm could work on Carrizo and Bristol Ridge, but you had to have a board without broken IOMMU tables and there were some scatter-gather issues as well. So much potential, but Lisa Su came in shortly after, overturned the tables, and started the era of Ryzen and RDNA (and EPYC and CDNA). But Kaveri, Carrizo and Bristol Ridge still remain the first and only x86-64 monolithic SoC dies with unified zero-copy memory between the on-die CPU cores and the on-die GPU cores. In fact, in order to cram in more transistors above and beyond Carrizo and to upgrade the on-die GPU while still using the 28nm node of Carrizo, AMD actually used masks usually used to make dense GPU layouts on the entire Bristol Ridge APU.

    I’m rocking an Apple MacBook Pro now with their latest M3 Pro SoC. But every now and again I look over at my aging and EOLed Bristol Ridge laptop with Ubuntu and smile... remembering there was a fair time when AMD got there first.
    AMD got there first hardware-wise but never software-wise especially with HIP dropping these architectures over time instead of retaining the promise of HSA compatibility.


  • finalzone
    replied
    Originally posted by Jumbotron View Post
    Looking into this further it seems that this is for some of the original Ryzen APUs that had Vega era GPUs. Like Zen 1 Raven Ridge, Zen 2 Renoir and Zen 3 Cezanne. And obviously Vega is GFX9.
    Also Barcelo, an updated Cezanne APU.


  • qarium
    replied
    Originally posted by Jumbotron View Post
    Agreed, but with one correction. Monolithic die SoCs are pretty much dead with the exception of mobile. Multiple chiplets on a package is where the industry is and where it is going. In fact, memory is being taken off the motherboard and put on the package, closer to the chiplets. It could be that in 10 years or less the motherboard will cease to exist in its present incarnation and will be only big enough to accommodate the SoC package, with enough room for traces to support external connectors, which may be only USB-C for everything but legacy Ethernet. This would be a consumer desktop, of course. The enthusiast's board may be just a tad larger, with 2 slots max to accommodate a kilowatt-plus GPU.
    well, but many people do not need, and will never need, more than a small monolithic SoC. my mother is fine with a 10-year-old dual-core APU produced in 22nm... i still have some 34nm computers running around... for 2 people.
    as far as i can see the Qualcomm X Elite SoC is still a monolithic chip... that's more than most people need.

    of course if you go into high performance and look at the new mega APU, AMD Strix Halo, with 16 Zen 5 cores and a 40 CU, 2560-shader GPU, it is a chiplet design made of 3 chips... and only the 2 Zen 5 CPU dies are outside of the main chip, which means they are not able to split off the GPU or the IO/memory controller.

    i am pretty sure what you say will happen but if we see AMD AM5 socket and just imagine a AM6 socket

    they could easily improve it very much, just like the newest Threadripper: they can go from 2 RAM channels to 4 RAM channels, doubling the performance (but not for the 2-DIMM designs)
    and then they could go from DDR5 to DDR5 MRDIMM at 17,600 MT/s, which would again double the performance.

    with that in mind the next 5+ years they will of course keep the classic socket design..

    but of course in 10 years this could all vanish, as you said


  • Jumbotron
    replied
    Originally posted by qarium View Post

    you are right, that's all history. and you are right, monolithic SoC dies with unified zero-copy memory are clearly the future.
    Agreed, but with one correction. Monolithic die SoCs are pretty much dead with the exception of mobile. Multiple chiplets on a package is where the industry is and where it is going. In fact, memory is being taken off the motherboard and put on the package, closer to the chiplets. It could be that in 10 years or less the motherboard will cease to exist in its present incarnation and will be only big enough to accommodate the SoC package, with enough room for traces to support external connectors, which may be only USB-C for everything but legacy Ethernet. This would be a consumer desktop, of course. The enthusiast's board may be just a tad larger, with 2 slots max to accommodate a kilowatt-plus GPU.


  • Eirikr1848
    replied
    Originally posted by Jumbotron View Post
    Dammit. So close. Hope they eventually go back to GFX8. My now 8 year old AMD Bristol Ridge equipped desktop and laptop (A12-9800 for the desktop and A12-9700 for the laptop) would appreciate that. Along with the earlier Carrizo APU, which only saw light in laptops, and the even earlier Kaveri, they remain the only AMD Fusion APUs that reached full HSA 1.0 certification. In addition, the integrated on-die GPU of the Bristol Ridge is named Wani. It’s a weird blend of Volcanic Islands and Pirate Islands tech. In other words, a sprinkling of Tonga (Volcanic Islands) and Fiji (Pirate Islands). It’s GCN 3.0 and is capable of OpenCL 2.1 and Vulkan 2.1 along with DirectX 12. The 4 CPU cores are the last iteration of the Bulldozer core, specifically Excavator cores. It’s all tied together with a 48-bit zero-copy memory scheme between the CPU cores and the GPU. It’s just a shame that these Bristol Ridges were relegated to crappy OEM motherboards in budget computers that had broken MMU tables, so it made it difficult to implement HSA, much less ROCm, which at first stated that these APUs would be supported, but that went by the wayside with the advent of Ryzen. It’s supposedly possible, but no one has this gear anymore nor wants the hassle of getting around the MMU issues. Here’s to hoping OpenMP makes it to GFX8 one day.
    Yeah I bought the desktop version of these systems for cloud developers back in the day, mistakenly believing in local device offload and a whole bunch of other crud.

    It really is the “lost generation”. All GFX8 and really, all/most GFX7 cards should be supported by ROCm as well: I had an R9 390X 8GB and W9100 32GB working with ROCm 4.x on Arch and then decided to install yay and mass upgrade all my packages.

    …. anyway. They’ve been in there, they’ve been dropped. They can be re-added, so long as developer support is there. How to convince AMD and external devs to develop and devote time tho when CUDA is a thing, and these products are already sold?

    No reason that older GFX6, or even GFX5, cards couldn’t be included with “feature levels” and “CUDA feature levels” included.

    “Just” requires time in a lab with more cards or a lot of contributor testing.

    ———
    This is one big win for our developer hero, one small step for AI accessibility for all with AMD products from the last 15 years.

    ——-

    Also what happened to the patch that re-added GFX8 support? Did the dev upgrade?
    Last edited by Eirikr1848; 27 April 2024, 03:51 PM.


  • qarium
    replied
    Originally posted by Jumbotron View Post
    My bad. I had OpenCL in mind, since Bristol Ridge is OpenCL 2.1 capable, and typed Vulkan instead. For Vulkan, Bristol Ridge is good for 1.2.
    And yes all the Fusion era APUs are EOLed.
    The now retired and brilliant ATI/AMD GPU veteran Bridgman conferred with me on several occasions about my love for these engineering marvels and the promise of HSA, and how technically at first ROCm could work on Carrizo and Bristol Ridge, but you had to have a board without broken IOMMU tables and there were some scatter-gather issues as well. So much potential, but Lisa Su came in shortly after, overturned the tables, and started the era of Ryzen and RDNA (and EPYC and CDNA). But Kaveri, Carrizo and Bristol Ridge still remain the first and only x86-64 monolithic SoC dies with unified zero-copy memory between the on-die CPU cores and the on-die GPU cores. In fact, in order to cram in more transistors above and beyond Carrizo and to upgrade the on-die GPU while still using the 28nm node of Carrizo, AMD actually used masks usually used to make dense GPU layouts on the entire Bristol Ridge APU.
    I’m rocking an Apple MacBook Pro now with their latest M3 Pro SoC. But every now and again I look over at my aging and EOLed Bristol Ridge laptop with Ubuntu and smile... remembering there was a fair time when AMD got there first.
    you are right, that's all history. and you are right, monolithic SoC dies with unified zero-copy memory are clearly the future.


  • cbxbiker61
    replied
    Originally posted by antonyshen View Post

    Can you elaborate more on how you build and set up ROCm 6.1? Do you mean that it works for GFX90C with the override environment variable?
    I move /usr/bin/ollama to /usr/bin/ollama.bin.
    Then I make a /usr/bin/ollama shell script. As you can see it also overrides for some newer GPUs that are supposed to work. They seem to be working for others, so I included them in my script as much for documentation as anything else.

    The distributed ollama bin includes a rocm llama engine. If it finds a functional system it should work with the script. I built/installed ROCm since I want to compile from source. Ollama compiles with ROCm support quite easily if ROCm is installed correctly. I made a patched version that supports UMA to optimize for integrated graphics.

    Your GPU needs to be configured with more than a 512M graphics buffer memory otherwise ollama will ignore it. My Lenovo automatically (can't change it) sets 2G on a 16G memory machine. My Asus motherboard defaults to 512M and I bumped that up to 4G. With the 4G video config ollama will fully offload llama3 (it fits). On my notebook with 2G it only does a partial offload.
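    To see how much graphics memory the firmware has actually carved out, something like the following sketch may help. It assumes the amdgpu kernel driver, which exposes a `mem_info_vram_total` attribute (in bytes) under sysfs; the function names are made up for illustration, and the 512M threshold is simply the figure mentioned above, not something taken from ollama's source.

```shell
#!/bin/sh
# Sketch: report whether each amdgpu device has more than 512 MiB of dedicated VRAM.
# Assumption: the amdgpu driver exposes mem_info_vram_total (bytes) in sysfs.

MIN_VRAM_BYTES=$((512 * 1024 * 1024))

# Succeeds only when the given byte count is strictly above the threshold.
vram_sufficient() {
    [ "$1" -gt "$MIN_VRAM_BYTES" ]
}

# Print one line per card. The sysfs root is a parameter so the function
# can be pointed at a test directory instead of the real /sys tree.
check_cards() {
    drm_root="${1:-/sys/class/drm}"
    for vram_file in "$drm_root"/card*/device/mem_info_vram_total; do
        # Skip connector entries and unmatched globs.
        [ -r "$vram_file" ] || continue
        vram_bytes=$(cat "$vram_file")
        vram_mib=$((vram_bytes / 1024 / 1024))
        if vram_sufficient "$vram_bytes"; then
            echo "$vram_file: $vram_mib MiB, above threshold"
        else
            echo "$vram_file: $vram_mib MiB, too small"
        fi
    done
}

# Usage: check_cards
```

    If a card reports exactly 512 MiB it counts as too small here, matching the "more than" wording above.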

    One other gotcha. It seems a lot of the budget motherboards will crash if you have overclocking enabled. I had to disable D.O.C.P. on the Asus.

    Code:
    #!/bin/bash
    # Wrapper installed as /usr/bin/ollama; the real binary lives at /usr/bin/ollama.bin.
    clinfo=/opt/rocm/bin/clinfo
    if [[ -x $clinfo ]]; then
        # Query the device list once, then pick the override for the detected GPU.
        devices=$("$clinfo")
        if grep -qs 'Name.*gfx1103' <<<"$devices"; then
            # Radeon 780M
            export HSA_OVERRIDE_GFX_VERSION='11.0.0'
        elif grep -qs 'Name.*gfx1035' <<<"$devices"; then
            # Radeon 680M
            export HSA_OVERRIDE_GFX_VERSION='10.3.0'
        elif grep -qs 'Name.*gfx90c' <<<"$devices"; then
            # Vega-based APUs (GFX90C)
            export HSA_OVERRIDE_GFX_VERSION='9.0.0'
        fi
    fi
    export OLLAMA_TMPDIR='/var/tmp/OllamaTmp'
    exec /usr/bin/ollama.bin "$@"
    Last edited by cbxbiker61; 26 April 2024, 11:35 PM.


  • antonyshen
    replied
    Originally posted by cbxbiker61 View Post
    In my opinion ROCm is finally good enough to start using on a couple of my GFX90C's (a notebook and an APU). I've built Ollama with HIP and it works amazingly well for AI (llama3). You have to export HSA_OVERRIDE_GFX_VERSION=9.0.0 to get Ollama to work with the GFX90C. I'll have to find some OpenMP software to test gcc's GFX90C support. Yep, fun days.

    It's been a long and painful journey for ROCm. Building the code from source was a nightmare for most of the last 10 years, but I slogged through it over and over again. Finally things started clicking with 6.0. And over the last week I finally got a stable ROCm 6.1 built from source. I have a series of packaging scripts that will make updates in the future easy (although the build time for some of the ROCm packages is very very long).
    Can you elaborate more on how you build and set up ROCm 6.1? Do you mean that it works for GFX90C with the override environment variable?


  • cbxbiker61
    replied
    In my opinion ROCm is finally good enough to start using on a couple of my GFX90C's (a notebook and an APU). I've built Ollama with HIP and it works amazingly well for AI (llama3). You have to export HSA_OVERRIDE_GFX_VERSION=9.0.0 to get Ollama to work with the GFX90C. I'll have to find some OpenMP software to test gcc's GFX90C support. Yep, fun days.

    It's been a long and painful journey for ROCm. Building the code from source was a nightmare for most of the last 10 years, but I slogged through it over and over again. Finally things started clicking with 6.0. And over the last week I finally got a stable ROCm 6.1 built from source. I have a series of packaging scripts that will make updates in the future easy (although the build time for some of the ROCm packages is very very long).
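    For anyone wanting a quick way to try gcc's GFX90C support without hunting for OpenMP software, a rough sketch could look like this. It assumes a GCC 14 or newer build configured with the amdgcn-amdhsa offload target; the helper names and the `omp_test.c` file name are invented for illustration, and this has not been checked on real GFX90C hardware.

```shell
#!/bin/sh
# Sketch: build a tiny OpenMP target-offload test for an AMD GCN device with GCC.
# Assumption: gcc was built with offloading support for amdgcn-amdhsa.

# Print the offload-related compiler flags for a GPU arch (default: gfx90c).
offload_flags() {
    gpu_arch="${1:-gfx90c}"
    printf '%s\n' "-fopenmp -foffload=amdgcn-amdhsa -foffload-options=-march=${gpu_arch}"
}

# Write a minimal C program that reports whether the target region
# ran on the GPU or fell back to the host.
write_test_source() {
    cat > "$1" <<'EOF'
#include <omp.h>
#include <stdio.h>

int main(void) {
    int on_host = 1;
    #pragma omp target map(tofrom: on_host)
    on_host = omp_is_initial_device();
    printf("target region ran on %s\n", on_host ? "host (fallback)" : "device");
    return 0;
}
EOF
}

# Usage (hypothetical file names):
#   write_test_source omp_test.c
#   gcc $(offload_flags gfx90c) omp_test.c -o omp_test && ./omp_test
```

    If the program reports a host fallback, the offload compiler or runtime was probably not picked up, which is worth ruling out before blaming the GFX90C target itself.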


  • Jumbotron
    replied
    Looking into this further it seems that this is for some of the original Ryzen APUs that had Vega era GPUs. Like Zen 1 Raven Ridge, Zen 2 Renoir and Zen 3 Cezanne. And obviously Vega is GFX9.
