AMD Releases ROCm 4.2 Compute Stack


  • #21
    Originally posted by wizard69 View Post
    I have no idea what problems AMD might be having but I do know that finding qualified workers in a different tech industry is extremely difficult.
    of course it is difficult, that's why i asked him...
    is there any success in finding people ?

    Originally posted by wizard69 View Post
    It isn't so much enterprise but rather government contracts for super computing that have AMD focused on CDNA. This is so important to AMD that I really doubt consumer cards will get much focus at all this year. Once ROCm is good enough to meet contract obligations they will likely expand hardware support.
    but aren't "government contracts for super computing" called enterprise business ?
    last time i checked star trek... enterprise is the flagship of all ships...

    Originally posted by wizard69 View Post
    It isn't that easy. You can't hire stupid people to make advancements in such a project. By the way I don't use the word "stupid" here lightly, people seem to forget that you need people with innate capability to pull off these projects. Further even a "smart" person takes time to get up to speed, hire a person today and it could be 6 months later before they are contributing significantly.
    yes sure i have IQ130+ and i could not do the job...
    so we do not talk about stupid or smart people we talk about super human overkill smart people...

    that's why i asked.. did they have some success finding those kinds of super smart people ?
    Phantom circuit Sequence Reducer Dyslexia

    Comment


    • #22
      Originally posted by vegabook View Post
      How much does it cost to hire 20 top people from Nvidia (or Intel)? Ultra generously, 500k each per year? That's 10 million dollars per year. Since end of 2018 AMD has massive fundraising capacity thanks to its market capitalisation rocketing above 20 billion that's 20, thousand, million (and today, 100 billion). AMD is ENTIRELY capable of having made this happen 2.5 years ago, more than enough time for the skills ramp-up, especially if you pay properly and hire properly,
      right,... it's not impossible to hire people for amd... for 500K per year you get decent people.
      and yes it's insane that a 1800€ 6900XT does not have good support in ROCm...

      i could understand if they had weaker support for something like a 200€ 5500XT...
      but a 1800€ card should be able to perform compute in a sound way.
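
The arithmetic in the quoted post is easy to check. A minimal sketch in Python, using only the figures quoted above (20 hires, 500k each per year, a market capitalisation of 20 billion); the variable names are mine, and the comparison against market cap is only a sense of scale, since market cap is not a budget:

```python
# Figures come from the quoted post; none of this is official AMD data.
HIRES = 20
COST_PER_HIRE = 500_000             # per year, the "ultra generous" figure
MARKET_CAP_2018 = 20_000_000_000    # "20, thousand, million"

# Yearly cost of the hypothetical hiring round.
annual_cost = HIRES * COST_PER_HIRE

# That cost as a fraction of the quoted end-of-2018 market cap.
share_of_cap = annual_cost / MARKET_CAP_2018

print(f"annual cost: {annual_cost:,}")
print(f"share of market cap: {share_of_cap:.2%}")
```

So the 10 million per year claimed in the quote checks out, and it is a small fraction of a percent of the quoted market cap.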

      Phantom circuit Sequence Reducer Dyslexia

      Comment


      • #23
        Originally posted by MadeUpName View Post
        Resolve can use OpenCL on AMD. It just can't be ROCm, it has to be AMDGPU-Pro which is only available for 2 distros. And it can't be any version other than 1.2 as AMD broke something when they went past that.
        How do you know the bug isn't somewhere in Resolve, and specific to some code path that's enabled by a later version of OpenCL or features that ROCm supports and AMDGPU-Pro doesn't?

        We could even be talking about multiple bugs. If the main userbase is only using 1.2 and Nvidia or AMDGPU-Pro, there might not be the incentive to test those other codepaths and fix the bugs in them.

        Comment


        • #24
          Originally posted by vegabook View Post
          I got tired of waiting: https://www.ebay.co.uk/itm/124699576...8AAOSwfBhgiIof

          Flogged the VII in which I had invested so many FP64 and machine learning hopes and dreams.
          Well, at least you turned a profit on it!

          Originally posted by vegabook View Post
          Koduri left AMD with this card in development, it was far enough along that they actually launched it. But it was "politically incorrect" 'cos it was too powerful on compute and not powerful enough on gaming, and was a Koduri creation, so AMD ignored it. Trashed their own creation. Within weeks almost of its launch AMD was already allowing rumours to fly that it was a dead end.
          I dunno, man. I got the sense that his departure was something of a mutual decision, if he wasn't actually driven out. He's the "genius" behind the underwhelming Vega 56/64, which had neither tons of compute horsepower nor was it terribly competitive in gaming. Plus, it was expensive to make and it burned a ton of power. That's hardly a home run!

          The reason it was rumored to be compute-only is that they didn't know if it would be a price-competitive gaming card. It had 4 stacks of HBM2, after all. You don't see that on an Nvidia card costing less than $3k! Fortunately, thanks to Nvidia's missteps on the RTX 2000 series, a market opportunity opened up that AMD decided to seize. They continued selling the card for nearly a year, and it was in relatively abundant supply through most of that time (including at the end).

          Seriously, where do you even get this stuff? Do you have any source on that, at all, or are you just trolling? The most charitable explanation I can see is that, in your victim mindset, you constructed a hero cult around Koduri as your fallen savior.

          Decisions like whether to build a compute vs. gaming-oriented card aren't something that one designer just decides, like he's some kind of auteur! Decisions about when to build a GPU that's compute & AI-focused are market-driven and strategic decisions that are agreed and signed off, at the highest levels. Furthermore, if AMD really hated Vega 20 so much, why does CDNA have so much continuity with it? In fact, CDNA goes even further on compute and completely ditches gaming, as if to double-down on the very approach you say AMD now disfavors!

          Comment


          • #25
            Originally posted by vegabook View Post
            Challenge to AMD: fire the entire ROCm management team and watch your cap go up 20%.
            They do seem to have done a remarkably bad job of learning Nvidia's recipe for success.

            Comment


            • #26
              Originally posted by coder View Post
              Well, at least you turned a profit on it!


              I dunno, man. I got the sense that his departure was something of a mutual decision, if he wasn't actually driven out. He's the "genius" behind the underwhelming Vega 56/64, which had neither tons of compute horsepower nor was it terribly competitive in gaming. Plus, it was expensive to make and it burned a ton of power. That's hardly a home run!

              The reason it was rumored to be compute-only is that they didn't know if it would be a price-competitive gaming card. It had 4 stacks of HBM2, after all. You don't see that on an Nvidia card costing less than $3k! Fortunately, thanks to Nvidia's missteps on the RTX 2000 series, a market opportunity opened up that AMD decided to seize. They continued selling the card for nearly a year, and it was in relatively abundant supply through most of that time (including at the end).

              Seriously, where do you even get this stuff? Do you have any source on that, at all, or are you just trolling? The most charitable explanation I can see is that, in your victim mindset, you constructed a hero cult around Koduri as your fallen savior.

              Decisions like whether to build a compute vs. gaming-oriented card aren't something that one designer just decides, like he's some kind of auteur! Decisions about when to build a GPU that's compute & AI-focused are market-driven and strategic decisions that are agreed and signed off, at the highest levels. Furthermore, if AMD really hated Vega 20 so much, why does CDNA have so much continuity with it? In fact, CDNA goes even further on compute and completely ditches gaming, as if to double-down on the very approach you say AMD now disfavors!
              Of course, I'm exaggerating. But I'm not trolling. The Radeon VII story is a shambolic missed opportunity worthy of the AMD of ten years ago.

              Comment


              • #27
                But before anyone asks, no, there still is not any official support for GFX10 / Navi GPUs whether it be the Radeon RX 5000 or RX 6000 series. We've heard that AMD is working on Navi support for ROCm but for the time being seem to be primarily focused on their CDNA support and prior GFX9 Vega GPUs.
                Fail. The lack of support for cards that people can actually, y'know, buy is such a pain.

                ROCm 4.2 isn't the most exciting feature release in recent time but continues advancing this open-source GPU compute stack steadily to make it more compelling and capable for forthcoming super computer deployments and other HPC use-cases and continuing to make it easier to port NVIDIA CUDA code over to HIP/ROCm.
                I agree this is a really exciting feature and one I dearly want to see implemented fully. If cards I could actually buy were supported, though, it might actually mean something.

                I tried to get ROCm working on a (very much unsupported) configuration with my new APU mini-PC toy. That was a huge waste of time and effort. Still, at least I know that I really need to stick to supported configurations to have any chance of it working, so I learned something important.
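
For anyone wanting to sanity-check a card before sinking time into it, here is a rough sketch in Python. It only encodes what the article quoted above says (focus on CDNA and GFX9 Vega, no official GFX10 / Navi support); it is not AMD's support matrix, and the function names are made up for illustration. It assumes you already know your card's LLVM-style target name (e.g. gfx906 for Radeon VII, gfx1030 for the 6900 XT).

```python
import re

# Target names follow the LLVM AMDGPU convention: "gfx" + major version
# + one minor character + one stepping character (e.g. gfx906, gfx1030).
_TARGET = re.compile(r"gfx(\d+)([0-9a-f]{2})")


def gfx_major(target: str) -> int:
    """Return the major architecture number of a gfx target name."""
    match = _TARGET.fullmatch(target.strip().lower())
    if match is None:
        raise ValueError(f"not a gfx target name: {target!r}")
    return int(match.group(1))


def matches_article_focus(target: str) -> bool:
    """Hypothetical helper: True if the target is in the GFX9 family.

    Assumption: "focus" means GFX9 (Vega and CDNA) only, as stated in the
    quoted article. This is not an official support list.
    """
    return gfx_major(target) == 9


for name in ("gfx906", "gfx908", "gfx1010", "gfx1030"):
    print(name, matches_article_focus(name))
```

A family check like this says nothing about APUs or specific distros, which is exactly where I got burned, so treat a True as "worth trying" rather than "will work".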

                Comment


                • #28
                  Originally posted by vegabook View Post

                  How much does it cost to hire 20 top people from Nvidia (or Intel)? Ultra generously, 500k each per year? That's 10 million dollars per year. Since end of 2018 AMD has massive fundraising capacity thanks to its market capitalisation rocketing above 20 billion that's 20, thousand, million (and today, 100 billion). AMD is ENTIRELY capable of having made this happen 2.5 years ago, more than enough time for the skills ramp-up, especially if you pay properly and hire properly, consistent with the fact that AMD, at the time, was one of only 2 entities on earth with credible GPU IP. Yet here we are in mid 2021, and AMD is still flailing about on compute while Nvidia laughs all day long.

                  This ROCm mess has absolutely no excuses. Ever since AMD bought ATI, RTG has been a second class citizen. Mercy me this division still has value and needs to be SOLD out of AMD's incompetent (GPU-wise) hands sharpish before Intel (or Qualcomm) eat the leftovers of the lunch that Nvidia has already feasted on and continues to feast on.

                  I get Cuda 10 on a 100 dollar Jetson Nano. For C**** sake. Don't tell me a 700 dollar Navi card is still waiting. Honestly.

                  Challenge to AMD: fire the entire ROCm management team and watch your cap go up 20%.
                  Why doesn't AMD create a trainee program for the coming years and the future? 500k? Most programmers in Europe, Asia, and Africa don't earn anything close to that, not even in ten years. It's time for AMD to put more people to work on their drivers. The big problem with ATI/AMD GPUs over the last 30 years has been the drivers: they always pair great hardware with horrible drivers. Even Intel has started making good drivers, stable and performant ones. If Intel makes a good dGPU, it will crush AMD's GPUs like Nvidia does with its tech. The problem with AMD GPUs is not the hardware, it's the software/drivers. My two desktops at home have Ryzen CPUs, but I don't want to buy an AMD GPU for some reason.

                  Comment


                  • #29
                    Originally posted by extremesquared View Post
                    That is very interesting. What is keeping them off the official support list? I've avoided attempts due to the somewhat haphazard management of the github issue tracker -- where all navi+ issues still seem to just get closed without resolution.
                    The graphics driver releases (which now include ROCm-based OpenCL for Navi) go through a different QA org (the graphics QA team rather than the datacenter QA team) and come from different release branches. I believe all of the fixes we put in for Navi OpenCL should be back in mainline and datacenter release branches now, but the datacenter team (who write the ROCm release notes) is not yet testing on Navi. We should be indicating Navi support in the graphics driver releases though.

                    We are working to improve the integration between the orgs now that more code is being shared. Key problem AFAICS is that the ROCm repos are serving two conflicting purposes - the datacenter release vehicle AND the upstream for code used in both datacenter and graphics releases.
                    Test signature

                    Comment


                    • #30
                      Originally posted by Qaridarium
                      so its not a big problem just a management failure. you could fix this easily ...
                      Management failures tend to be some of the most intractable, because they require upper management to recognize the problem, which often doesn't happen due to the filtering of information that occurs in the problematic layers of management beneath them. Think of it this way: a bad manager is going to tell their boss facts that align with their world-view. Presumably, the underling's actions also align with that world-view. What the upper manager sees is a set of actions that are consistent with the facts they're being told. So, as long as financial performance is hitting its targets, no problem is detected by the upper management and no changes are made.

                      Analyzing the counterfactuals (i.e. trying to figure out what would've happened if a different set of actions were taken) is almost never done, especially when things seem to be going well and there's not been a complete melt-down.

                      Originally posted by Qaridarium
                      yes i think AMD very often does irritate their customers not because of the hardware or software but management failure and PR failure.
                      Warning: the following is pure speculation. As I know far too little about AMD's graphics division to make these claims, consider it purely hypothetical.

                      AMD sees their customers as:
                      • Gamers
                      • Miners
                      • HPC
                      • Professional (i.e. workstation users)

                      We compute hobbyists and pseudo-professional users (i.e. people using gaming cards for professional apps) probably don't factor into their thinking or planning. If we're not buying Pro cards and using them on supported distros, we don't qualify for support. The fact that we can ride the coattails of the Professional users and get drivers that work on some consumer cards is already more than I'm sure some people at AMD think we deserve.

                      Originally posted by Qaridarium
                      just hire some people and do a driver Gui... or just improve the gnome control center to have all relevant driver options.
                      It would have to come out of some budget that might not exist. Someone might have to make a RoI (Return-on-Investment) case, for it to happen, which means estimating how much more revenue AMD would get, if they did that work. That's a tricky case to make, with such an amorphous, diverse, and fickle group of customers.

                      Originally posted by Qaridarium
                      in the past we could accept that amd did not do it because of the lack of money.. but now you should have money to do so.
                      But the money and authority on how to spend it might still be in the wrong places to get what you want.

                      The key point is this: just because a company is profitable doesn't mean it can spend money on every little thing that anyone wants. Investors want to see that money a business re-invests is going to produce future profits, or else they're going to demand dividends and stock buybacks. So, market opportunities are typically quantified, ranked, and only a few of the highest get pursued (trying to pursue too many goals at once tends to result in accomplishing few or none). Those financial projections that companies sometimes put out are based on these activities and other factors.

                      Comment
