AMD Posts Last KFD Kernel Patches For Discrete GPUs, Needed For Upstream ROCm

  • #11
    Is there an easy way to tell if a given piece of hardware supports PCIe atomic operations? I'm assuming we're not just talking about the GPU here but also the PCI controllers, CPU etc?



    • #12
      Originally posted by Happy Heyoka View Post
      Is there an easy way to tell if a given piece of hardware supports PCIe atomic operations? I'm assuming we're not just talking about the GPU here but also the PCI controllers, CPU etc?
It depends on how the motherboard is designed. If the PCIe connector is wired directly to the CPU, then you should be able to tell just from the CPU model: Ryzen/TR/Epyc, or Intel Haswell or newer, should be OK. If the PCIe connector is wired via a PLX switch, then the switch also has to be configured to route atomics; there's a config bit you can check, but I'm not sure offhand what an easy way to do that is.

At the moment atomics are one-way: the GPU is the initiator, and the motherboard/chipset is the completer.

I believe Kaveri/Carrizo/Bristol also support atomics, but right now iGPUs and dGPUs are programmed quite differently, and I don't think we have code that lets them co-exist yet.
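For reference, the "config bit" mentioned above lives in the AtomicOpsCap field of the PCIe Device Capabilities 2 register, which recent versions of `lspci -vvv` print under `DevCap2`. A minimal sketch of decoding those bits from a raw register value (bit positions are from the PCI Express Base Specification; the helper name and dict keys are my own):

```python
# Decode the AtomicOpsCap bits of the PCIe Device Capabilities 2 register.
# Bit positions per the PCI Express Base Specification:
#   bit 6: AtomicOp Routing Supported (relevant for switches like PLX)
#   bit 7: 32-bit AtomicOp Completer Supported
#   bit 8: 64-bit AtomicOp Completer Supported
#   bit 9: 128-bit CAS Completer Supported
def decode_devcap2_atomics(devcap2: int) -> dict:
    return {
        "routing": bool(devcap2 & (1 << 6)),
        "completer_32": bool(devcap2 & (1 << 7)),
        "completer_64": bool(devcap2 & (1 << 8)),
        "completer_128cas": bool(devcap2 & (1 << 9)),
    }
```

In practice, `sudo lspci -vvv | grep -i atomic` on a reasonably new pciutils will show the same information without any decoding.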



      • #13
Just freaking DO IT already! No one wants to fiddle with kernel forks and unmanaged binary packages.
I wish you would drop the binary driver on Windows too and replace it with LLVM+Mesa, write a small, simple GUI that is an actual user interface rather than a dumbed-down, ad-infested trojan, and drop the useless hardware video codecs in favour of more compute units plus 8/12/16/32/48/64-bit packed math and tensor blocks for "realtime" neural-network processing.



        • #14
          Originally posted by dfx. View Post
          I wish you would ... drop useless hardware video codecs
          Not useless. If you want to use deep learning for high-volume video (or even still image) analysis, their HW decoders actually need to be scaled up. I'm sure they don't occupy much die area, except perhaps on smaller GPUs.

          Nvidia claims their P4 accelerator (basically a GTX 1080 with GDDR5 clocked down to use only 75 W) can decode & run GoogLeNet analysis on 93 streams of 30 fps 720p video:

          http://nvidianews.nvidia.com/news/ne...ning-inference

          There are also several important use cases for hardware encoding, beyond just game streaming. Cloud hosting of games and desktop applications is one, GPU-based transcoding of pre-recorded video (e.g. for "youtube" type sites or video surveillance) is another.

          Originally posted by dfx. View Post
          in favour of more compute units + 8/12/16/32/48/64 packed math and tensor blocks for "realtime" neural-network processing.
          And a bag of chips.

Actually, in an AI-focused chip, I could do without the rasterizers, ROPs, geometry processors, FP64 support, and display controllers.



          • #15
            No HBCC support yet



            • #16
              Originally posted by boxie View Post
              oh sweet! running compute without out of tree patches, SOON!
4.17 would be roughly 6 months away? I'm presently using Nvidia but waiting on ROCm to be in a better state; I'm likely to get AMD in future (or maybe an eGPU for both cards, if the x4 bandwidth bottleneck isn't a problem; I haven't profiled how often I stress the bandwidth).

Do you use many third-party compute resources? How does current software's ROCm support compare to CUDA-only? I've heard TensorFlow, and I think Torch, support ROCm now? No idea how the performance compares to CUDA (I guess Michael will do an article in future?). As for proprietary software, I'm not sure how likely it is to add ROCm support anytime soon... unfortunately I'm required to use some of it, as the open-source alternatives are a long way from reaching parity.

I also hope to see libraries/frameworks that make it easy for developers to support both CUDA and ROCm without duplicating effort. ArrayFire could be good for that if they add ROCm support? (Or is OpenCL in general good enough? AFAIK ROCm provides some CUDA-equivalent libraries that make supporting compute workloads like machine learning easier.)



              • #17
                Originally posted by polarathene View Post

4.17 would be 6 months or so away roughly? I'm presently using nvidia, but waiting on ROCm to be in a better state [...]
Stable release? Maybe ~5 months away. More to the point, the open-source compute runtime for AMD is nearly at 1.0 completeness.

OpenCL running on top of it is nice too; we should see continual improvements there.

I wonder how the ROCm-vs-CUDA contest will play out; it will be interesting to see.
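Once the KFD patches do land in an upstream kernel, one quick sanity check for whether the amdkfd driver is active is to look for its sysfs class node. A small sketch, assuming the `/sys/class/kfd/kfd` node that current amdkfd exposes (the helper name is my own, and absence of the path just means the driver isn't loaded or the path differs on your kernel):

```python
import os

def kfd_loaded(sysfs_root: str = "/sys/class/kfd") -> bool:
    """Return True if the amdkfd driver appears to have registered its
    sysfs class node; False if it is absent (driver not loaded, or the
    path differs on this kernel)."""
    return os.path.isdir(os.path.join(sysfs_root, "kfd"))
```

On a machine where the driver is loaded, `kfd_loaded()` returns True; the ROCm userspace stack depends on that device node being present.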

