Radeon ROCm 1.9.1 vs. NVIDIA OpenCL Linux Plus RTX 2080 TensorFlow Benchmarks


  • coder
    replied
    Originally posted by Ansla View Post
    So the bandwidth measured by this test is not the actual memory bandwidth? And in that case what is the limiting factor here?
    It can't be measuring actual GDDR6 memory bandwidth, as those numbers are significantly higher than the nominal specs for the RTX parts. See:

    https://en.wikipedia.org/wiki/List_o...orce_20_series
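The nominal numbers are easy to sanity-check: peak DRAM bandwidth is just bus width times per-pin data rate. A minimal sketch (the function name is mine; 14 Gbps is the published GDDR6 data rate for the RTX 2080):

```python
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: bus width (bits) / 8 * per-pin rate (Gbps)."""
    return bus_width_bits / 8 * gbps_per_pin

# RTX 2080: 256-bit bus, 14 Gbps GDDR6
print(peak_bandwidth_gbs(256, 14.0))  # 448.0 GB/s
```

Any benchmark reporting substantially more than ~448 GB/s on a 2080 has to be hitting a cache rather than DRAM.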

  • bridgman
    replied
    Originally posted by Ansla View Post
    So the bandwidth measured by this test is not the actual memory bandwidth? And in that case what is the limiting factor here?
    Correct - memory bandwidth on a Fury is ~512 GB/s, while bandwidth on a Vega 64 is a bit lower (~484 GB/s), roughly 2x the bandwidth of a 1070 (256 GB/s).

    I think the limit here might be texture cache bandwidth but not 100% sure.

  • Ansla
    replied
    Originally posted by bridgman View Post

    In fairness, the "better" part referred to performance vs memory + interconnect power, and AFAIK that is still as true as ever.

    A few years ago (in the Fury timeframe) there was a second advantage for HBM relative to GDDR5, in terms of bandwidth. That was definitely true at the time and is still somewhat true today (a 4-stack HBM2 config gives around 1000 GB/s), but the move from GDDR5 to GDDR6 in the last couple of months has partially closed the bandwidth gap.

    Vega 56/64 uses a 2-stack HBM2 config which is comparable to the 256-bit GDDR6 memory on RTX 2080.
    So the bandwidth measured by this test is not the actual memory bandwidth? And in that case what is the limiting factor here?

  • Meteorhead
    replied
    Originally posted by bridgman View Post
    I am not aware of plans to re-introduce the CPU runtime (it seemed to be mostly used with Intel CPUs), although now that we are back in the Big Honkin' Server CPU business it's probably worth re-visiting.
    Please do. It was very useful. The runtime has not been touched for years, and even though the Intel runtime was 2X+ faster, yours was still very useful for teaching (HPC for physicists). If it were performant, we'd run production code on it.
    Originally posted by bridgman View Post
    As far as I know we do want SYCL to work on our products, so if deprecating cl_khr_spir is breaking SYCL then we'll need to look into either restoring it or providing an alternative. No plans yet but I have started discussion internally.

    Yeah, SPIR is looking a bit dead right now. That might have been a factor in our deprecating the extension.

    I'm not a user of pro video editing software, but I doubt Vegas Pro, or Photoshop for that matter, would ship kernel source code. All this software relied on cl_khr_spir, which got pulled out from under these people without warning. (As I said, it wasn't in the release notes, as nothing ever is. By the way, Radeon Software 18.2.2 still lists it as a supported extension, the 5th driver to do so since we reported the issue on Devgurus.)

    I don't think SPIR is dead. It's infrastructure that is meant to be used, and while I don't have data, I think people relied on it. (Read the Devgurus topics on the missing SPIR and CPU runtime support; people, myself included, were pretty upset.) Pulling the plug on it without any migration path (SPIR-V?) was not a wise management decision.
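Whether a given driver still advertises the extension is easy to check by inspecting the CL_DEVICE_EXTENSIONS string. A minimal sketch of the token check (the helper name and the sample string below are illustrative, not actual driver output):

```python
def supports_spir(extensions: str) -> bool:
    """Check a CL_DEVICE_EXTENSIONS string for an exact cl_khr_spir token."""
    return "cl_khr_spir" in extensions.split()

# Illustrative extension string, not real driver output
exts = "cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_spir"
print(supports_spir(exts))  # True
```

A plain substring search would also match hypothetical longer tokens, hence the split-on-whitespace exact match.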
    Originally posted by bridgman View Post
    Do you have any info re: the AMDGPU back end ? Is that the SPIR-V work or something else ? Either way it sounds like something we should try to engage on.
    Code:
    PS C:\Users\mnagy> compute++.exe -sycl -sycl-target asdf -IC:\Kellekek\Codeplay\ComputeCpp\1.0.2\include C:\Users\mnagy\Source\Repos\Test-Applications\SYCL-GenericLambda\Main.cpp
    compute++.exe: error: [Computecpp:CC0042]: Invalid SYCL target asdf
    compute++.exe: note: [Computecpp:CC0042]: Valid SYCL target spir
    compute++.exe: note: [Computecpp:CC0042]: Valid SYCL target spir64
    compute++.exe: note: [Computecpp:CC0042]: Valid SYCL target spir32
    compute++.exe: note: [Computecpp:CC0042]: Valid SYCL target spirv
    compute++.exe: note: [Computecpp:CC0042]: Valid SYCL target spirv64
    compute++.exe: note: [Computecpp:CC0042]: Valid SYCL target spirv32
    compute++.exe: note: [Computecpp:CC0042]: Valid SYCL target aorta-x86_64
    compute++.exe: note: [Computecpp:CC0042]: Valid SYCL target ptx64
    compute++.exe: note: [Computecpp:CC0042]: Valid SYCL target amdgcn
    As long-time ComputeCpp users who provide a steady flux of bug reports, we were promised an actually working AMDGPU back-end sometime in January. The compiler already reports the target, but the front-end does not behave properly with it yet.
    Originally posted by bridgman View Post
    Can't comment much re: ROCm on Windows but I think Greg's tweet is fair. In terms of what it means, we don't have KFD/thunk/ROCR on Windows any more so need another way to get compute work down to the GPU... and since PAL is already being used by OpenCL it makes some sense to use that to get HIP/HCC running on Windows.

    I have little knowledge of how all of the puzzle pieces fall into place (WDDM == DRM? PAL ~= KFD/thunk/ROCR?); all I'd like is to not feel like a second-class citizen in terms of driver support. OpenCL on Windows has not been touched in YEARS. The day-1 release of the 2.0 runtime feels like the last time anything happened in OpenCL land on Windows. Sic transit gloria mundi.

    Originally posted by bridgman View Post
    OpenCL on WSL... at first glance the PAL paths should be a pretty good fit since we support OpenCL/PAL on both Linux & Windows, at least for Vega56/64. I would rather see KFD/ROCR paths but AFAIK WSL doesn't have any support for Linux kernel drivers, just userspace apps.
    Indeed, the kernel is not really there, but there are Windows drivers to hook into. If VirtualCL managed to expose devices over a cluster inside a single context (OpenCL 1.2, that is), then a similar mechanism might work in WSL land as well, relaying things to the Windows drivers. Marshalling memory between a pico process (the subsystem-type processes) and an NT process is tricky, though. I have no clue how it works, I just know the buzzwords. However, this is some interesting stuff. I'd love to be able to test my GPU code locally, without having to dual-boot or remote deploy.

    And if my RX580 could make the cut, that would be awesome. (I got a GL702ZC as a devbox.)

  • coder
    replied
    Originally posted by Marc Driftmeyer View Post
    What an absolutely moronic test. Wait until the MI60 arrives to test TensorFlow against the 2080 series.
    Why wait? Vega has packed fp16, and fp16 plus fp32 are all he's using. I'm not convinced we're seeing the RTX cards' Tensor cores spin up, so it's a fair test.
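Packed fp16 matters here because it doubles Vega's theoretical throughput over fp32. A rough sketch of the arithmetic, assuming Vega 64's 4096 shaders and a boost clock of roughly 1.55 GHz (the function name and clock are my assumptions for illustration):

```python
def peak_tflops(shaders: int, ghz: float, ops_per_clock: int = 2) -> float:
    """Theoretical peak TFLOPS: shaders * FMA (2 ops/clock) * clock (GHz) / 1000."""
    return shaders * ops_per_clock * ghz / 1000

fp32 = peak_tflops(4096, 1.55)  # ~12.7 TFLOPS fp32
fp16 = 2 * fp32                 # packed fp16 doubles the rate: ~25.4 TFLOPS
print(fp32, fp16)
```

So on paper a Vega 64 running packed fp16 is well into RTX 2080 territory even without Tensor cores in the picture.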

  • bridgman
    replied
    I am not aware of plans to re-introduce the CPU runtime (it seemed to be mostly used with Intel CPUs), although now that we are back in the Big Honkin' Server CPU business it's probably worth re-visiting.

    As far as I know we do want SYCL to work on our products, so if deprecating cl_khr_spir is breaking SYCL then we'll need to look into either restoring it or providing an alternative. No plans yet but I have started discussion internally.

    Originally posted by Meteorhead View Post
    They had to resort to writing an AMDGPU back-end beside PTX, because neither Nvidia nor AMD ships a SPIR compiler.
    Yeah, SPIR is looking a bit dead right now. That might have been a factor in our deprecating the extension.

    Do you have any info re: the AMDGPU back end ? Is that the SPIR-V work or something else ? Either way it sounds like something we should try to engage on.

    Can't comment much re: ROCm on Windows but I think Greg's tweet is fair. In terms of what it means, we don't have KFD/thunk/ROCR on Windows any more so need another way to get compute work down to the GPU... and since PAL is already being used by OpenCL it makes some sense to use that to get HIP/HCC running on Windows.

    Including compute changes in release notes makes sense... if we aren't doing it now I will pass the request along.

    OpenCL on WSL... at first glance the PAL paths should be a pretty good fit since we support OpenCL/PAL on both Linux & Windows, at least for Vega56/64. I would rather see KFD/ROCR paths but AFAIK WSL doesn't have any support for Linux kernel drivers, just userspace apps.
    Last edited by bridgman; 14 December 2018, 04:54 PM.

  • Meteorhead
    replied
    Originally posted by bridgman View Post

    What kind of information are you looking for ?
    1) Will the OpenCL CPU runtime return (Win & Linux)?
    2) Will the cl_khr_spir extension return? These first two, especially the second, are grave issues. cl_khr_spir was relied on by multiple professional editing applications, but my interest mainly lies in ComputeCpp, Codeplay's SYCL implementation. They had to resort to writing an AMDGPU back-end beside PTX, because neither Nvidia nor AMD ships a SPIR compiler.
    3) ROCm on Windows? Greg Stoner on Twitter briefly replied "When the MLSE engineering team finishes the HIP port to PAL and moves it to MS Windows." But what does that mean?
    4) Compute stuff in release notes? It would be nice to know if a driver is worth installing if the gaming profiles don't concern me.
    5) Tara Raj's group at Microsoft is working on GPU support for the Windows Subsystem for Linux. Would AMD care to help make more than just CUDA work? (There was a survey posted by Tara recently on Twitter; I hope a lot of other people voted for OpenCL support too, not just CUDA. I voted OpenCL, OpenGL and Vulkan.)

    Thank you for your efforts.

  • bridgman
    replied
    Originally posted by Meteorhead View Post
    @bridgeman, how can I reach out to anyone at AMD who can give information on the Windows drivers? Devgurus, Twitter and the Khronos Slack channel don't seem to be too informative.
    What kind of information are you looking for? I'm asking partly because the contact points are probably different depending on the type of information, and partly because I don't know myself, so I'm trying to collect enough information to start asking around intelligently.

  • Meteorhead
    replied
    @bridgeman, how can I reach out to anyone at AMD who can give information on the Windows drivers? Devgurus, Twitter and the Khronos Slack channel don't seem to be too informative.

  • bridgman
    replied
    Originally posted by theriddick View Post
    Surprised to see HBM2 fail to keep up; after all the hype about how expensive it was because it was just better, it falls flat on its face here...
    In fairness, the "better" part referred to performance vs memory + interconnect power, and AFAIK that is still as true as ever.

    A few years ago (in the Fury timeframe) there was a second advantage for HBM relative to GDDR5, in terms of bandwidth. That was definitely true at the time and is still somewhat true today (a 4-stack HBM2 config gives around 1000 GB/s), but the move from GDDR5 to GDDR6 in the last couple of months has partially closed the bandwidth gap.

    Vega 56/64 uses a 2-stack HBM2 config which is comparable to the 256-bit GDDR6 memory on RTX 2080.
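The "comparable" claim checks out on paper: peak bandwidth is bus width times per-pin data rate. A quick sketch, assuming HBM2 at ~1.89 Gbps/pin on Vega 64 and 14 Gbps GDDR6 on the RTX 2080 (the helper name is mine):

```python
def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) / 8 * per-pin data rate (Gbps)."""
    return bus_bits / 8 * gbps_per_pin

vega64 = bandwidth_gbs(2048, 1.89)   # 2-stack HBM2: ~484 GB/s
rtx2080 = bandwidth_gbs(256, 14.0)   # 256-bit GDDR6: 448 GB/s
print(vega64, rtx2080)
```

Under a 10% difference, so raw DRAM bandwidth alone shouldn't separate these two cards much.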

    Michael I just noticed that the perf/watt comparisons only cover a couple of synthetics related to texture cache bandwidth where the NVidia parts do have a known advantage. Any chance of adding a real-world perf/watt test, eg Luxmark: Luxball HDR ?
    Last edited by bridgman; 14 December 2018, 12:22 PM.
