Radeon ROCm 4.0 Released With CDNA GPU Support (Instinct MI100)


  • #21
    Originally posted by bridgman View Post

    That's the first I have heard of this - are you talking about the string that appears in lspci, or the renderer string, or something else ?
    That is with AMDGPU-Pro OpenCL running on Raven Ridge, in my case Ryzen 2500u. Here is the clinfo
    Code:
    clinfo
    Number of platforms 1
    Platform Name AMD Accelerated Parallel Processing
    Platform Vendor Advanced Micro Devices, Inc.
    Platform Version OpenCL 2.1 AMD-APP (3180.7)
    Platform Profile FULL_PROFILE
    Platform Extensions cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
    Platform Host timer resolution 1ns
    Platform Extensions function suffix AMD
    
    Platform Name AMD Accelerated Parallel Processing
    Number of devices 1
    Device Name gfx902
    Device Vendor Advanced Micro Devices, Inc.
    Device Vendor ID 0x1002
    Device Version OpenCL 2.0 AMD-APP (3180.7)
    Driver Version 3180.7 (PAL,HSAIL)
    Device OpenCL C Version OpenCL C 2.0
    Device Type GPU
    Device Board Name (AMD) Unknown AMD GPU
    Device Topology (AMD) PCI-E, 03:00.0
    Device Profile FULL_PROFILE
    Device Available Yes
    Compiler Available Yes
    Linker Available Yes
    Max compute units 8
    SIMD per compute unit (AMD) 4
    SIMD width (AMD) 16
    SIMD instruction width (AMD) 1
    Max clock frequency 1100MHz
    Graphics IP (AMD) 9.2
    Device Partition (core)
    Max number of sub-devices 8
    Supported partition types None
    Supported affinity domains (n/a)
    Max work item dimensions 3
    Max work item sizes 1024x1024x1024
    Max work group size 256
    Preferred work group size (AMD) 256
    Max work group size (AMD) 1024
    Preferred work group size multiple 64
    Wavefront width (AMD) 64
    Preferred / native vector sizes
    char 4 / 4
    short 2 / 2
    int 1 / 1
    long 1 / 1
    half 1 / 1 (cl_khr_fp16)
    float 1 / 1
    double 1 / 1 (cl_khr_fp64)
    Half-precision Floating-point support (cl_khr_fp16)
    Denormals No
    Infinity and NANs No
    Round to nearest No
    Round to zero No
    Round to infinity No
    IEEE754-2008 fused multiply-add No
    Support is emulated in software No
    Single-precision Floating-point support (core)
    Denormals No
    Infinity and NANs Yes
    Round to nearest Yes
    Round to zero Yes
    Round to infinity Yes
    IEEE754-2008 fused multiply-add Yes
    Support is emulated in software No
    Correctly-rounded divide and sqrt operations Yes
    Double-precision Floating-point support (cl_khr_fp64)
    Denormals Yes
    Infinity and NANs Yes
    Round to nearest Yes
    Round to zero Yes
    Round to infinity Yes
    IEEE754-2008 fused multiply-add Yes
    Support is emulated in software No
    Address bits 64, Little-Endian
    Global memory size 2684354560 (2.5GiB)
    Global free memory (AMD) 2551712 (2.434GiB)
    Global memory channels (AMD) 4
    Global memory banks per channel (AMD) 4
    Global memory bank width (AMD) 256 bytes
    Error Correction support No
    Max memory allocation 912680550 (870.4MiB)
    Unified memory for Host and Device Yes
    Shared Virtual Memory (SVM) capabilities (core)
    Coarse-grained buffer sharing Yes
    Fine-grained buffer sharing Yes
    Fine-grained system sharing No
    Atomics No
    Minimum alignment for any data type 128 bytes
    Alignment of base address 2048 bits (256 bytes)
    Preferred alignment for atomics
    SVM 0 bytes
    Global 0 bytes
    Local 0 bytes
    Max size for global variable 821412352 (783.4MiB)
    Preferred total size of global vars 2684354560 (2.5GiB)
    Global Memory cache type Read/Write
    Global Memory cache size 16384 (16KiB)
    Global Memory cache line size 64 bytes
    Image support Yes
    Max number of samplers per kernel 16
    Max size for 1D images from buffer 134217728 pixels
    Max 1D or 2D image array size 2048 images
    Base address alignment for 2D image buffers 256 bytes
    Pitch alignment for 2D image buffers 256 pixels
    Max 2D image size 16384x16384 pixels
    Max 3D image size 2048x2048x2048 pixels
    Max number of read image args 128
    Max number of write image args 64
    Max number of read/write image args 64
    Max number of pipe args 16
    Max active pipe reservations 16
    Max pipe packet size 912680550 (870.4MiB)
    Local memory type Local
    Local memory size 65536 (64KiB)
    Local memory syze per CU (AMD) 65536 (64KiB)
    Local memory banks (AMD) 32
    Max number of constant args 8
    Max constant buffer size 912680550 (870.4MiB)
    Preferred constant buffer size (AMD) 16384 (16KiB)
    Max size of kernel argument 1024
    Queue properties (on host)
    Out-of-order execution No
    Profiling Yes
    Queue properties (on device)
    Out-of-order execution Yes
    Profiling Yes
    Preferred size 262144 (256KiB)
    Max size 8388608 (8MiB)
    Max queues on device 1
    Max events on device 1024
    Prefer user sync for interop Yes
    Number of P2P devices (AMD) 0
    P2P devices (AMD) <printDeviceInfo:147: get number of CL_DEVICE_P2P_DEVICES_AMD : error -30>
    Profiling timer resolution 1ns
    Profiling timer offset since Epoch (AMD) 1608419596369206987ns (Sat Dec 19 15:13:16 2020)
    Execution capabilities
    Run OpenCL kernels Yes
    Run native kernels No
    Thread trace supported (AMD) Yes
    Number of async queues (AMD) 4
    Max real-time compute queues (AMD) 1
    Max real-time compute units (AMD) 0
    printf() buffer size 4194304 (4MiB)
    Built-in kernels (n/a)
    Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_copy_buffer_p2p
    The issue affects all Linux operating systems. Interestingly enough, AMDGPU-Pro OpenCL runs fine with Blender even though the Vega 8 is listed as "Unknown AMD GPU".

    I don't think it is neglected on the open source side, at least for GPU. There were some CPU / OS issues with the first Raven parts but I think we're past that now.
    The focus is on the
    Not sure what you mean by "official AMD driver" since upstream is the primary official AMD driver - guessing you're talking about the prebuilt driver packages on amd.com, either the all-open built from an upstream fork or the hybrid workstation drivers with open source kernel/libdrm/multimedia and closed source OpenGL/Vulkan ?
    To clarify: the pre-built packages from amd.com for CentOS 8. After installing OpenCL, the GPU is listed as "Unknown AMD GPU". Granted, that was on 20.40, and I have yet to test 20.45 and the newer ROCm OpenCL. It would be great to properly list the GPU part of the Raven Ridge APU as Vega instead of the generic gfx902.


    • #22
      Originally posted by finalzone View Post
      The AMDGPU-Pro version is unofficially supported for that APU, but the GPU part is listed as unknown, thus preventing DaVinci Resolve from starting.
      Originally posted by finalzone View Post
      Interestingly enough, AMDGPU-Pro OpenCL runs fine with Blender even though the Vega 8 is listed as "Unknown AMD GPU".
      Do you have some rationale for believing that the "GPU unknown" string is causing DaVinci Resolve not to work ? I'm not saying that is impossible but it seems unlikely and I don't think we have run across cases where an application makes runtime decisions based on the Device Board Name string. It's not unusual to switch on Device Name though.
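For illustration, here is a minimal Python sketch of the difference between switching on Device Name and switching on Device Board Name. The `PROFILES` table, the profile labels, and both function names are hypothetical and not taken from Resolve or any other application; only the two strings come from the clinfo output earlier in the thread.

```python
# Hypothetical table mapping the OpenCL Device Name (the ISA target)
# to an application tuning profile. The profile labels are made up.
PROFILES = {
    "gfx900": "vega-discrete",
    "gfx902": "raven-apu",
}


def profile_from_device_name(info: dict) -> str:
    """Pick a profile from Device Name, falling back to a generic one."""
    return PROFILES.get(info.get("Device Name", ""), "generic")


def profile_from_board_name(info: dict) -> str:
    """Fragile variant: only accepts boards whose name mentions Vega."""
    board = info.get("Device Board Name (AMD)", "")
    return "vega" if "vega" in board.lower() else "unsupported"


# Values as reported by clinfo on the Ryzen 2500U in this thread.
raven_ridge = {
    "Device Name": "gfx902",
    "Device Board Name (AMD)": "Unknown AMD GPU",
}

print(profile_from_device_name(raven_ridge))  # raven-apu
print(profile_from_board_name(raven_ridge))   # unsupported
```

An application written the second way would reject this APU regardless of what the hardware can do, which is the behavior that would need to be confirmed for Resolve.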

      Originally posted by finalzone View Post
      It would be great to properly list the GPU part of the Raven Ridge APU as Vega instead of the generic gfx902.
      We could change it I guess but (a) gfx902 is the most precise indicator of device capabilities we have and (b) we can put generic strings upstream before launch but are not allowed to expose marketing names until after launch, which would interfere with any vendor use of the field IFF they are in fact doing that. If we can confirm that the Device Board Name string is *not* the cause of issues with Resolve then I guess we could change it post-launch, although that is still something we really try to avoid.

      • #23
        Originally posted by Spacefish View Post
        At least if games don't suddenly start to use bf16 and such,
        BFloat16 doesn't make a lot of sense for 3D or imaging. Half is a better compromise (which is no coincidence, since that's what it was designed for).
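To put rough numbers on that, here is a small Python sketch that rounds values to each format and counts how many 10-bit intensity levels in [0, 1] stay distinguishable. The helper names are my own, the 10-bit ramp is only an illustrative stand-in for image data, and the bfloat16 conversion is a simplified round-to-nearest-even that ignores NaN and infinity.

```python
import struct


def to_half(x: float) -> float:
    """Round x to IEEE half (1 sign, 5 exponent, 10 mantissa bits) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Round x to bfloat16 (1 sign, 8 exponent, 7 mantissa bits) and back.

    Simplified: keeps the top 16 bits of the float32 encoding with
    round-to-nearest-even; NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def distinct_levels(bit_depth: int):
    """Count distinct values left after rounding an evenly spaced ramp."""
    top = (1 << bit_depth) - 1
    ramp = [i / top for i in range(top + 1)]
    halves = {to_half(v) for v in ramp}
    bfloats = {to_bfloat16(v) for v in ramp}
    return len(halves), len(bfloats)


half_count, bf16_count = distinct_levels(10)
print("half keeps", half_count, "of 1024 levels")
print("bfloat16 keeps", bf16_count, "of 1024 levels")
```

Half keeps every level of the ramp, while bfloat16 merges neighbouring levels because it spends its bits on exponent range rather than mantissa precision.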

        Originally posted by Spacefish View Post
        we won't see consumer cards based on CDNA IMHO.
        That much seems clear, but it's yet to be seen how long they'll stick with GCN for APUs.

        Originally posted by Spacefish View Post
        CDNA: Lacks raytracing units,
        It's tricky to extrapolate from an example of 1, but it's looking like CDNA will be compute-only.

        I expect they'll sell workstation- and datacenter- oriented RDNA cards for visualization and cloud-based application hosting.


        • #24
          Originally posted by bridgman View Post
          If we can confirm that the Device Board Name string is *not* the cause of issues with Resolve then I guess we could change it post-launch, although that is still something we really try to avoid.
          Isn't there someone at DaVinci you can reach out to? It seems a shame to play a guessing game with this sort of thing.


          • #25
            Originally posted by Rakot View Post
            we can develop and test our software stack on both mobile and HPC video cards. This is quite handy.
            Totally agree. I think this is one reason Nvidia doesn't cripple CUDA on its consumer cards. They realize that a lot of devs are developing on consumer hardware, even if they're deploying on proper cloud hardware.


            • #26
              Originally posted by coder View Post
              Isn't there someone at DaVinci you can reach out to? It seems a shame to play a guessing game with this sort of thing.
              It's not just DaVinci though... if we were going to shift policy and start changing driver strings between pre- and post-launch we would need to go out and check with pretty much every software developer out there.

              It's also a bit of a tough sell pushing our ISV relations group to go out and bug DaVinci about supporting Resolve on distros that they don't even support. My understanding is that it already works OK on RHEL/CentOS.

              • #27
                Originally posted by Rakot View Post
                AMD is partnering with Kokkos team but it is not enough if we cannot even test already available hardware.
                In fairness, we have been supporting Polaris and Vega consumer dGPUs from the start. Lack of Navi support is awkward though, I agree. See next point for that though.

                Originally posted by Rakot View Post
                On Nvidia side, despite terrible open source support and a number of problems in general like recent "GPL condom" incident, we can develop and test our software stack on both mobile and HPC video cards. This is quite handy.
                Originally posted by coder View Post
                Totally agree. I think this is one reason Nvidia doesn't cripple CUDA on its consumer cards. They realize that a lot of devs are developing on consumer hardware, even if they're deploying on proper cloud hardware.
                One of the things that has always baffled me is that this point never seems to get mentioned to our sales/marketing/product management folks by our datacenter customers, who all seem happy with developers working and testing on the same server systems (or previous generation) that will be used for deployment.

                It's tough to promote the importance of something internally if our customers are saying "nah we don't need it" to our customer-facing teams. I don't know how to fix that, but in the meantime our developers do talk directly with customer developers enough to understand how it would help. The challenge though is that since those discussions end up being "developer to developer" it still appears internally as if it is AMD developers pushing for this rather than customers.

                Anyways, I think we are making progress on this (making OpenCL-over-ROCm default for the packaged drivers was a big step) and things will continue to improve... and in the meantime there are a lot of Vega cards out there which are fully supported already. I did get confirmation that Renoir is using GPUVM paths by default rather than ATC/IOMMUv2, so that's a start.
                Last edited by bridgman; 20 December 2020, 05:08 PM.

                • #28
                  Originally posted by bridgman View Post
                  It's not just DaVinci though... if we were going to shift policy and start changing driver strings between pre- and post-launch we would need to go out and check with pretty much every software developer out there.
                  I had in mind to go the other way - start a discussion about HW compatibility and then maybe you can understand their needs and help them find a better way of doing HW detection.

                  BTW, a hack might be to somehow expose a parameter (kernel boot option?) that lets users manually override this value. That sounds like begging for trouble, as there will certainly be some users who set it without really understanding what they're doing and end up forgetting about it and running into problems with other software. So, I don't really see an easy way around having ISVs do HW detection properly.
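As a rough sketch of what doing HW detection properly could look like, the check below asks what the device reports it can do instead of matching name strings. The `Requirements` class, the `missing_capabilities` function, the dictionary keys, and the thresholds are all made up for illustration; the device values are a subset of the clinfo dump posted earlier in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class Requirements:
    """Hypothetical description of what an application needs."""
    extensions: set = field(default_factory=set)
    min_global_mem_bytes: int = 0
    needs_images: bool = False


def missing_capabilities(device: dict, req: Requirements) -> list:
    """Return human-readable reasons the device fails; empty means OK."""
    problems = []
    have = set(device["extensions"].split())
    for ext in sorted(req.extensions - have):
        problems.append(f"missing extension {ext}")
    if device["global_mem_bytes"] < req.min_global_mem_bytes:
        problems.append("not enough global memory")
    if req.needs_images and not device["image_support"]:
        problems.append("no image support")
    return problems


# Subset of the values clinfo reported for the Raven Ridge APU.
raven = {
    "board_name": "Unknown AMD GPU",  # deliberately never consulted
    "extensions": "cl_khr_fp64 cl_khr_fp16 cl_khr_gl_sharing cl_khr_subgroups",
    "global_mem_bytes": 2684354560,
    "image_support": True,
}

modest = Requirements({"cl_khr_fp64"}, 2 * 1024**3, needs_images=True)
greedy = Requirements({"cl_khr_fp64"}, 4 * 1024**3, needs_images=True)

print(missing_capabilities(raven, modest))  # []
print(missing_capabilities(raven, greedy))  # ['not enough global memory']
```

With a check like this the board name never enters the decision, so a generic pre-launch string would not matter.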

                  Originally posted by bridgman View Post
                  It's also a bit of a tough sell pushing our ISV relations group to go out and bug DaVinci about supporting Resolve on distros that they don't even support. My understanding is that it already works OK on RHEL/CentOS.
                  I get that. It'd need to be done diplomatically, but that's how I'd probably try to approach it. Every time I get a request from customers, sales, or product management that doesn't make sense, I always try to find out what's behind it and either solve the underlying problem or find a better approach.


                  • #29
                    Originally posted by bridgman View Post
                    It's tough to promote the importance of something internally if our customers are saying "nah we don't need it" to our customer-facing teams. I don't know how to fix that, but in the meantime our developers do talk directly with customer developers enough to understand how it would help. The challenge though is that since those discussions end up being "developer to developer" it still appears internally as if it is AMD developers pushing for this rather than customers.
                    Isn't there anyone doing university relations, or anything like that? Maybe they could do more student outreach.

                    Also, it's a bit paradoxical to talk only to one's existing customers, if one is interested in expanding the customer base. You really ought to be talking to the people who are not your customers, and seeing why not.


                    • #30
                      Originally posted by coder View Post
                      Isn't there anyone doing university relations, or anything like that? Maybe they could do more student outreach.

                      Also, it's a bit paradoxical to talk only to one's existing customers, if one is interested in expanding the customer base. You really ought to be talking to the people who are not your customers, and seeing why not.
                      I suspect it's hit the level of "common knowledge" at this point that compsci students need Nvidia cards. That's going to take more than outreach. It will likely take a few years of undoing.
