Taking Radeon ROCm 2.0 OpenCL For A Benchmarking Test Drive


  • Anonymous1
    replied
    Hi
    Do you already have any TensorFlow benchmarks?


  • utrrrongeeb
    replied
    Originally posted by klokik View Post
    Does it pass image validation for this result?
    I hadn't thought to check. Having taken another look, my impression of LuxMark is that it iteratively refines the render for a fixed time of 120 seconds, and compares whatever's finished at that point with a converged reference render. There's a threshold (around 15% of pixels mismatching?) where the displayed judgment changes from "pass" to "fail," but the process appears to be the same either way. PTS does not appear to read or consider this pass/fail judgment. If I remove the GPU clock limits, the RX580's score heads towards 29686, and somewhere in between it crosses into the "pass" range with 13.26% error. Visually, the output looks right / on the right track; if you know more about LuxMark's technical details, please explain. (I'm also cheating a bit by caching compiled kernels, but that shouldn't explain the size of the differences in the results.)
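
To make the pass/fail idea concrete, here is a minimal sketch of a mismatch-fraction check. It is not LuxMark's actual validation code: the per-pixel tolerance, the 15% limit, and the function names are all assumptions based only on the impression described above.

```python
def mismatch_fraction(rendered, reference, tolerance):
    """Fraction of pixels whose value differs from the reference by more than tolerance."""
    if len(rendered) != len(reference):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")
    mismatched = sum(1 for a, b in zip(rendered, reference) if abs(a - b) > tolerance)
    return mismatched / len(reference)


def verdict(rendered, reference, tolerance=0.05, max_fraction=0.15):
    """Return "pass" or "fail"; both default thresholds are guesses, not LuxMark values."""
    fraction = mismatch_fraction(rendered, reference, tolerance)
    return "pass" if fraction <= max_fraction else "fail"


# Toy 100-pixel grayscale "images": 13 pixels are off in one, 20 in the other.
reference = [0.5] * 100
mostly_converged = [0.5] * 87 + [0.9] * 13
less_converged = [0.5] * 80 + [0.9] * 20

print(verdict(mostly_converged, reference))
print(verdict(less_converged, reference))
```

With a check shaped like this, a faster GPU finishes more refinement passes in the fixed 120 seconds, so fewer pixels fall outside the tolerance, which would be consistent with the error dropping as the clocks go up.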

    One wonders whether the GTX1080Ti "passed" in this article, and whether it's getting better than 140 points per watt. :-)

    For the other LuxMark scenes, compilation takes an unreasonably long time (ten minutes), but in most cases works or can be adjusted to work (try disabling -cl-mad-enable). For example, I'm seeing a score of 42710 for Microphone, at 8.85% error. I haven't comprehensively saved results and compared them with Michael's yet. There's the small disadvantage of the mouse cursor becoming unresponsive when the heavier benchmarks are running, at least in Wayland.
    Last edited by utrrrongeeb; 03 January 2019, 10:27 PM. Reason: added Mic and rough power efficiency


  • klokik
    replied
    Originally posted by finalzone View Post
    ROCm OpenCL now works on AMD Mobile Raven Ridge (Ryzen 2500U as found on HP Envy x360 Convertible) running on updated Fedora 29
    ...
    *******
    Agent 2
    *******
    ...
    Fast F16 Operation: FALSE
    ...
    ISA Info:
    ISA 1
    Fast f16: TRUE

    Preferred / native vector sizes
    char 4 / 4
    short 2 / 2
    int 1 / 1
    long 1 / 1
    half 1 / 1 (cl_khr_fp16)
    float 1 / 1
    double 1 / 1 (cl_khr_fp64)
    Half-precision Floating-point support (cl_khr_fp16)
    Denormals No
    Infinity and NANs No
    Round to nearest No
    Round to zero No
    Round to infinity No
    IEEE754-2008 fused multiply-add No
    Support is emulated in software No
    I'm confused... does mobile Vega have double the FLOPS for F16 or not?
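
The two rocminfo fields quoted above disagree: the agent-level `Fast F16 Operation` is FALSE while the ISA-level `Fast f16` is TRUE. Below is a small sketch, assuming rocminfo's text layout matches the excerpt in this thread (the function name and the trimmed sample are illustrative, not part of ROCm), that pulls both flags out per agent so the mismatch is easy to spot. It only reports what the tool prints; it does not settle which flag reflects the hardware's actual FP16 rate.

```python
import re


def fast_f16_flags(rocminfo_text):
    """Map agent number -> {"agent": bool or None, "isa": [bool, ...]}."""
    agents = {}
    current = None
    for line in rocminfo_text.splitlines():
        stripped = line.strip()
        header = re.fullmatch(r"Agent (\d+)", stripped)
        if header:
            # Start collecting flags for a new agent section.
            current = agents.setdefault(int(header.group(1)), {"agent": None, "isa": []})
            continue
        if current is None or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        key = key.strip()
        is_true = value.strip().upper() == "TRUE"
        if key == "Fast F16 Operation":   # agent-level field
            current["agent"] = is_true
        elif key == "Fast f16":           # per-ISA field
            current["isa"].append(is_true)
    return agents


# Trimmed sample in the same layout as the rocminfo output posted in this thread.
SAMPLE = """
*******
Agent 1
*******
  Name:                    AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
  ISA Info:
    N/A
*******
Agent 2
*******
  Name:                    gfx902
  Fast F16 Operation:      FALSE
  ISA Info:
    ISA 1
      Fast f16:                TRUE
"""

flags = fast_f16_flags(SAMPLE)
for number, info in sorted(flags.items()):
    if info["agent"] is not None and any(isa != info["agent"] for isa in info["isa"]):
        print(f"Agent {number}: agent-level flag {info['agent']} disagrees with ISA flags {info['isa']}")
```

On a real system the same function could be fed the captured output of `/opt/rocm/bin/rocminfo` instead of the sample string.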

    Originally posted by utrrrongeeb View Post
    I ran LuxMark Luxball HDR on an RX580 using Clover, and got 21887
    Does it pass image validation for this result?


  • utrrrongeeb
    replied
    I ran LuxMark Luxball HDR on an RX580 using Clover, and got 21887. That's strangely better than Michael's results – for the GTX1080Ti. Is there anything obvious I might be doing wrong, aside from using PTS v8.0.0, locking dpm clocks to mid-high, and not uploading results yet?
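
For readers wondering what locking the DPM clocks can look like with the amdgpu driver, here is a minimal sketch. It assumes the card is `card0`, that the driver exposes the `power_dpm_force_performance_level` and `pp_dpm_sclk` sysfs files, and that levels 4 and 5 stand in for "mid-high" (the post does not say which levels were used). The helper names are hypothetical, and actually applying the plan needs root.

```python
from pathlib import Path


def plan_sclk_lock(levels, card="card0", sysfs_root="/sys/class/drm"):
    """Return the (file, value) writes that restrict the GPU core clock to the given DPM levels."""
    device = Path(sysfs_root) / card / "device"
    level_mask = " ".join(str(n) for n in sorted(set(levels)))
    return [
        # Switch from automatic to manual control first.
        (device / "power_dpm_force_performance_level", "manual"),
        # Then list the allowed sclk level indices, space separated.
        (device / "pp_dpm_sclk", level_mask),
    ]


def apply_plan(plan):
    """Perform the writes; requires root and a matching amdgpu device."""
    for path, value in plan:
        path.write_text(value + "\n")


# Levels 4 and 5 are an assumption standing in for "mid-high".
plan = plan_sclk_lock([4, 5])
for path, value in plan:
    print(f"{path} <- {value}")
```

Writing `auto` back to `power_dpm_force_performance_level` returns the card to normal clock management. Locked clocks make runs more repeatable but also change the score, so results are only comparable with other runs made under the same limits.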


  • finalzone
    replied
    ROCm OpenCL now works on AMD Mobile Raven Ridge (Ryzen 2500U as found on HP Envy x360 Convertible) running on updated Fedora 29 after following the procedure. All previous OpenCL installations from amdgpu-pro were removed prior to that.

    As confirmed from rocminfo
    Code:
    /opt/rocm/bin/rocminfo  
    =====================     
    HSA System Attributes     
    =====================     
    Runtime Version:         1.1
    System Timestamp Freq.:  1000.000000MHz
    Sig. Max Wait Duration:  18446744073709551615 (number of timestamp)
    Machine Model:           LARGE                               
    System Endianness:       LITTLE                              
     
    ==========                
    HSA Agents                
    ==========                
    *******                   
    Agent 1                   
    *******                   
      Name:                    AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
      Vendor Name:             CPU                                 
      Feature:                 None specified                      
      Profile:                 FULL_PROFILE                        
      Float Round Mode:        NEAR                                
      Max Queue Number:        0                                   
      Queue Min Size:          0                                   
      Queue Max Size:          0                                   
      Queue Type:              MULTI                               
      Node:                    0                                   
      Device Type:             CPU                                 
      Cache Info:               
        L1:                      32KB                                
      Chip ID:                 5597                                
      Cacheline Size:          64                                  
      Max Clock Frequency (MHz):2000                                
      BDFID:                   768                                 
      Compute Unit:            8                                   
      Features:                None
      Pool Info:                
        Pool 1                    
          Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
          Size:                    16776832KB                          
          Allocatable:             TRUE                                
          Alloc Granule:           4KB                                 
          Alloc Alignment:         4KB                                 
          Acessible by all:        TRUE                                
      ISA Info:                 
        N/A                       
    *******                   
    Agent 2                   
    *******                   
      Name:                    gfx902                              
      Vendor Name:             AMD                                 
      Feature:                 KERNEL_DISPATCH                     
      Profile:                 FULL_PROFILE                        
      Float Round Mode:        NEAR                                
      Max Queue Number:        128                                 
      Queue Min Size:          4096                                
      Queue Max Size:          131072                              
      Queue Type:              MULTI                               
      Node:                    0                                   
      Device Type:             GPU                                 
      Cache Info:               
        L1:                      16KB                                
      Chip ID:                 5597                                
      Cacheline Size:          64                                  
      Max Clock Frequency (MHz):1100                                
      BDFID:                   768                                 
      Compute Unit:            11                                  
      Features:                KERNEL_DISPATCH  
      Fast F16 Operation:      FALSE                               
      Wavefront Size:          64                                  
      Workgroup Max Size:      1024                                
      Workgroup Max Size Per Dimension:
        Dim[0]:                  67109888                            
        Dim[1]:                  50332672                            
        Dim[2]:                  0                                   
      Grid Max Size:           4294967295                          
      Waves Per CU:            160                                 
      Max Work-item Per CU:    10240                               
      Grid Max Size per Dimension:
        Dim[0]:                  4294967295                          
        Dim[1]:                  4294967295                          
        Dim[2]:                  4294967295                          
      Max number Of fbarriers Per Workgroup:32                                  
      Pool Info:                
        Pool 1                    
          Segment:                 GROUP                               
          Size:                    64KB                                
          Allocatable:             FALSE                               
          Alloc Granule:           0KB                                 
          Alloc Alignment:         0KB                                 
          Acessible by all:        FALSE                               
      ISA Info:                 
        ISA 1                     
          Name:                    amdgcn-amd-amdhsa--gfx902+xnack     
          Machine Models:          HSA_MACHINE_MODEL_LARGE             
          Profiles:                HSA_PROFILE_BASE                    
          Default Rounding Mode:   NEAR                                
          Default Rounding Mode:   NEAR                                
          Fast f16:                TRUE                                
          Workgroup Max Dimension:  
            Dim[0]:                  67109888                            
            Dim[1]:                  1024                                
            Dim[2]:                  16777217                            
          Workgroup Max Size:      1024                                
          Grid Max Dimension:       
            x                        4294967295                          
            y                        4294967295                          
            z                        4294967295                          
          Grid Max Size:           4294967295                          
          FBarrier Max Size:       32                                  
    *** Done ***
    From clinfo
    Code:
    clinfo
    Number of platforms                               1
      Platform Name                                   AMD Accelerated Parallel Processing
      Platform Vendor                                 Advanced Micro Devices, Inc.
      Platform Version                                OpenCL 2.1 AMD-APP (2783.0)
      Platform Profile                                FULL_PROFILE
      Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices  
      Platform Host timer resolution                  1ns
      Platform Extensions function suffix             AMD
     
      Platform Name                                   AMD Accelerated Parallel Processing
    Number of devices                                 1
      Device Name                                     gfx902-xnack
      Device Vendor                                   Advanced Micro Devices, Inc.
      Device Vendor ID                                0x1002
      Device Version                                  OpenCL 1.2  
      Driver Version                                  2783.0 (HSA1.1,LC)
      Device OpenCL C Version                         OpenCL C 2.0  
      Device Type                                     GPU
      Device Available                                Yes
      Device Profile                                  FULL_PROFILE
      Device Board Name (AMD)                         AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
      Device Topology (AMD)                           PCI-E, 03:00.0
      Max compute units                               11
      SIMD per compute unit (AMD)                     4
      SIMD width (AMD)                                16
      SIMD instruction width (AMD)                    1
      Max clock frequency                             1100MHz
      Graphics IP (AMD)                               9.2
      Device Partition                                (core)
        Max number of sub-devices                     11
        Supported partition types                     None
      Max work item dimensions                        3
      Max work item sizes                             1024x1024x1024
      Max work group size                             256
      Compiler Available                              Yes
      Linker Available                                Yes
      Preferred work group size multiple              64
      Wavefront width (AMD)                           64
      Preferred / native vector sizes                  
        char                                                 4 / 4        
        short                                                2 / 2        
        int                                                  1 / 1        
        long                                                 1 / 1        
        half                                                 1 / 1        (cl_khr_fp16)
        float                                                1 / 1        
        double                                               1 / 1        (cl_khr_fp64)
      Half-precision Floating-point support           (cl_khr_fp16)
        Denormals                                     No
        Infinity and NANs                             No
        Round to nearest                              No
        Round to zero                                 No
        Round to infinity                             No
        IEEE754-2008 fused multiply-add               No
        Support is emulated in software               No
      Single-precision Floating-point support         (core)
        Denormals                                     Yes
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 Yes
        Round to infinity                             Yes
        IEEE754-2008 fused multiply-add               Yes
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  Yes
      Double-precision Floating-point support         (cl_khr_fp64)
        Denormals                                     Yes
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 Yes
        Round to infinity                             Yes
        IEEE754-2008 fused multiply-add               Yes
        Support is emulated in software               No
      Address bits                                    64, Little-Endian
      Global memory size                              7360856064 (6.855GiB)
      Global free memory (AMD)                        7188336 (6.855GiB)
      Global memory channels (AMD)                    2
      Global memory banks per channel (AMD)           4
      Global memory bank width (AMD)                  256 bytes
      Error Correction support                        No
      Max memory allocation                           6256727654 (5.827GiB)
      Unified memory for Host and Device              Yes
      Minimum alignment for any data type             128 bytes
      Alignment of base address                       1024 bits (128 bytes)
      Global Memory cache type                        Read/Write
      Global Memory cache size                        16384 (16KiB)
      Global Memory cache line size                   64 bytes
      Image support                                   Yes
        Max number of samplers per kernel             5597
        Max size for 1D images from buffer            65536 pixels
        Max 1D or 2D image array size                 2048 images
        Max 2D image size                             16384x16384 pixels
        Max 3D image size                             2048x2048x2048 pixels
        Max number of read image args                 128
        Max number of write image args                8
      Local memory type                               Local
      Local memory size                               65536 (64KiB)
      Local memory syze per CU (AMD)                  65536 (64KiB)
      Local memory banks (AMD)                        32
      Max constant buffer size                        6256727654 (5.827GiB)
      Max number of constant args                     8
      Max size of kernel argument                     1024
      Queue properties                                 
        Out-of-order execution                        No
        Profiling                                     Yes
      Prefer user sync for interop                    Yes
      Profiling timer resolution                      1ns
      Profiling timer offset since Epoch (AMD)        0ns (Wed Dec 31 16:00:00 1969)
      Execution capabilities                           
        Run OpenCL kernels                            Yes
        Run native kernels                            No
        Thread trace supported (AMD)                  No
      printf() buffer size                            4194304 (4MiB)
      Built-in kernels                                 
      Device Extensions                               cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program  
     
    NULL platform behavior
      clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
      clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
      clCreateContext(NULL, ...) [default]            No platform
      clCreateContext(NULL, ...) [other]              Success [AMD]
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
        Platform Name                                 AMD Accelerated Parallel Processing
        Device Name                                   gfx902-xnack
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
        Platform Name                                 AMD Accelerated Parallel Processing
        Device Name                                   gfx902-xnack
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
        Platform Name                                 AMD Accelerated Parallel Processing
        Device Name                                   gfx902-xnack
    One unfortunate minor issue is the dependency on the pth package, which has not been maintained since 2006, instead of the modern pthsem. Hopefully that will be rectified in the future, and simpler installation instructions would be welcome. Darktable, Blender, GIMP and Kdenlive were able to detect and use ROCm OpenCL.

    What a good way to end 2018: a present for mobile Raven Ridge users.


  • tuxd3v
    replied
    Originally posted by Aeder View Post
    I see the AMD compute stack has massive potential but is still stuck at the start line.
    Indeed,
    the first part of the work seems to be there already.

    Now PowerPlay tuning is needed,
    because on Linux, AMD cards consume a lot more power than on Windows.
    Latency work and optimizations should of course follow, or be done in parallel.

    But this ROCm version seems to be a good foundation for all the optimization work.


  • Aeder
    replied
    I see the AMD compute stack has massive potential but is still stuck at the start line.


  • Tim Blokdijk
    replied
    I would be very interested in Blender OpenCL tests.


  • kbios
    replied
    Thanks Michael, very interesting benchmarks! I've found that raising the HBM frequency can be very beneficial in OpenCL workloads on the Vega cards. For example, if I raise it from 945 MHz to 1100 MHz, LuxMark goes up to almost 37000 for the Vega 64.
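
As a rough sanity check on why the memory clock matters here, the theoretical bandwidth can be worked out from the clock. This is a sketch under stated assumptions: the 2048-bit bus width for the Vega 64 and the double-data-rate factor are not from this thread, and the function name is made up for illustration.

```python
def hbm2_bandwidth_gbs(memory_clock_mhz, bus_width_bits=2048):
    """Theoretical peak bandwidth in GB/s for a double-data-rate memory interface."""
    # Two transfers per clock cycle (double data rate).
    transfers_per_second = memory_clock_mhz * 1e6 * 2
    # Bits per transfer -> bytes, then bytes/s -> GB/s.
    return transfers_per_second * bus_width_bits / 8 / 1e9


stock = hbm2_bandwidth_gbs(945)     # stock HBM clock mentioned in the post
raised = hbm2_bandwidth_gbs(1100)   # raised HBM clock mentioned in the post

print(f"stock:  {stock:.2f} GB/s")
print(f"raised: {raised:.2f} GB/s")
print(f"gain:   {(raised / stock - 1) * 100:.1f}%")
```

Under these assumptions the step from 945 MHz to 1100 MHz is about a 16% increase in peak bandwidth. How much of that shows up in a LuxMark score depends on how memory-bound the scene is.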


  • tildearrow
    replied
    Typos:

    Originally posted by phoronix View Post
    For your viewing pleasusre today
    Originally posted by phoronix View Post
    In the single precision test, the RX Vega 56 comes out ahead of the GTX 1080 Ti
    No, it does not.
