AMD GPU-PRO vs. NVIDIA Linux OpenCL Compute Performance

  • #21
    Originally posted by bridgman
    Thanks Michael. My initial impression (actually the initial impression of one of our HPC guys) was that the "-s 1" default came from the PTS test definition file; does that make sense? He was looking in https://openbenchmarking.org/innhold...e363a013f3de9d

    There are probably other defaults in SHOC itself, will take a look.

    Code:
    <TestSettings>
      <Default>
        <Arguments>-s 1 </Arguments>
      </Default>
    </TestSettings>
    Just heard back that runtime should be under 30 s on a modern GPU even with large datasets, so it looks like patching the test should work.
    Hmm, whoops, yes, okay, I see -s 1 set there. Pardon, I overlooked it originally since I haven't touched that SHOC test in a while. I'll do some testing on various GPUs this weekend and make sure it's safe to increase to -s 3 universally. If memory serves, the reason it was 1 before was that for the max SP FLOPS test, anything greater than 1 was taking 1+ hours or otherwise an extremely long time... But yeah, I'll do some verification soon.
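Since the -s 1 default lives in the test definition's <Arguments> element, bumping it could be scripted. A minimal Python sketch, assuming the file is well-formed XML shaped like the snippet quoted above and that the size is always passed as "-s N"; the function name set_shoc_size is hypothetical and this is not a Phoronix Test Suite tool:

```python
import re
import xml.etree.ElementTree as ET


def set_shoc_size(xml_text: str, size: int) -> str:
    """Return xml_text with every <Arguments> "-s N" replaced by "-s <size>".

    Hypothetical helper: assumes the problem size appears as "-s N" inside
    <Arguments>, as in the quoted test-definition snippet. If no "-s" flag
    is present, one is appended.
    """
    root = ET.fromstring(xml_text)
    for args in root.iter("Arguments"):
        text = args.text or ""
        new_text, count = re.subn(r"-s\s+\d+", f"-s {size}", text)
        if count == 0:
            # No size flag yet: append one, keeping PTS's trailing space.
            new_text = f"{text.strip()} -s {size} ".lstrip()
        args.text = new_text
    return ET.tostring(root, encoding="unicode")


snippet = """<TestSettings>
  <Default>
    <Arguments>-s 1 </Arguments>
  </Default>
</TestSettings>"""

print(set_shoc_size(snippet, 3))
```

The size is a parameter rather than hard-coded, since both -s 3 and -s 4 are discussed in this thread.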
    Michael Larabel
    https://www.michaellarabel.com/



    • #22
      Awesome, thanks very much!


      • #23
        Originally posted by Michael

        Hmm, whoops, yes, okay, I see -s 1 set there. Pardon, I overlooked it originally since I haven't touched that SHOC test in a while. I'll do some testing on various GPUs this weekend and make sure it's safe to increase to -s 3 universally. If memory serves, the reason it was 1 before was that for the max SP FLOPS test, anything greater than 1 was taking 1+ hours or otherwise an extremely long time... But yeah, I'll do some verification soon.
        Why not also test -s 4 at the same time?



        • #24
          More feedback from the HPC team: with a fresh GitHub pull of SHOC on Catalyst (Hawaii GPU), the wall-clock times for -s 4 were:

          · FFT: 9 s
          · MD5Hash: 5 s
          · MaxFlops: 25 s
          · DeviceMemory: 35 s


          Slower GPUs would take longer, but if run times become too long, the recommendation is to reduce the number of sequential passes via the -n parameter (default 10) rather than the size (the amount of parallel work).

          AFAICS, larger sizes (-s) take better advantage of large GPUs, while running multiple passes (-n) is done primarily to average out the impact of startup overhead, e.g. power management realizing that the GPU is busy enough that clocks should be cranked up to maximum. Ideally, I think that means every test should run at -s 4, and any test that runs for too long should have its number of passes overridden with something like -n 3.
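The rule of thumb above (keep -s 4 everywhere, trim -n only where a test runs too long) can be sketched with the Hawaii numbers from this post. A minimal Python sketch; it assumes those wall-clock times were taken at the default -n 10, that runtime scales linearly with the pass count (ignoring fixed startup cost), and a hypothetical 30-second per-test budget. None of the names come from SHOC or PTS:

```python
# Wall-clock seconds per SHOC test at -s 4 on Hawaii, from the HPC team's numbers.
# Assumption: measured with SHOC's default pass count (-n 10).
MEASURED_S4 = {"FFT": 9.0, "MD5Hash": 5.0, "MaxFlops": 25.0, "DeviceMemory": 35.0}

DEFAULT_PASSES = 10     # SHOC's -n default, as stated in the post
REDUCED_PASSES = 3      # the "-n 3" override suggested in the post
BUDGET_SECONDS = 30.0   # hypothetical per-test budget


def estimated_seconds(seconds_at_default: float, passes: int) -> float:
    """Scale runtime linearly with the pass count (ignores fixed setup cost)."""
    return seconds_at_default * passes / DEFAULT_PASSES


def plan_arguments(measured: dict) -> dict:
    """Keep -s 4 for every test; only cut -n for tests over the budget."""
    plan = {}
    for test, seconds in measured.items():
        if seconds > BUDGET_SECONDS:
            plan[test] = (f"-s 4 -n {REDUCED_PASSES}",
                          estimated_seconds(seconds, REDUCED_PASSES))
        else:
            plan[test] = ("-s 4", seconds)
    return plan


for test, (args, secs) in plan_arguments(MEASURED_S4).items():
    print(f"{test:13s} {args:10s} ~{secs:.1f} s")
```

With these inputs only DeviceMemory (35 s) exceeds the budget, so it alone gets -n 3, for an estimated ~10.5 s. The numbers would have to be re-measured per GPU for the plan to mean anything on slower hardware.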

          In a perfect world the number of passes would be reduced on slower GPUs to keep runtime constant, but that gets complicated... it's hard to know how long a test is going to run before starting it.

          I guess ideally these benchmarks would have a parameter like "execute enough stuff so you run for at least 30 seconds".
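A "run for at least N seconds" mode sidesteps the problem of predicting runtime, because the stop decision is made after each pass. A minimal Python sketch, assuming a hypothetical workload callable standing in for one benchmark pass; this is not an existing SHOC or PTS option, and the names are illustrative:

```python
import statistics
import time
from typing import Callable, List


def run_for_at_least(workload: Callable[[], None],
                     min_seconds: float = 30.0,
                     min_passes: int = 3,
                     clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Repeat workload() until both min_passes and min_seconds are reached.

    Returns the duration of each pass. Slow GPUs simply end up running fewer
    passes; nothing has to be predicted before starting.
    """
    durations: List[float] = []
    start = clock()
    while len(durations) < min_passes or clock() - start < min_seconds:
        t0 = clock()
        workload()
        durations.append(clock() - t0)
    return durations


def steady_state_mean(durations: List[float], warmup: int = 1) -> float:
    """Average the passes after the first `warmup` ones (e.g. while clocks ramp up)."""
    kept = durations[warmup:] or durations
    return statistics.mean(kept)


if __name__ == "__main__":
    passes = run_for_at_least(lambda: sum(range(200_000)), min_seconds=0.5)
    print(len(passes), "passes, steady-state mean", steady_state_mean(passes), "s")
```

Dropping the first pass is one way to handle the power-management ramp-up mentioned above; how many warm-up passes to discard is a judgment call.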
          Last edited by bridgman; 28 March 2016, 01:36 PM.


          • #25
            Originally posted by faldzip
            Has anyone gotten this AMD GPU-PRO driver to work with an R7 260X? It is GCN 1.1, so it should work just as the R9 290 does, but I've installed it on Ubuntu 15.10 (4.2 kernel) and can't set any resolution other than 1024x768 or 800x600 (on both my FHD monitor and my TV). I then ran The Talos Principle's built-in benchmark at 1024x768 (Ultra-High settings) with OpenGL and Vulkan and got ~7 fps in both (while on Win10 I get 32 and 36 respectively, and that's at 1080p!). I've managed to add the 1080p mode with xrandr, but after enabling it everything on the screen looks really shitty (like a much lower resolution upscaled, so the fonts are not quite readable). Has anyone faced such issues with AMD GPU-PRO?
            I just installed Ubuntu 14.04 and the exact same thing happens.

            Xorg.0.log says:
            Code:
            [    29.800] (EE) AMDGPU(0): Unknown EDID version 0
            [    30.213] (EE) AMDGPU(0): Unknown EDID version 0
            Last edited by faldzip; 25 March 2016, 04:02 PM.



            • #26
              Are you getting an EDID-related message as well? There were a few reports of some monitors having trouble...
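Xorg's "Unknown EDID version 0" suggests the version byte it read was zero, which can happen if the block came back empty or corrupted. One way to check what the kernel actually received is to validate the raw EDID base block. A minimal Python sketch, assuming the standard 128-byte EDID 1.x layout (fixed 8-byte header, version in byte 18, all 128 bytes summing to 0 mod 256); the function names are hypothetical:

```python
from pathlib import Path
from typing import List

# Fixed 8-byte pattern that starts every EDID 1.x base block.
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])


def check_edid_base_block(blob: bytes) -> List[str]:
    """Return problems found in the first 128-byte EDID block (empty list if none)."""
    if len(blob) < 128:
        return [f"only {len(blob)} bytes (a base block is 128)"]
    block = blob[:128]
    problems = []
    if block[:8] != EDID_HEADER:
        problems.append("bad header")
    if block[18] != 1:  # byte 18 = EDID version, byte 19 = revision
        problems.append(f"EDID version {block[18]} (expected 1)")
    if sum(block) % 256 != 0:  # all 128 bytes must sum to 0 mod 256
        problems.append("checksum mismatch")
    return problems


def check_connector(path: str) -> List[str]:
    """Hypothetical convenience wrapper: validate an EDID blob read from a file."""
    return check_edid_base_block(Path(path).read_bytes())
```

On KMS drivers the raw blob is usually readable from a sysfs file such as /sys/class/drm/card0-HDMI-A-1/edid; that exact path is an assumption, and the kernel's connector names can differ from Xorg's (which uses HDMI-A-0 in this thread's logs).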


              • #27
                Xorg.0.log says:
                Code:
                [    18.307] (II) AMDGPU(0): glamor detected, initialising EGL layer.
                [    18.308] (II) AMDGPU(0): KMS Pageflipping: enabled
                [    18.308] (II) AMDGPU(0): Output DisplayPort-0 using monitor section Monitor0
                [    18.308] (II) AMDGPU(0): Output HDMI-A-0 has no monitor section
                [    18.308] (II) AMDGPU(0): Output DVI-D-0 has no monitor section
                [    18.308] (II) AMDGPU(0): Output DVI-D-1 has no monitor section
                [    18.308] (II) AMDGPU(0): EDID for output DisplayPort-0
                [    18.308] (EE) AMDGPU(0): Unknown EDID version 0
                [    18.308] (II) AMDGPU(0): EDID for output HDMI-A-0
                [    18.308] (II) AMDGPU(0): Printing probed modes for output HDMI-A-0
                [    18.308] (II) AMDGPU(0): Modeline "1024x768"x60.0   65.00  1024 1048 1184 1344  768 771 777 806 -hsync -vsync (48.4 kHz e)
                [    18.308] (II) AMDGPU(0): Modeline "800x600"x60.3   40.00  800 840 968 1056  600 601 605 628 +hsync +vsync (37.9 kHz e)
                [    18.308] (II) AMDGPU(0): Modeline "800x600"x56.2   36.00  800 824 896 1024  600 601 603 625 +hsync +vsync (35.2 kHz e)
                [    18.308] (II) AMDGPU(0): Modeline "848x480"x60.0   33.75  848 864 976 1088  480 486 494 517 +hsync +vsync (31.0 kHz e)
                [    18.308] (II) AMDGPU(0): Modeline "640x480"x59.9   25.18  640 656 752 800  480 490 492 525 -hsync -vsync (31.5 kHz e)
                [    18.308] (II) AMDGPU(0): EDID for output DVI-D-0
                [    18.308] (II) AMDGPU(0): EDID for output DVI-D-1
                [    18.308] (II) AMDGPU(0): Output DisplayPort-0 disconnected
                [    18.308] (II) AMDGPU(0): Output HDMI-A-0 connected
                [    18.308] (II) AMDGPU(0): Output DVI-D-0 disconnected
                [    18.308] (II) AMDGPU(0): Output DVI-D-1 disconnected
                [    18.309] (II) AMDGPU(0): Using exact sizes for initial modes
                [    18.309] (II) AMDGPU(0): Output HDMI-A-0 using initial mode 1024x768
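Each modeline encodes its own sync rates: the horizontal rate is the pixel clock divided by the total line length (htotal), and the vertical refresh is that rate divided by the total line count (vtotal). A minimal Python sketch that recomputes them from Xorg log lines like the ones above, assuming progressive modes (no Interlace or DoubleScan flags); the helper names are hypothetical:

```python
import re
from typing import NamedTuple


class ModeTiming(NamedTuple):
    hsync_khz: float
    vrefresh_hz: float


# Matches 'Modeline "NAME"xREFRESH  CLOCK  h1 h2 h3 htotal  v1 v2 v3 vtotal ...'
MODELINE_RE = re.compile(
    r'Modeline\s+"[^"]+"(?:x[\d.]+)?\s+([\d.]+)\s+((?:\d+\s+){7}\d+)')


def modeline_timing(line: str) -> ModeTiming:
    """Derive horizontal and vertical rates from an Xorg modeline (progressive only)."""
    m = MODELINE_RE.search(line)
    if m is None:
        raise ValueError(f"not a modeline: {line!r}")
    clock_hz = float(m.group(1)) * 1e6           # pixel clock is printed in MHz
    timings = [int(v) for v in m.group(2).split()]
    htotal, vtotal = timings[3], timings[7]
    hsync_hz = clock_hz / htotal                   # lines per second
    return ModeTiming(hsync_hz / 1e3, hsync_hz / vtotal)  # frames per second


log_line = ('[    18.308] (II) AMDGPU(0): Modeline "1024x768"x60.0   65.00  '
            '1024 1048 1184 1344  768 771 777 806 -hsync -vsync (48.4 kHz e)')
print(modeline_timing(log_line))

# Standard CEA-861 1080p60 timing for comparison (not taken from the log).
print(modeline_timing('Modeline "1920x1080"x60.0  148.50  '
                      '1920 2008 2052 2200  1080 1084 1089 1125 +hsync +vsync'))
```

Because Xorg prints the pixel clock rounded to two decimals, rates recomputed from log text can differ in the last digit from the refresh shown in the mode name.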
                It is currently connected to my Sony TV (1920x1080), but I've also tried my BenQ monitor on DVI and the same thing happens. There is still one more monitor at home that I can try.

                Regarding OpenCL: I just tried to check OpenCL performance by rendering BMW27.blend with Blender 2.77, but it crashes in clGetPlatformIDs before I even get to the Devices section in User Preferences:
                Code:
                Program received signal SIGSEGV, Segmentation fault.
                0x00007fffe89f9d32 in amdgpu_query_gpu_info ()
                   from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libdrm_amdgpu.so.1
                (gdb) bt
                #0  0x00007fffe89f9d32 in amdgpu_query_gpu_info ()
                   from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libdrm_amdgpu.so.1
                #1  0x00007fffd9d4ffee in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #2  0x00007fffd9d5065b in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #3  0x00007fffd9d52c2b in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #4  0x00007fffd9d42648 in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #5  0x00007fffd9a38d53 in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #6  0x00007fffd99badf9 in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #7  0x00007fffd99bae57 in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #8  0x00007fffd99bbba9 in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #9  0x00007fffd9981f54 in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #10 0x00007fffd99833e7 in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #11 0x00007fffd9983576 in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #12 0x00007fffd9942ca0 in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #13 0x00007fffd995ceb7 in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #14 0x00007fffd992c493 in clIcdGetPlatformIDsKHR ()
                   from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libamdocl64.so
                #15 0x00007fffdfd9876e in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libOpenCL.so
                #16 0x00007fffdfd9a647 in ?? () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libOpenCL.so
                #17 0x00007ffff771ea90 in pthread_once () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:103
                #18 0x00007fffdfd98d31 in clGetPlatformIDs () from /usr/lib/x86_64-linux-gnu/amdgpu-pro/libOpenCL.so
                #19 0x0000000001f938c4 in ?? ()
                #20 0x0000000001f963ac in ccl::device_opencl_info(ccl::vector<ccl::DeviceInfo, ccl::GuardedAllocator<ccl::DeviceInfo> >&) ()
                #21 0x0000000001f7fbf9 in ccl::Device::available_devices() ()
                #22 0x0000000001ed3cf5 in ?? ()
                #23 0x00000000019ff338 in ?? ()
                #24 0x00000000018eef79 in RNA_property_enum_items_ex ()
                #25 0x00000000018eefa5 in RNA_property_enum_items ()
                #26 0x00000000018ef526 in RNA_property_enum_identifier ()
                #27 0x000000000151277b in ?? ()
                #28 0x0000000001519f20 in pyrna_prop_to_py ()
                #29 0x000000000151a270 in ?? ()
                #30 0x00000000029d54e8 in PyEval_EvalFrameEx ()
                #31 0x00000000029d9fa1 in PyEval_EvalFrameEx ()
                #32 0x00000000029dba82 in ?? ()
                #33 0x00000000029dbb88 in PyEval_EvalCodeEx ()
                #34 0x000000000294844f in ?? ()
                #35 0x000000000291f00a in PyObject_Call ()
                #36 0x0000000001519584 in ?? ()
                #37 0x00000000019ea50c in ?? ()
                #38 0x0000000001404529 in ED_region_panels ()
                #39 0x000000000116cca8 in ?? ()
                #40 0x0000000001403746 in ED_region_do_draw ()
                #41 0x0000000001147557 in wm_draw_update ()
                #42 0x0000000001142d58 in WM_main ()
                #43 0x00000000010ea92a in main ()
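The backtrace shows the fault inside clGetPlatformIDs (through libamdocl64.so into amdgpu_query_gpu_info), before Blender does anything OpenCL-specific, so it may reproduce without Blender. A minimal Python sketch, assuming the ICD loader can be found as "OpenCL" by ctypes.util.find_library (swap in the full amdgpu-pro libOpenCL.so path from the backtrace if a different loader is picked up); the helper names are hypothetical. The call runs in a child process so a SIGSEGV shows up as a status instead of killing the script:

```python
import signal
import subprocess
import sys
from typing import Optional, Tuple

# Child script: load the OpenCL ICD loader via ctypes and count platforms using
# clGetPlatformIDs(num_entries, platforms, num_platforms).
PROBE_SCRIPT = r"""
import ctypes, ctypes.util, sys
name = ctypes.util.find_library("OpenCL")
if name is None:
    sys.exit(3)
cl = ctypes.CDLL(name)
cl.clGetPlatformIDs.restype = ctypes.c_int32
cl.clGetPlatformIDs.argtypes = [ctypes.c_uint32, ctypes.c_void_p,
                                ctypes.POINTER(ctypes.c_uint32)]
count = ctypes.c_uint32(0)
err = cl.clGetPlatformIDs(0, None, ctypes.byref(count))
print(err, count.value)
"""


def run_isolated(script: str, timeout: float = 120.0) -> Tuple[str, Optional[object]]:
    """Run `script` in a fresh Python process so a segfault cannot take down the caller."""
    proc = subprocess.run([sys.executable, "-c", script],
                          capture_output=True, text=True, timeout=timeout)
    if proc.returncode < 0:                      # killed by a signal, e.g. SIGSEGV
        return "crashed", signal.Signals(-proc.returncode).name
    if proc.returncode == 3:
        return "no-opencl-library", None
    if proc.returncode != 0:
        return "error", (proc.stderr.strip().splitlines() or [""])[-1]
    return "ok", proc.stdout.strip()


def probe_opencl_platforms() -> Tuple[str, Optional[object]]:
    """On success the detail is '<error code> <platform count>' from clGetPlatformIDs."""
    return run_isolated(PROBE_SCRIPT)


if __name__ == "__main__":
    print(probe_opencl_platforms())
```

A ("crashed", "SIGSEGV") result here would point at the driver's OpenCL stack rather than at Blender.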



                • #28
                  The NVIDIA cards returned to doing much better when it came to the texture read bandwidth.
                  Which is strange. What about the brand new HBM on Fury? It's supposed to kick ass, isn't it?



                  • #29
                    Originally posted by SystemCrasher
                    Which is strange. What about the brand new HBM on Fury? It's supposed to kick ass, isn't it?
                    Have you read posts 17-24 yet? AFAICS there isn't enough work in the "small" benchmark option (-s 1) to occupy all the shaders on a large GPU or allow much latency hiding (or for the power management hardware to decide the clocks need to be raised).

                    Michael is going to look into running with "large" settings (-s 3 or better yet -s 4) when time permits.
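To put "enough work to occupy all the shaders" in numbers, the work-item count needed just to give every SIMD one wavefront, or to reach maximum wavefront occupancy for latency hiding, grows with the number of compute units. A minimal Python sketch, assuming GCN's layout (4 SIMDs per CU, 64-wide wavefronts, at most 10 wavefronts per SIMD); real occupancy is also limited by register and LDS use, and SHOC's actual problem sizes per -s level are not given here, so this does not by itself show where -s 1 falls:

```python
# GCN layout assumptions: 4 SIMD units per compute unit, 64-wide wavefronts,
# at most 10 wavefronts resident per SIMD.
SIMDS_PER_CU = 4
WAVEFRONT_SIZE = 64
MAX_WAVES_PER_SIMD = 10


def work_items_needed(compute_units: int, waves_per_simd: int = 1) -> int:
    """Work-items required to give every SIMD `waves_per_simd` wavefronts."""
    if not 1 <= waves_per_simd <= MAX_WAVES_PER_SIMD:
        raise ValueError("waves_per_simd must be between 1 and 10 on GCN")
    return compute_units * SIMDS_PER_CU * waves_per_simd * WAVEFRONT_SIZE


for name, cus in [("R9 290 (Hawaii, 40 CUs)", 40), ("Fury X (Fiji, 64 CUs)", 64)]:
    print(f"{name}: {work_items_needed(cus):,} to touch every SIMD, "
          f"{work_items_needed(cus, MAX_WAVES_PER_SIMD):,} for maximum wavefront occupancy")
```

The bigger the chip, the more work a small benchmark size leaves it starved of, which is one reason a large GPU can look disproportionately bad at -s 1.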
                    Last edited by bridgman; 25 March 2016, 06:43 PM.


                    • #30
                      Originally posted by bridgman
                      Have you read posts 17-24 yet?
                      Yes, but...

                      AFAICS there isn't enough work in the "small" benchmark option (-s 1) to occupy all the shaders on a large GPU or allow much latency hiding.

                      Michael is going to look into running with "large" settings (-s 3 or better yet -s 4) when time permits.
                      ...but I failed to realize it could be SO drastic XD. Thanks for making it clear. It seems these benchmarks have to be scrapped.
                      Last edited by SystemCrasher; 25 March 2016, 06:26 PM.

