GROMACS 2023 Released With Better SYCL For Intel / AMD / NVIDIA

  • GROMACS 2023 Released With Better SYCL For Intel / AMD / NVIDIA

    Phoronix: GROMACS 2023 Released With Better SYCL For Intel / AMD / NVIDIA

    GROMACS, the widely-used molecular dynamics software, issued its stable v2023 release this week with improved GPU support via SYCL...


  • #2
    Again not my area of expertise, but I'd be interested to see how that SYCL NVIDIA backend compares to the CUDA one.

    • #3
      Originally posted by brucethemoose:
      Again not my area of expertise, but I'd be interested to see how that SYCL NVIDIA backend compares to the CUDA one.
      (GROMACS dev here)

      In what way? Feature-wise, there are currently some minor limitations: cuFFT support is missing, but VkFFT can be used (since we are only using it as a portability check and development aid). In terms of performance, single-GPU it is about 10-30% slower, depending on a number of factors like problem size, compiler versions, etc. Multi-GPU scaling is not something we have prioritized or tested much on NVIDIA at this time.
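
      For anyone who wants to try that path themselves, a rough sketch of how such a build might be configured is below. The option names (GMX_GPU, GMX_SYCL_HIPSYCL, HIPSYCL_TARGETS, GMX_GPU_FFT_LIBRARY) are recalled from the GROMACS 2023 install guide and the cuda:sm_80 target is only an example, so verify everything against the official documentation before relying on it.

          # Hypothetical configure invocation -- check the option names against
          # the GROMACS 2023 install guide; cuda:sm_80 is just an example target.
          cmake .. \
              -DGMX_GPU=SYCL \
              -DGMX_SYCL_HIPSYCL=ON \
              -DHIPSYCL_TARGETS="cuda:sm_80" \
              -DGMX_GPU_FFT_LIBRARY=VkFFT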

      • #4
        Originally posted by pszilard:
        single-GPU it is about 10-30% slower
        As in SYCL is 30% slower?

        That's still quite good! In machine learning land, every attempt I've seen to port away from native CUDA results in an enormous performance hit on NVIDIA GPUs, though there are some more recent efforts to narrow the gap (like MLIR).

        • #5
          Originally posted by brucethemoose:
          As in SYCL is 30% slower?
          Yes, that is 0.7x absolute performance (close to worst case) for very small inputs where the runtime overheads relative to CUDA increase.

          • #6
            Originally posted by pszilard:
            Yes, that is 0.7x absolute performance (close to worst case) for very small inputs where the runtime overheads relative to CUDA increase.
            PS: That was against CUDA with plain streams; at that end of the use-case range (~100-200 microseconds per iteration, with a dozen or so kernels per iteration), CUDA Graphs give another 5-10% performance.
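
            To make the CUDA Graphs point concrete: at a couple hundred microseconds per step, the CPU-side cost of a dozen individual kernel launches is no longer negligible, and capturing them once and replaying the whole graph amortizes that overhead. The sketch below is not GROMACS code; the kernel, sizes, and iteration counts are made up purely to show the capture/instantiate/replay pattern, and error checking is omitted.

            // Hypothetical sketch, not GROMACS code: replaying a captured CUDA
            // graph instead of issuing ~a dozen tiny kernel launches per step.
            #include <cstdio>
            #include <cuda_runtime.h>

            __global__ void scale(float* x, float a, int n) {
                int i = blockIdx.x * blockDim.x + threadIdx.x;
                if (i < n) x[i] *= a;
            }

            int main() {
                const int n = 1 << 16;
                float* d = nullptr;
                cudaMalloc(&d, n * sizeof(float));

                cudaStream_t stream;
                cudaStreamCreate(&stream);

                // Capture one iteration's worth of launches into a graph.
                cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
                for (int k = 0; k < 12; ++k) {  // ~a dozen small kernels per step
                    scale<<<(n + 255) / 256, 256, 0, stream>>>(d, 1.0001f, n);
                }
                cudaGraph_t graph;
                cudaStreamEndCapture(stream, &graph);

                cudaGraphExec_t exec;
            #if CUDART_VERSION >= 12000
                cudaGraphInstantiate(&exec, graph, 0);
            #else
                cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
            #endif

                // Each iteration now costs one cudaGraphLaunch call instead of
                // twelve separate kernel launches.
                for (int step = 0; step < 1000; ++step) {
                    cudaGraphLaunch(exec, stream);
                }
                cudaStreamSynchronize(stream);

                cudaGraphExecDestroy(exec);
                cudaGraphDestroy(graph);
                cudaStreamDestroy(stream);
                cudaFree(d);
                std::printf("done\n");
                return 0;
            }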
