A Look At The LLVMpipe OpenGL Performance On Mesa 19.0 With A 64C/128T Server


  • A Look At The LLVMpipe OpenGL Performance On Mesa 19.0 With A 64C/128T Server

    Phoronix: A Look At The LLVMpipe OpenGL Performance On Mesa 19.0 With A 64C/128T Server

    Given the proposed Libre RISC-V SoC that could function as a Vulkan accelerator by running the Kazan Vulkan implementation on it, I decided to have a fresh look at how the LLVMpipe performance is for running OpenGL on the CPU. Here are those tests done on a dual socket AMD EPYC server...

    http://www.phoronix.com/scan.php?pag....0-Performance

  • #2
    How well does it scale with cores? IIRC, it didn't scale well beyond 8 cores a while back.


    • #3
      This seems strange to me. The raw performance and bandwidth of GPUs from when ET was new are nowhere near those of a dual-socket EPYC setup. Is it really that hard to scale graphics across many x86 cores?


      • #4
        I bet it would run faster on a similar Intel CPU; AFAIK llvmpipe makes really good use of AVX.


        • #5
          I played Bugs Bunny: Lost in Time, a GL game from 1999, on a Kaveri APU (that was enough for Full HD resolution), through WINE with OpenGL on llvmpipe. Just drop in Mesa's DLL and it runs fine on the CPU:

          http://fdossena.com/?p=mesa/index.frag

          With that CPU you could run the bunny at 8K, maybe even 16K I guess; it just uses plain GL with no extensions and doesn't need more than 30 fps anyway.

          I guess each newer GL version multiplies the requirements for software rendering, say 3x per version bump, let alone extensions, etc...

          edit: Bugs Bunny: Lost in Time is really lost in time; it only has some shadow bugs on GL. Very ancient GL, I guess, or just no one dared to fix it.
          Last edited by dungeon; 12-16-2018, 08:46 AM.


          • #6
            It says there LLVM 128 bits; maybe that matters. Somewhere else it says 256 bits.


            • #7
              Originally posted by GruenSein View Post
              How well does it scale with cores? IIRC, it didn't scale well beyond 8 cores a while back.
              Yes, scaling is bad, and for that reason it does not even try to use more than 16 threads (+ the main thread).

              Originally posted by dungeon View Post
              It says there LLVM 128 bits, maybe that does matter Somewhere else it says 256 bits
              It's a result of disabling AVX on AMD. I suspect we should probably rework that logic now, but it would need some testing. The reasoning behind it is that running with 256-bit wide vectors is nowhere near 2 times as fast as with 128-bit wide vectors even on Intel, and in particular the Bulldozer family of chips has pretty bad performance when using 256-bit vectors: not only do they split everything into 2x128-bit operations, but the decoder actually has quite reduced throughput. Zen doesn't suffer from the latter problem, but I'd still expect 256-bit vectors to be a loss (as it still splits things up into 128-bit pieces). It should be a win with Zen 2, however.
              Also the logic is a bit flawed, since there is actually no need to disable AVX; it should only disable 256-bit operation. But I don't think this really makes a difference (and with newer LLVM versions, it will actually still use AVX anyway).
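For anyone who wants to poke at this themselves, the vector width llvmpipe picks can reportedly be overridden at runtime for testing. A minimal sketch, assuming your Mesa build honors the LP_NATIVE_VECTOR_WIDTH debug variable (check lp_bld_init.c in your tree before relying on it):

```shell
# Force software rendering through llvmpipe, then compare 128-bit
# and 256-bit vector widths. LP_NATIVE_VECTOR_WIDTH is a debug
# override; availability depends on how Mesa was built.
export LIBGL_ALWAYS_SOFTWARE=1
export GALLIUM_DRIVER=llvmpipe
LP_NATIVE_VECTOR_WIDTH=128 glxgears
LP_NATIVE_VECTOR_WIDTH=256 glxgears
```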


              • #8
                Scaling is indeed quite bad with llvmpipe. Back when I bought my Vega 64 but before support for it landed, I played Quake 1 on llvmpipe for a while. I could get about 30 fps at 1280x720, IIRC, and utilization of my AMD 1950X wasn't very good. I suspect that renderer doesn't get a lot of love or attention from the developers since it isn't particularly important for general high-performance graphics. It's just a stand-in for when hardware 3D isn't available, capable of rendering an accelerated desktop environment, etc.


                • #9
                  I believe llvmpipe is artificially limited to 16 threads; see src/gallium/drivers/llvmpipe/lp_limits.h line 64: #define LP_MAX_THREADS 16. You should be able to just change that 16 to 256, and then you can use the LP_NUM_THREADS env var to take it all the way up to 256 threads. I'd also like to see a comparison with SWR, that'd be great (control threading with KNOB_MAX_WORKER_THREADS env var). It would certainly be interesting given that the two really target different workloads; fragment shaders are the primary workload used by scientific visualization which llvmpipe is still single threaded for, while swr can parallelize all of the supported shader operations.
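To actually try both renderers, something along these lines should work. This is a sketch built from the env vars mentioned above; the llvmpipe run still needs the LP_MAX_THREADS recompile to go past 16, and glxgears stands in for whatever benchmark you prefer:

```shell
# llvmpipe: LP_NUM_THREADS picks the thread count, capped at
# LP_MAX_THREADS (16 unless you raised it and rebuilt Mesa).
LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=16 glxgears

# OpenSWR: same idea, different knob.
LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER=swr KNOB_MAX_WORKER_THREADS=64 glxgears
```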


                  • #10
                    Originally posted by chuckatkins View Post
                    I believe llvmpipe is artificially limited to 16 threads; see src/gallium/drivers/llvmpipe/lp_limits.h line 64: #define LP_MAX_THREADS 16. You should be able to just change that 16 to 256 and then you can use the LP_NUM_THREADS env var to take it all the way up to 256 threads.
                    Yes, but that's what I'm saying: it is intentionally limited to 16 threads because scaling is bad. If you lift the limit it can use more threads, but they will just be idle most of the time (even with 16 threads, I bet you'd see quite lengthy idle periods). There are some design shortcomings, and probably the biggest is that anything pre-rasterization can't run in parallel; worse, it can't run concurrently with fragment shading, so the threads go idle.
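The idle-thread point is basically Amdahl's law. As a rough illustration (the 20% serial share for pre-raster work is an assumed number for the sketch, not anything measured):

```shell
# If a fraction s of each frame is serial (setup, pre-raster) and the
# fragment-shading rest parallelizes perfectly, the speedup with n
# threads is 1 / (s + (1-s)/n).
s=0.2
for n in 1 4 16 64 128; do
  awk -v s=$s -v n=$n 'BEGIN { printf "%3d threads -> %.2fx\n", n, 1/(s + (1-s)/n) }'
done
```

Even with unlimited threads the ceiling is 1/s (5x with these numbers), which is why just lifting LP_MAX_THREADS doesn't buy much.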
                    I'd also like to see a comparison with SWR, that'd be great (control threading with KNOB_MAX_WORKER_THREADS env var). It would certainly be interesting given that the two really target different workloads; fragment shaders are the primary workload used by scientific visualization which llvmpipe is still single threaded for, while swr can parallelize all of the supported shader operations.
                    It's the opposite: scientific visualization deals with huge data sets, and fragment shading is usually simple. llvmpipe only threads fragment shading (whereas SWR indeed can scale everything, and should be able to use more threads meaningfully).
