Thread: Gallium3D LLVMpipe Isn't Yet Fit For ARM

  1. #1
    Join Date
    Jan 2007
    Posts
    14,561

    Default Gallium3D LLVMpipe Isn't Yet Fit For ARM

    Phoronix: Gallium3D LLVMpipe Isn't Yet Fit For ARM

    While OpenGL is becoming a requirement for more of the Linux desktops out there, and ARM open-source graphics drivers aren't yet commonplace, using the Gallium3D LLVMpipe software rasterizer on ARM isn't really a viable solution yet...

    http://www.phoronix.com/vr.php?view=MTI0MzA

  2. #2
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,034

    Default

    No numbers? Not even glxgears running on ARM llvmpipe?

  3. #3
    Join Date
    Jan 2008
    Posts
    297

    Default

    I keep seeing these articles about LLVMpipe not working well on ARM and I wonder why anyone expects it should? To my knowledge, no one has done any work on improving LLVM performance on ARM. I can't imagine what you expect to have changed since the last time an article was published about this.

  4. #4
    Join Date
    Feb 2008
    Posts
    52

    Default

    I would figure Apple has a very vested interest in LLVM doing well on ARM. (They're likely holding back stuff, but still, they're moving to LLVM/Clang for iOS...)

    The problem appears to be that there just isn't enough CPU power. If you need a multicore amd64 platform to run LLVMpipe adequately, there isn't much chance of even an A15 platform doing well. To be fair, doing modern 3D on a regular CPU is hard; look at what happened to Larrabee.

  5. #5
    Join Date
    Jan 2008
    Posts
    297

    Default

    Quote Originally Posted by Chad Page View Post
    I would figure Apple has a very vested interest in LLVM doing well on ARM. (They're likely holding back stuff, but still, they're moving to LLVM/Clang for iOS...)
    Indeed. Although I meant that no one has done any performance work on LLVMpipe (not LLVM) for ARM. LLVMpipe is full of code to generate SSE* instructions. There's nothing similar for NEON.
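    To make the point concrete, here's a minimal sketch (not actual llvmpipe code) of why per-ISA intrinsic paths have to be written separately: the same 4-wide float add has one spelling for SSE and a different one for NEON, so an SSE-only codebase simply has nothing to run on ARM.

    ```c
    #include <assert.h>
    #if defined(__SSE2__)
    #include <emmintrin.h>
    #elif defined(__ARM_NEON)
    #include <arm_neon.h>
    #endif

    /* Hypothetical example: a 4-wide float add written once per
     * instruction set, the way intrinsic-based code paths must be. */
    static void add4(const float *a, const float *b, float *out)
    {
    #if defined(__SSE2__)
        _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    #elif defined(__ARM_NEON)
        vst1q_f32(out, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
    #else
        for (int i = 0; i < 4; i++)      /* scalar fallback */
            out[i] = a[i] + b[i];
    #endif
    }

    int main(void)
    {
        float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, out[4];
        add4(a, b, out);
        for (int i = 0; i < 4; i++)
            assert(out[i] == a[i] + b[i]);
        return 0;
    }
    ```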

  6. #6
    Join Date
    Nov 2008
    Location
    Madison, WI, USA
    Posts
    864

    Default

    Quote Originally Posted by mattst88 View Post
    Indeed. Although I meant that no one has done any performance work on LLVMpipe (not LLVM) for ARM. LLVMpipe is full of code to generate SSE* instructions. There's nothing similar for NEON.
    I figured it was something similar. I've noticed that most of the Phoronix testing of ARM compilers is done without NEON (excepting a recent article from earlier this week). If LLVMpipe has SSE optimizations and nothing for NEON, then it's no surprise that it's slow on ARM (even setting aside the raw throughput differences of the CPUs in question).

  7. #7
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,034

    Default

    Okay, can someone who knows the internals say why llvmpipe specifically needs knowledge of $INSTRUCTION_SET?

    Isn't that the whole point of using llvm? Having it auto-optimize for the best set available?

  8. #8
    Join Date
    Jan 2008
    Posts
    297

    Default

    Quote Originally Posted by curaga View Post
    Okay, can someone who knows the internals say why llvmpipe specifically needs knowledge of $INSTRUCTION_SET?

    Isn't that the whole point of using llvm? Having it auto-optimize for the best set available?
    One would have thought.

    I think when LLVMpipe was first written, LLVM wasn't able to generate good SSE code given vectorized IR, so the authors worked around this by using intrinsics.

  9. #9
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,034

    Default

    Wheee, hardcoding one instruction set to work around the compiler being bad. And judging by this news posting, llvm apparently hasn't improved much in 3.1.

  10. #10
    Join Date
    May 2012
    Posts
    25

    Default

    Quote Originally Posted by mattst88 View Post
    One would have thought.

    I think when LLVMpipe was first written, LLVM wasn't able to generate good SSE code given vectorized IR, so the authors worked around this by using intrinsics.
    I wouldn't really call it "worked around". It's true that some intrinsics are used because early llvm versions didn't quite work right (for instance, the comparison/select code, where up to llvm 3.0 or so the backends choked on doing it vectorized).
    However, llvm IR also cannot express some things that sse (or other vector instruction sets, for that matter) can do, and if you try, you end up with IR that is too complex for the llvm backends to synthesize back into simple cpu instructions.
    Some of these I would blame on llvm. For instance, I really hate that llvm doesn't have min/max instructions: min/max is a fairly general concept that pretty much every vector extension can do, but in llvm IR you have to code it as a compare plus a select, and last time I checked llvm was unable to fuse that back into a min or max. So with only sse2 you end up with a cmp instruction plus and/andnot/or (for the select), and even if you have sse41 the cmp/select isn't really ideal.
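    The cmp-plus-and/andnot/or sequence described above can be sketched directly with SSE2 intrinsics; this is a standalone illustration (not llvmpipe code) comparing the expanded select against the single minps that `_mm_min_ps` maps to.

    ```c
    #include <emmintrin.h>  /* SSE2 */
    #include <assert.h>

    /* What a backend emits for a float min when it can't fuse the
     * compare/select pattern: cmpltps, then and/andnot/or as the select. */
    static __m128 min_via_select(__m128 a, __m128 b)
    {
        __m128 mask = _mm_cmplt_ps(a, b);          /* cmpltps: all-ones where a < b */
        return _mm_or_ps(_mm_and_ps(mask, a),      /* keep a where mask set */
                         _mm_andnot_ps(mask, b));  /* keep b elsewhere */
    }

    int main(void)
    {
        float out1[4], out2[4];
        __m128 a = _mm_set_ps(4.0f, -1.0f, 2.0f, 8.0f);
        __m128 b = _mm_set_ps(3.0f,  5.0f, 2.5f, 7.0f);
        _mm_storeu_ps(out1, min_via_select(a, b));
        _mm_storeu_ps(out2, _mm_min_ps(a, b));     /* single minps instruction */
        for (int i = 0; i < 4; i++)
            assert(out1[i] == out2[i]);
        return 0;
    }
    ```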
    But most of the intrinsics llvmpipe uses don't really fall into that category; they are simply "too weird" to make sense in a generic IR. Take the pack intrinsics: these are very useful and used extensively, and if you fall back on llvm IR when they aren't available it's way more complicated (you can use trunc, but the necessary clamping makes it a mess and the generated code terrible).
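    As a standalone sketch of that point (again, not llvmpipe code): `_mm_packs_epi32` (packssdw) narrows 32-bit to 16-bit with signed saturation in one instruction, while the generic-IR route has to spell out the clamp before truncating.

    ```c
    #include <emmintrin.h>  /* SSE2 */
    #include <assert.h>

    /* The clamp a generic IR must express explicitly before a trunc,
     * to match what packssdw does for free. */
    static short clamp_trunc(int v)
    {
        if (v > 32767)  return 32767;
        if (v < -32768) return -32768;
        return (short)v;
    }

    int main(void)
    {
        int in[8] = { 100000, -100000, 7, -7, 32768, -32769, 0, 42 };
        short out[8];
        __m128i lo = _mm_loadu_si128((const __m128i *)&in[0]);
        __m128i hi = _mm_loadu_si128((const __m128i *)&in[4]);
        /* packssdw: one instruction, 8 saturated 16-bit results */
        _mm_storeu_si128((__m128i *)out, _mm_packs_epi32(lo, hi));
        for (int i = 0; i < 8; i++)
            assert(out[i] == clamp_trunc(in[i]));
        return 0;
    }
    ```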
    Frankly, I'm pleasantly surprised it works at all on arm; I guess the arm backend is in pretty good shape, then.
    Nothing is stopping someone from adding support for neon intrinsics, however; just recently there has been ongoing work on altivec intrinsics for powerpc.
