LLVM May Expand Its Use Of The Loop Vectorizer

  • LLVM May Expand Its Use Of The Loop Vectorizer

    Phoronix: LLVM May Expand Its Use Of The Loop Vectorizer

    LLVM's Loop Vectorizer, which is able to automatically vectorize code loops for performance benefits in many scenarios, may find its use expanded for other optimization levels in future LLVM releases...

    http://www.phoronix.com/vr.php?view=MTM4NDk

  • #2
    Wasn't LLVM 3.3 supposed to be released today?

    EDIT: Just looked at their site and saw: "Random wiggle room for further bug fixing and testing" on the schedule.


    • #3
      [...] and benchmarking the loop vectorizer showed it to provide performance benefits for many scenarios. [...]
      @Michael: Did you read your article, which you linked, once more? The loop vectorizer decreased the performance in most scenarios!

      The most common case, however, was actually a performance drop when the LLVM auto loop vectorizer was enabled. As mentioned, there isn't yet any cost-model for LLVM to determine when to vectorize a loop or not, plus other performance tuning of this newly-committed code is still needed.
      This is what you wrote in the linked article.


      • #4
        Does LLVM have well-defined rules for what is enabled at the various optimisation levels?

        For GCC it is:
        -O1: optimisations that don't massively increase compile time
        -O2: O1 plus all optimisations that don't increase binary size
        -O3: O2 plus all safe optimisations, even if they bloat binary size (though heuristics should stop it bloating to the point of slowing things down)
        -Os: O2 plus optimisations to make the code smaller
        -Ofast: O3 plus some unsafe math options
        -Og: all optimisations that don't affect debugging
        http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

        So auto-vectorising ought to go in at O2, unless it involves adding extra code (alignment checks and fallbacks, see http://locklessinc.com/articles/vectorize/), and assuming that it's safe. Also, there should be heuristics so that it's only enabled where it speeds things up.


        • #5
          All common cases of automatic vectorization will increase code size. It will need a version of the original loop for remainders of the vectorization factor, and will often need versions of the vectorized loops for different detected architectures, and possibly alignment issues.
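
The remainder handling described here can be sketched in C. This is a hand-written stand-in for the shape of code a vectorizer tends to produce, not actual LLVM output: `VF`, `add_scalar`, and `add_strip_mined` are hypothetical names, and the four unrolled statements merely stand in for one vector instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vectorization factor: 4 floats per 128-bit vector. */
#define VF 4

/* The loop as the programmer wrote it. */
void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst && a && b);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Same result, split into a VF-wide main loop plus a scalar remainder. */
void add_strip_mined(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst && a && b);
    size_t i = 0;
    size_t main_end = n - (n % VF);   /* largest multiple of VF that is <= n */

    /* Main loop: each iteration stands in for one vector add. */
    for (; i < main_end; i += VF) {
        dst[i + 0] = a[i + 0] + b[i + 0];
        dst[i + 1] = a[i + 1] + b[i + 1];
        dst[i + 2] = a[i + 2] + b[i + 2];
        dst[i + 3] = a[i + 3] + b[i + 3];
    }

    /* Remainder loop: the 0 to VF-1 leftover elements, done one at a time. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The second function carries two loop bodies where the first has one, which is where the code-size growth comes from even before any per-architecture or alignment variants are added.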


          • #6
            Go read the cfe-dev list off of clang.llvm.org. You'll get your answers.


            • #7
              Originally posted by carewolf
              All common cases of automatic vectorization will increase code size. It will need a version of the original loop for remainders of the vectorization factor, and will often need versions of the vectorized loops for different detected architectures, and possibly alignment issues.
              I would have expected (though I may be wrong) a few cases where vectorisation could shrink the binary size, due to condensing several instructions into one. It would need to be a case where the alignment was fixed and the number of iterations was guaranteed to be a multiple of 4 (or 8 or whatever).
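
A minimal sketch of that favourable case in C, assuming GCC or Clang. `__builtin_assume_aligned` is a real builtin in both compilers; `scale_fixed`, `N`, and the 16-byte alignment are hypothetical choices for illustration. Whether the resulting binary actually ends up smaller depends on the compiler and target and is not verified here.

```c
#include <assert.h>
#include <stdint.h>

/* Compile-time trip count, chosen as a multiple of 4 so no remainder loop is needed. */
#define N 1024

/* Multiply N floats by k. The caller must pass 16-byte-aligned, non-overlapping buffers. */
void scale_fixed(float *restrict dst, const float *restrict src, float k)
{
    /* Check the alignment promise in debug builds. */
    assert(((uintptr_t)dst % 16) == 0 && ((uintptr_t)src % 16) == 0);

    /* Tell the compiler about the alignment so it can skip runtime alignment checks. */
    float *d = __builtin_assume_aligned(dst, 16);
    const float *s = __builtin_assume_aligned(src, 16);

    for (int i = 0; i < N; i++)
        d[i] = s[i] * k;
}
```

With the trip count, alignment, and non-aliasing all known up front, the compiler has no reason to keep a scalar fallback, a remainder loop, or an overlap check.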


              • #8
                Originally posted by carewolf
                All common cases of automatic vectorization will increase code-size. It will need a version of the original loop for remainders of the vectorization factor,
                Or you could fill the remaining space up with 0 for addition and 1 for multiplication.
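
A rough C sketch of this padding idea, under the assumption that the loop is a reduction (sum or product). For element-wise loops that store results, padded lanes would write past the end of the destination, so the trick does not carry over directly. `VF` and `reduce_padded` are hypothetical names, and the inner lane loops stand in for single vector instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vectorization factor. */
#define VF 4

/* Reduce n ints to a sum (multiply == 0) or a product (multiply != 0),
   using only VF-wide steps. The last partial chunk is padded with the
   identity element: 0 for addition, 1 for multiplication. */
long reduce_padded(const int *src, size_t n, int multiply)
{
    assert(src != NULL || n == 0);
    const long identity = multiply ? 1 : 0;

    long lanes[VF];
    for (size_t l = 0; l < VF; l++)
        lanes[l] = identity;

    /* Full chunks read straight from the input. */
    size_t full = n - (n % VF);
    for (size_t i = 0; i < full; i += VF)
        for (size_t l = 0; l < VF; l++)
            lanes[l] = multiply ? lanes[l] * src[i + l] : lanes[l] + src[i + l];

    /* Partial chunk: copy the leftovers and fill the rest with the identity. */
    if (full < n) {
        long tail[VF];
        for (size_t l = 0; l < VF; l++)
            tail[l] = (full + l < n) ? src[full + l] : identity;
        for (size_t l = 0; l < VF; l++)
            lanes[l] = multiply ? lanes[l] * tail[l] : lanes[l] + tail[l];
    }

    /* Combine the lanes into one result. */
    long result = identity;
    for (size_t l = 0; l < VF; l++)
        result = multiply ? result * lanes[l] : result + lanes[l];
    return result;
}
```

The scalar remainder loop is gone, but the padded tail chunk has to be built somewhere, so the saving in code size is not automatic.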


                • #9
                  Originally posted by oleid
                  @Michael: Did you read your article, which you linked, once more? The loop vectorizer decreased the performance in most scenarios!
                  Did YOU read the article?

                  There are two aspects to vectorization (actually to any optimization, but it's most obvious for vectorization):
                  - there is generating optimal code (given the ISA, the targeted µarchitecture, the source code) AND
                  - there is a cost model (this time very dependent on the targeted µarchitecture) which determines whether to use the vectorized code or not.

                  For vectorization the cost model is especially important precisely because it is easy to screw things up and end up with slower code: there can be a whole lot of extra overhead in making the vectorized code work compared to scalar code.
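
To make the trade-off concrete, here is a toy cost model in C. It is only an illustration of the idea, not how LLVM's cost model works: the struct, its fields, and the unit costs are all made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All values are made-up illustrative units, not LLVM's real cost tables. */
struct loop_cost {
    unsigned scalar_iter;  /* cost of one scalar iteration */
    unsigned vector_iter;  /* cost of one vector iteration (covers vf elements) */
    unsigned setup;        /* one-off overhead: runtime checks, shuffles, and so on */
    unsigned vf;           /* vectorization factor */
};

/* Return true if the estimated vector cost beats the estimated scalar cost. */
bool should_vectorize(const struct loop_cost *c, size_t trip_count)
{
    assert(c != NULL && c->vf > 0);

    size_t scalar_total = trip_count * c->scalar_iter;

    /* Vector path: setup, then full vector iterations, then a scalar remainder. */
    size_t vector_total = c->setup
                        + (trip_count / c->vf) * c->vector_iter
                        + (trip_count % c->vf) * c->scalar_iter;

    return vector_total < scalar_total;
}
```

With a setup cost of 20 units, a loop of 8 iterations loses to plain scalar code, while a loop of 1000 iterations wins comfortably. Vectorizing blindly, without such a check, is what makes short loops slower.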

                  The first round of LLVM attempts were specifically targeted at generating optimal code --- this was publicly stated, and that's precisely why vectorization was on by default. There have been attempts now at getting the cost model correct, but no-one's promising it's perfect yet.

                  In other words, yes, blind vectorization can definitely screw things up. But this is not incompetence, nor does it show that vectorization is a foolish idea. It is a reflection of the specific order in which tasks are being performed.
