LTO'ing Mesa Is Getting Discussed For Performance & Binary Size Reasons

  • #21
    There's this about AutoFDO from ClearLinux:
    https://clearlinux.org/features/autofdo

    I've never tried it, but maybe the Clear Linux devs will comment on its use for Mesa.


    • #22
      geearf, thanks for the link. According to the PTS benchmarks, Clear Linux is really nice. It would be useful to port the optimizations to other software/distros.


      • #23
        Originally posted by CochainComplex

        thx for your settings - will try them. I had to set gcc-ar, etc as well.

        Let me know if you need the contents of the env.d conf files too.


        • #24
          Originally posted by FireBurn

          Let me know if you need the contents of the env.d conf files too.

          Works. I applied the mentioned patch and used your flags. No problems at all. Thx


          • #25
            Also interesting in this context, and as mentioned in the linked dev correspondence: https://download.clearlinux.org/rele.../source/SRPMS/ <- the flags used can be extracted from the Clear Linux source packages.


            • #26
              Originally posted by discordian
              That could hypothetically be done already, but that's the linking phase, not re-compilation (even if technically, with LTO, the linking phase will compile too).

              If fixed-position code were always faster, it would be used with or without LTO (unless PIC is needed/requested for technical reasons). That's generally wrong, though; it depends highly on the architecture (whether it allows PC-relative addressing) and the OS/toolchain.
              In my opinion, we should slowly be moving to a world where -fPIC passed to a compiler is ignored as a command-line option because the compiler (and the "system" behind) will automatically decide when to use position-independent code.

              As far as I know we aren't moving to such a world at all, which is unfortunate.

              ----

              Example C code compiled for amd64:

              Code:
              int fn(int a, int b) {
                  switch(a) {
                      case 0: return b+2;
                      case 1: return b+3;
                      case 2: return b+5;
                      case 3: return b+7;
                      case 4: return b+11;
                      case 5: return b+13;
                  }
                  return -1;
              }
              gcc -O3 -fPIC:

              Code:
              fn:
                     cmpl    $5, %edi
                     movl    $-1, %eax
                     ja      .L2
                     leaq    .L4(%rip), %rax
                     movl    %edi, %edi
                     movslq  (%rax,%rdi,4), %rdx
                     addq    %rdx, %rax
                     jmp     *%rax
              gcc -O3:

              Code:
              fn:
                     cmpl    $5, %edi
                     movl    $-1, %eax
                     ja      .L2
                     movl    %edi, %edi
                     jmp     *.L4(,%rdi,8)


              • #27
                Originally posted by haagch
                I have tried compiling mesa with lto recently and didn't have any problems - except compilation takes a lot longer. And the worst of it is that for incremental builds, it takes that long every time it links mesa. Is there something gcc can do to cache lto optimizations? Or does it already do that and it's just a high chance that any of the stuff linked together has changed so it needs to relink everything?
                With -flto=<number of jobs>, the LTO times improve, because the link-time optimization work runs in parallel. If you have more than 32 threads, you need to bump up --param max-lto-partitions
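As a rough illustration of picking the job count, here is a minimal sketch. The helper name `lto_flag` is made up for this example, and it assumes `nproc` from GNU coreutils is available (it falls back to a single job otherwise).

```shell
# Print an -flto=N flag. N defaults to the number of online CPUs.
# Assumption: nproc (GNU coreutils) is installed; otherwise fall back to 1 job.
lto_flag() {
    n="${1:-$(nproc 2>/dev/null || echo 1)}"
    printf -- '-flto=%s\n' "$n"
}

lto_flag        # e.g. -flto=16 on a 16-thread machine
lto_flag 9      # explicit job count
```

The printed flag would go into both the compile and link flags, since the parallel work happens when the final link runs.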

                Keeping track of functions that have not changed since the last optimization is technically possible with GCC's WHOPR: IPA optimizations are performed without modifying the GIMPLE bodies, so theoretically all one needs to do is check that the bodies are the same and that the IPA optimization decisions match. It needs to be implemented, though.


                • #28
                  Originally posted by haagch
                  I also learned about other flags from https://lists.freedesktop.org/archiv...ay/118929.html so I'm now compiling with

                  export CFLAGS="$CFLAGS -O3 -flto=9 -ffat-lto-objects -flto-odr-type-merging"
                  export CXXFLAGS="$CFLAGS"
                  export LDFLAGS=" -flto=9"
                  With the linker plugin, -ffat-lto-objects should no longer be needed. It only makes GCC produce assembly at compile time as well, which doubles compile times. It is also possible that the plugin is not used during linking (because the build machinery somehow bypasses gcc-ar, gcc-nm, or gcc and calls binutils directly); the LTO is then silently discarded and you won't get any LTO optimizations. So try to get your build working without -ffat-lto-objects.

                  -flto-odr-type-merging is the default, so no need to specify it.
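Putting that advice together, the quoted settings might be trimmed to something like the sketch below. It assumes a GCC built with linker-plugin support, the gcc-ar/gcc-nm/gcc-ranlib wrappers on PATH, and a build system that honours the AR, NM and RANLIB environment variables; none of that is guaranteed for every setup.

```shell
# Slim LTO flags: no -ffat-lto-objects, no explicit -flto-odr-type-merging.
# Assumption: the build system reads AR/NM/RANLIB from the environment.
LTO_JOBS=9
export CFLAGS="${CFLAGS:-} -O3 -flto=${LTO_JOBS}"
export CXXFLAGS="${CFLAGS}"
export LDFLAGS="-flto=${LTO_JOBS}"

# Use the GCC wrappers so archives and symbol listings go through the LTO plugin
# instead of plain binutils.
export AR=gcc-ar
export NM=gcc-nm
export RANLIB=gcc-ranlib
```

Because the fat objects are gone, a build that bypasses the plugin should now fail at link time rather than quietly produce a non-LTO binary.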


                  • #29
                    Originally posted by CochainComplex
                    Has anyone used PGO for big programs? According to the GCC docs it is easy for a few files. But is there any way to use PGO for Mesa?
                    It is used by Firefox and Google, so yes, there are big programs compiled with PGO.


                    • #30
                      Originally posted by atomsymbol

                      In my opinion, we should slowly be moving to a world where -fPIC passed to a compiler is ignored as a command-line option because the compiler (and the "system" behind) will automatically decide when to use position-independent code.

                      As far as I know we aren't moving to such a world at all, which is unfortunate.
                      GCC 6+ uses the linker plugin to drop -fPIC at link time when the resulting binary is not PIC. -fPIC limits the optimizations the compiler can do (because with PIC, symbols can be dynamically interposed), so it needs to be known when the code is being optimized. I'm not sure you could move away from specifying it explicitly to the compiler/linker.
