LTO'ing Mesa Is Getting Discussed For Performance & Binary Size Reasons

  • #11
    Originally posted by atomsymbol

    For example: Conversion of position-independent code to fixed-position code when the JIT determines the code is used very often. The latter code is slightly faster than the former.
    That could hypothetically be done already, but that's the linking phase, not recompilation (even if, technically, with LTO the linking phase compiles too).

    If fixed-position code were always faster, it would be used with or without LTO (unless PIC is needed or requested for technical reasons). As a general claim it's wrong, though: it depends highly on the architecture (whether it allows PC-relative addressing) and on the OS/toolchain.


    • #12
      I'm using -flto=8 in my CFLAGS and CXXFLAGS and -Wl,-flto=8 in my LDFLAGS; it seems to be working fine with GCC 6.1.

      I had to set:

      Code:
      AR="gcc-ar"
      NM="gcc-nm"
      RANLIB="gcc-ranlib"
      in my make.conf too so that it would work.

      Here's my current setup:

      Code:
      CFLAGS="-O3 -march=native -pipe -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block -flto=8 -Wno-narrowing" 
      CXXFLAGS="${CFLAGS} -fno-delete-null-pointer-checks -flifetime-dse=1 -fpermissive"
      LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed -Wl,-flto=8"
      And here are the packages I have to switch some of that off for (package.env):

      Code:
      app-arch/cpio                           no-graphite.conf 
      app-arch/tar                            no-lto.conf no-graphite.conf
      app-emulation/wine                      no-lto.conf no-graphite.conf
      app-office/libreoffice                  no-lto.conf
      app-text/convertlit                     no-lto.conf
      dev-lang/python                         no-graphite.conf
      dev-libs/libgcrypt                      no-lto.conf
      dev-qt/designer                         no-lto.conf
      dev-qt/qtdeclarative                    no-lto.conf
      dev-qt/qtgui                            no-lto.conf
      dev-qt/qtscript                         no-lto.conf
      dev-util/ragel                          no-lto.conf
      games-action/minetest                   no-graphite.conf
      games-fps/worldofpadman                 no-lto.conf
      kde-frameworks/kdoctools                no-lto.conf
      media-libs/alsa-lib                     no-lto.conf
      media-libs/flac                         no-graphite.conf
      media-libs/freeglut                     no-graphite.conf
      media-libs/lcms                         no-graphite.conf
      media-libs/libsndfile                   bfd.conf
      media-libs/mediastreamer                bfd.conf
      media-libs/vulkan-base                  no-lto.conf
      media-libs/x264                         no-lto.conf
      media-sound/pulseaudio                  no-lto.conf
      media-sound/twolame                     no-graphite.conf
      media-video/ffmpeg                      no-graphite.conf
      media-video/ffmpeg                      o2.conf
      media-video/handbrake                   no-lto.conf
      sys-apps/gawk                           no-graphite.conf
      sys-apps/groff                          no-graphite.conf
      sys-apps/pciutils                       no-lto.conf
      sys-devel/binutils                      gold.conf
      sys-devel/gettext                       no-lto.conf
      sys-fs/fuse                             bfd.conf no-graphite.conf
      sys-libs/ncurses                        no-lto.conf
      www-client/chromium                     o2.conf no-lto.conf no-graphite.conf
      x11-base/xorg-server                    no-lto.conf
      x11-drivers/xf86-video-intel            no-lto.conf
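
The env files named in the list above (no-lto.conf and the others) are not shown in the post. As a minimal sketch only, assuming the global flags from make.conf are visible as ${CFLAGS}, ${CXXFLAGS} and ${LDFLAGS}, a hypothetical no-lto.conf could append -fno-lto, which GCC honours because the later of two conflicting options wins:

```shell
# Hypothetical contents of an env file such as no-lto.conf.
# The real file from the post is not shown, so this is an assumption.
# Appending -fno-lto after an earlier -flto=N switches LTO off again in GCC.
CFLAGS="${CFLAGS} -fno-lto"
CXXFLAGS="${CXXFLAGS} -fno-lto"
LDFLAGS="${LDFLAGS} -fno-lto"
```

The other files (no-graphite.conf, o2.conf, bfd.conf, gold.conf) would presumably follow the same pattern with different flags, but their contents are likewise not given in the thread.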


      • #13
        Originally posted by discordian
        That could hypothetically be done already, but that's the linking phase, not recompilation (even if, technically, with LTO the linking phase compiles too).

        If fixed-position code were always faster, it would be used with or without LTO (unless PIC is needed or requested for technical reasons). As a general claim it's wrong, though: it depends highly on the architecture (whether it allows PC-relative addressing) and on the OS/toolchain.
        Ok, but LTO != JIT


        • #14
          Originally posted by atomsymbol
          Are you using -flto=$(nproc)?
          I did not know this was an option, thanks.

          I also learned about other flags from https://lists.freedesktop.org/archiv...ay/118929.html so I'm now compiling with:

          export CFLAGS="$CFLAGS -O3 -flto=9 -ffat-lto-objects -flto-odr-type-merging"
          export CXXFLAGS="$CFLAGS"
          export LDFLAGS=" -flto=9"

          Now the lto1 binary indeed uses all my cores. I still do not have any problems with mapi with these flags. My Mesa installation is 9.24 MiB bigger than my previous build with -O2 and no LTO. The mailing-list message says that removing --enable-glx-tls also helps against the build failure its author hit, but I have it enabled and still see no problems. Strange.

          Anyway, the build took about 15 minutes with radeonsi, r600, swrast, ilo, anv, and most features enabled. I'm not sure how long it takes without LTO, but I guess somewhere between 5 and 10 minutes.
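
The job count passed to -flto does not have to be hard-coded. A minimal sketch, assuming the nproc utility from GNU coreutils is available and reusing the flags from this post (the variable name lto_jobs is made up for illustration):

```shell
# Number of parallel LTO jobs: use nproc when available, otherwise fall back to 1.
lto_jobs="$(nproc 2>/dev/null || echo 1)"

# Same flags as in the post, with the job count derived instead of hard-coded.
export CFLAGS="${CFLAGS:-} -O3 -flto=${lto_jobs} -ffat-lto-objects -flto-odr-type-merging"
export CXXFLAGS="${CFLAGS}"
export LDFLAGS="-flto=${lto_jobs}"

echo "CFLAGS=${CFLAGS}"
echo "LDFLAGS=${LDFLAGS}"
```

Whether one job per core is the best choice depends on available RAM, since each LTO job can use a considerable amount of memory.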


          • #15
            Originally posted by atomsymbol

            Ok, but LTO != JIT
            Thankfully, yes.
            Just use Java if you want JIT. It likely won't ever be as fast as compiled code (unless you carefully construct a problem for this "solution"), and the measurements and heuristics that would be used to optimize the running code can just as well result in degraded performance.


            • #16
              Originally posted by FireBurn
              I'm using -flto=8 in my CFLAGS and CXXFLAGS and -Wl,-flto=8 in my LDFLAGS; it seems to be working fine with GCC 6.1.

              Thanks for your settings, I will try them. I had to set gcc-ar etc. as well.


              • #17
                Originally posted by discordian
                Thankfully, yes.
                Just use Java if you want JIT. It likely won't ever be as fast as compiled code (unless you carefully construct a problem for this "solution"), and the measurements and heuristics that would be used to optimize the running code can just as well result in degraded performance.
                In my opinion, given a particular programming language, JIT is by definition faster than LTO because it has more bits of information at its disposal. If it isn't faster then there's something wrong with the JIT compiler.


                • #18
                  On the other hand, a JIT compiler is usually expected to do its work really fast. For example, my Mesa compile with LTO now takes 15 minutes with an average of 534% CPU usage. You probably don't want to run your program and have the JIT use several cores for several minutes before switching to the JIT-optimized code. So the JIT optimizations will likely be small, local optimizations.
                  I wouldn't say it's obvious; it depends on how much impact LTO really has.


                  • #19
                    Originally posted by atomsymbol

                    In my opinion, given a particular programming language, JIT is by definition faster than LTO because it has more bits of information at its disposal. If it isn't faster then there's something wrong with the JIT compiler.
                    Nope. Regarding source information, LTO sees everything that isn't optimized away early (and what is optimized away early wouldn't help anyway), it usually has more time to optimize, and it can use better heuristics (optimizing the whole program is the primary idea behind it). JIT is "just in time" precisely because it can't afford that time. Hypothetically, if you had a JIT compiler that beats a "static" one, you could just use it for static compilation.

                    If you are talking about run-time information, yeah, that's a supposed holy grail that doesn't seem to be reachable by anyone. To make good decisions you need a lot of information, and to get information you need time: run time, which already offsets the hypothetical benefits. Then there is cache interference and the inability to easily share the same physical RAM for the code. Just look at the 70MB vs 13MB figure, and then consider that your JIT will have at least a 70MB footprint, likely a lot more (it needs the binary for the final code, the source for future compilation, code and working RAM).
                    Then there is the problem that the workload can change: if you optimize aggressively for an "idle period", your code might be really horrible when a "heavy period" comes around. If you know your workload, you can optimize statically, easily as well as or better than a JIT. If you don't, then the JIT can only predict when and how code should be optimized, and it wastes a lot of time and memory (mis)predicting this.

                    There's a lot left to prove before JIT will ever come close to statically compiled code, let alone beat it.
                    Last edited by discordian; 31 May 2016, 04:31 PM.


                    • #20
                      Has anyone used PGO for big programs? According to the GCC docs it is easy for a few files. But is there any way to use PGO for Mesa?
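
For reference, GCC's profile-guided optimization generally follows three steps: an instrumented build, a training run, and a rebuild that consumes the profile. The sketch below only assembles the flag sets and prints the steps. The profile directory and variable names are made up for illustration, and whether Mesa's build system copes with these flags is not established anywhere in this thread:

```shell
# Hypothetical location for the collected profile data.
PROFILE_DIR="${TMPDIR:-/tmp}/mesa-pgo-profile"

# Step 1: flags for an instrumented build that records execution counts.
PGO_GENERATE_FLAGS="-O2 -fprofile-generate=${PROFILE_DIR}"

# Step 3: flags for the final rebuild that consumes the recorded profile.
# -fprofile-correction tolerates inconsistent counters from multi-threaded runs.
PGO_USE_FLAGS="-O2 -fprofile-use=${PROFILE_DIR} -fprofile-correction"

echo "1. Build and install with compiler and linker flags containing: ${PGO_GENERATE_FLAGS}"
echo "2. Run representative workloads so profile data lands in ${PROFILE_DIR}"
echo "3. Rebuild with compiler flags containing: ${PGO_USE_FLAGS}"
```

The quality of the result depends on step 2: the training workloads should resemble what the library will actually be used for.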
