LTO'ing Mesa Is Getting Discussed For Performance & Binary Size Reasons


  • #11
    Originally posted by FireBurn
    I'm using -flto=8 in my CFLAGS and CXXFLAGS and -Wl,-flto=8 in my LDFLAGS; it seems to be working fine with gcc 6.1.

    Had to set:

    Code:
    AR="gcc-ar"
    NM="gcc-nm"
    RANLIB="gcc-ranlib"
    In my make.conf too so it would work

    Here's my current setup:

    Code:
    CFLAGS="-O3 -march=native -pipe -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block -flto=8 -Wno-narrowing"
    CXXFLAGS="${CFLAGS} -fno-delete-null-pointer-checks -flifetime-dse=1 -fpermissive"
    LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed -Wl,-flto=8"
    And here are the packages I have to switch some of that off for (package.env):

    Code:
    app-arch/cpio no-graphite.conf
    app-arch/tar no-lto.conf no-graphite.conf
    app-emulation/wine no-lto.conf no-graphite.conf
    app-office/libreoffice no-lto.conf
    app-text/convertlit no-lto.conf
    dev-lang/python no-graphite.conf
    dev-libs/libgcrypt no-lto.conf
    dev-qt/designer no-lto.conf
    dev-qt/qtdeclarative no-lto.conf
    dev-qt/qtgui no-lto.conf
    dev-qt/qtscript no-lto.conf
    dev-util/ragel no-lto.conf
    games-action/minetest no-graphite.conf
    games-fps/worldofpadman no-lto.conf
    kde-frameworks/kdoctools no-lto.conf
    media-libs/alsa-lib no-lto.conf
    media-libs/flac no-graphite.conf
    media-libs/freeglut no-graphite.conf
    media-libs/lcms no-graphite.conf
    media-libs/libsndfile bfd.conf
    media-libs/mediastreamer bfd.conf
    media-libs/vulkan-base no-lto.conf
    media-libs/x264 no-lto.conf
    media-sound/pulseaudio no-lto.conf
    media-sound/twolame no-graphite.conf
    media-video/ffmpeg no-graphite.conf
    media-video/ffmpeg o2.conf
    media-video/handbrake no-lto.conf
    sys-apps/gawk no-graphite.conf
    sys-apps/groff no-graphite.conf
    sys-apps/pciutils no-lto.conf
    sys-devel/binutils gold.conf
    sys-devel/gettext no-lto.conf
    sys-fs/fuse bfd.conf no-graphite.conf
    sys-libs/ncurses no-lto.conf
    www-client/chromium o2.conf no-lto.conf no-graphite.conf
    x11-base/xorg-server no-lto.conf
    x11-drivers/xf86-video-intel no-lto.conf
    thx for your settings - will try them. I had to set gcc-ar, etc as well.
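
For anyone unfamiliar with package.env: each name in the list above refers to a small override file under /etc/portage/env/. The actual files were never posted in this thread, so the contents below are only a guess at what two of them might look like, assuming GCC's `-fno-lto` and `-fuse-ld=bfd` options are what they apply.

```shell
# /etc/portage/env/no-lto.conf
# HYPOTHETICAL contents, not the poster's real file.
# Appending -fno-lto after the global -flto=8 switches LTO off for that package.
CFLAGS="${CFLAGS} -fno-lto"
CXXFLAGS="${CXXFLAGS} -fno-lto"
# Drops the -Wl,-flto=8 entry from the global LDFLAGS quoted above.
LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed"

# /etc/portage/env/bfd.conf
# HYPOTHETICAL contents: force the BFD linker for packages that fail with another one.
LDFLAGS="${LDFLAGS} -fuse-ld=bfd"
```

A line such as `app-arch/tar no-lto.conf no-graphite.conf` in package.env then applies both override files to that one package.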

    • #12
      On the other hand, a JIT compiler is usually expected to do its work really fast. For example, my Mesa compile with LTO now takes 15 minutes with an average of 534% CPU usage. You probably don't want to run your program and have the JIT use several cores for several minutes before switching to the JIT-optimized code. So the JIT optimizations will likely be small, local optimizations.
      I wouldn't say it's obvious; it depends on how much impact LTO really has.

      • #13
        Originally posted by atomsymbol

        In my opinion, given a particular programming language, JIT is by definition faster than LTO because it has more bits of information at its disposal. If it isn't faster then there's something wrong with the JIT compiler.
        Nope. Regarding source information, LTO sees everything that isn't optimized away early (and what is optimized away would never help anyway); it usually has more time for optimizing and can use better heuristics (optimizing the whole program is the primary idea behind it). JIT is Just-in-Time precisely because it can't afford that time. Hypothetically, if you had a JIT compiler that beats a "static" one, you could just use it for static compilation.

        If you are talking about run-time information, yeah, that's a supposed holy grail that doesn't seem to be reachable by anyone. To make good decisions you need a lot of information, and to get information you need time: run time, which already offsets the hypothetical benefits. Then there is cache interference and the inability to easily share the same physical RAM for the code. Just look at the 70MB vs 13MB figure and then consider that your JIT will have at least a 70MB footprint, likely a lot more (it needs the binary for the final code, the source for future compilation, and code and working RAM).
        Then there is the problem that the workload can change: if you optimise aggressively for an "idle period", your code might be really horrible when a "heavy period" comes around. If you know your workload, you can statically optimize, easily as well as or better than a JIT; if you don't, then the JIT can only predict when and how code should be optimized, and wastes a lot of time and memory (mis)predicting this.

        There's still a lot left to prove before anyone can claim JIT will ever come close to statically compiled code, let alone beat it.
        Last edited by discordian; 31 May 2016, 04:31 PM.

        • #14
          Has anyone used PGO for big programs? According to the GCC docs it is easy for a few files. But is there any way to use PGO for Mesa?
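
For reference, the "few files" case from the GCC docs boils down to three steps: build with `-fprofile-generate`, run a training workload, rebuild with `-fprofile-use`. Below is a minimal sketch on a throwaway one-file program; the function name `pgo_demo`, the file `hot.c` and its loop are made up for illustration, and nothing here has been tried on Mesa itself.

```shell
# Sketch of GCC's PGO cycle on a toy program (hypothetical names throughout).
pgo_demo() {
  command -v gcc >/dev/null 2>&1 || { echo "gcc not found, skipping"; return 0; }
  workdir=$(mktemp -d) || return 1
  (
    cd "$workdir" || exit 1
    cat > hot.c <<'EOF'
#include <stdio.h>
int main(void) {
    unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000; i++)
        sum += (i % 3 == 0) ? i : 1;
    printf("%lu\n", sum);
    return 0;
}
EOF
    # 1. Instrumented build: the binary records counters while it runs.
    gcc -O2 -fprofile-generate -c hot.c -o hot.o || exit 1
    gcc -fprofile-generate hot.o -o hot_instr || exit 1
    # 2. Training run: writes the profile (hot.gcda) next to hot.o.
    ./hot_instr >/dev/null || exit 1
    # 3. Optimized rebuild: same object name, so GCC finds hot.gcda.
    gcc -O2 -fprofile-use -c hot.c -o hot.o || exit 1
    gcc hot.o -o hot_pgo || exit 1
    ./hot_pgo
  )
  status=$?
  rm -rf "$workdir"
  return "$status"
}
```

For a large tree the same two flag sets would presumably go into CFLAGS/LDFLAGS for two full builds, with a representative workload run in between; picking that workload is the hard part.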

          • #15
            There's this about AutoFDO from ClearLinux:


            I never tried it though, but maybe the CL devs will comment on its use for mesa.

            • #16
              geearf, thx for this link. According to the PTS benchmarks ClearLinux is really nice. It would be useful to port the optimizations to other software/distros.

              • #17
                Originally posted by CochainComplex

                thx for your settings - will try them. I had to set gcc-ar, etc as well.

                Let me know if you need the contents of the env.d conf files too.

                • #18
                  Originally posted by FireBurn


                  Let me know if you need the contents of the env.d conf files too.
                  Works. I applied the mentioned patch and used your flags. No Problems at all. Thx

                  • #19
                    Also interesting in this context, and as mentioned in the given dev correspondence: https://download.clearlinux.org/rele.../source/SRPMS/ <- extract the flags used from the Clear Linux source packages.

                    • #20
                      Originally posted by haagch
                      I have tried compiling mesa with lto recently and didn't have any problems - except compilation takes a lot longer. And the worst of it is that for incremental builds, it takes that long every time it links mesa. Is there something gcc can do to cache lto optimizations? Or does it already do that and it's just a high chance that any of the stuff linked together has changed so it needs to relink everything?
                      With -flto=<num of jobs> the LTO times improve. If you have more than 32 threads, you need to bump up --param lto-partitions.

                      Keeping track of functions which do not change since the last optimization is technically possible with GCC's WHOPR: IPA optimizations are performed without modifying the GIMPLE bodies, and theoretically all one needs to do is check whether the bodies are the same and the IPA optimization decisions match. It needs to be implemented though.
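
A rough sketch of the job-count advice above, assuming the parameter meant is the one GCC documents as `--param lto-partitions` (default 32 at the time of this thread). The helper name `lto_flags` is hypothetical, and setting the partition count equal to the job count is just one plausible choice.

```shell
# Print LTO flags for a given number of parallel jobs (defaults to the CPU count).
lto_flags() {
  jobs=${1:-$(nproc)}
  flags="-flto=${jobs}"
  # Above 32 jobs the default partition count would leave some jobs idle,
  # so ask for at least one partition per job.
  if [ "$jobs" -gt 32 ]; then
    flags="${flags} --param lto-partitions=${jobs}"
  fi
  printf '%s\n' "$flags"
}

# Possible use (file names are placeholders):
#   gcc -O2 $(lto_flags) -o app main.c util.c
```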
