LTO'ing Mesa Is Getting Discussed For Performance & Binary Size Reasons

  • #41
    Originally posted by hubicka View Post
    Yep, I am a GCC developer, and even GCC folks know what WTF means. I was just curious what you expect the compiler to do with the jump table.


    I tried this kind of codegen back in the early 2000s when writing the x86-64 machine description. At that time it was a loss because the branch prediction logic did not handle the sequence of an indirect jump to a direct jump well. It also consumes more code cache and less data cache, and data cache is generally less limiting.
    I don't think the main x86 chips have changed much in this respect. The expensive part of the tablejump sequence is still the indirect branch.

    Putting the table directly behind the instruction is not recommended on either Intel or AMD chips.
    See http://www.intel.com/content/dam/www...ion-manual.pdf section 3.6.9
    I'd say way to go, Intel, with the confusing naming, considering ia64 is Itanium.

    Comment


    • #42
      Originally posted by hubicka View Post
      Yep, I am a GCC developer, and even GCC folks know what WTF means. I was just curious what you expect the compiler to do with the jump table.
      Lol, I wasn't trying to be insulting, just curious myself, sorry.
      Originally posted by hubicka View Post
      I tried this kind of codegen back in the early 2000s when writing the x86-64 machine description. At that time it was a loss because the branch prediction logic did not handle the sequence of an indirect jump to a direct jump well. It also consumes more code cache and less data cache, and data cache is generally less limiting.
      I don't think the main x86 chips have changed much in this respect. The expensive part of the tablejump sequence is still the indirect branch.

      Putting the table directly behind the instruction is not recommended on either Intel or AMD chips.
      See http://www.intel.com/content/dam/www...ion-manual.pdf section 3.6.9
      Obviously you don't want to mix code and data since the caches are separate, but apparently the big issue is placing the table immediately after the indirect jump. Also, I don't know what you mean by "sequence of indirect jump to direct jump": the return from the routine in the example? What happens in the branches is a further topic IMHO, and in this example the indirect jump could be replaced with a table lookup + addition, or you could just as well use arithmetic for the jump offset (same number of instructions in each branch).

      In the generic case, what about using constant pools, i.e. accumulating the data (jump table offsets and potentially more) from potentially multiple functions and placing it somewhere close by in its own (cacheline-aligned) block that's just data? It was common on the m68k way back in time.
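
To make the "table lookup + addition" idea concrete, here is a minimal C sketch. It assumes the branches differ only by a constant; the function names and the constants are hypothetical and not taken from any code discussed in this thread. Whether a given compiler emits a jump table for the first variant depends on its version and flags.

```c
#include <stddef.h>

/* Hypothetical per-case constants, kept as plain data. */
static const int case_bias[4] = { 10, 20, 30, 40 };

/* Variant 1: a dense switch. A compiler may lower this to an
 * indirect jump through a jump table. */
int apply_switch(unsigned op, int x)
{
    switch (op) {
    case 0:  return x + 10;
    case 1:  return x + 20;
    case 2:  return x + 30;
    case 3:  return x + 40;
    default: return x;
    }
}

/* Variant 2: the same result from a data-table lookup plus an
 * addition. Only a bounds check remains as a (direct) branch. */
int apply_table(unsigned op, int x)
{
    if (op >= sizeof case_bias / sizeof case_bias[0])
        return x;                /* out-of-range op: leave x unchanged */
    return x + case_bias[op];
}
```

Both functions return the same value for every `op`; the second one reads the table as ordinary data instead of jumping through it.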

      Comment


      • #43
        Originally posted by haagch View Post
        So with AR, NM etc set it does not look different at all so it's necessary for my setup.

        Oh and now I see the problem I have had some time ago with -fPIC and otherwise similar flags when trying to use wine:
        Code:
        /usr/lib32/xorg/modules/dri/i965_dri.so: undefined symbol: V4F_COUNT
        Of course according to google I am literally the only human on the planet who has posted about this issue... once.

        Time to find out which flag exactly causes it.

        *starts compiling*

        edit: Can confirm, the exact same flags with just -flto removed work with wine.
        Grepping mesa for V4F_COUNT... Wow, that's some low level ASM stuff right there. Probably need some compiler insight to know what's going on there... hubicka maybe?
        I wasn't able to reproduce your V4F_COUNT issue on my machine with mesa-git and GCC 4.9. What kind of configure flags and compiler flags are you using?

        I am assuming the following command has a non-empty output on your machine:

        Code:
        $ nm -D i965_dri.so | grep V4F_COUNT

        Comment


        • #44
          Originally posted by atomsymbol View Post
          I wasn't able to reproduce your V4F_COUNT issue on my machine with mesa-git and GCC 4.9. What kind of configure flags and compiler flags are you using?
          Happens only with wine by the way, everything else seems fine. I also only tested 32 bit wine, not 64 bit.
          Also happens only with intel. radeonsi loads fine.

          Compiled with
          CFLAGS="-O3 -march=native -pipe -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block -Wno-narrowing -flto=8"
          CXXFLAGS="${CFLAGS} -fno-delete-null-pointer-checks -flifetime-dse=1 -fpermissive"
          LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed -Wl,-flto=8"
          ./autogen.sh \
          --prefix=/usr \
          --libdir=/usr/lib32 \
          --build=x86_64-pc-linux-gnu --host=i686-pc-linux-gnu \
          --with-dri-driverdir=/usr/lib32/xorg/modules/dri \
          --with-dri-drivers=i965 \
          --with-egl-platforms=x11,drm,wayland \
          --with-gallium-drivers=radeonsi,r600,swrast,ilo \
          --enable-glx-tls \
          --enable-egl \
          --enable-gallium-llvm \
          --enable-gles1 \
          --enable-gles2 \
          --enable-texture-float \
          --enable-vdpau \
          --enable-va \
          --enable-gbm \
          --enable-shared-glapi \
          --enable-gallium-osmesa \
          --enable-dri3 \
          --enable-nine \
          --enable-omx \
          --with-vulkan-drivers=

          Originally posted by atomsymbol View Post
          I am assuming the following command has a non-empty output on your machine:

          Code:
          $ nm -D i965_dri.so | grep V4F_COUNT
          One result:
          U V4F_COUNT

          Here's my mesa build:
          http://haagch.frickel.club/files/mesa-lto.pkg.tar.gz
          (compiled with -march=native for ivy bridge)
          Indeed only loading the GL driver from this build with
          LIBGL_DRIVERS_PATH=/wherever/usr/lib32/xorg/modules/dri
          and trying to start Warcraft 3 in wine on intel fails with the symbol lookup error (you need LIBGL_DEBUG=verbose to see it, by the way)

          Comment


          • #45
            Originally posted by haagch View Post
            U V4F_COUNT
            I located the cause of the problem: gen_matypes.c is passed to the compiler to generate assembly code in text form (compiler switch: -S), but with -flto GCC outputs neither code nor data because that is postponed to link time.

            In other words: executing "gcc -S -flto foo.c -o foo.s" generates a file with empty .text and .data sections.

            This isn't a GCC bug. On the other hand, maybe passing -S together with -flto to the compiler should imply -ffat-lto-objects or at least print a warning message (hubicka)? I used GCC 4.9.3 which isn't printing any warning.
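
For readers unfamiliar with the trick, here is a minimal C sketch of how a constant such as V4F_COUNT can be passed through the compiler's `-S` output. This is an assumption-laden illustration in the style of "asm-offsets" generators, not Mesa's actual gen_matypes.c: `struct vec4f`, `EMIT_DEFINE`, `emit_offsets` and `v4f_count_offset` are hypothetical names, and the struct layout is made up.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real vector type; layout is illustrative. */
struct vec4f {
    float   *start;
    unsigned count;
    unsigned stride;
};

/* Writes a "#define <name> <value>" line into the compiler's assembly
 * output. The "i" constraint requires a compile-time constant, and %c0
 * prints it without operand punctuation (GCC/Clang extension). */
#define EMIT_DEFINE(name, value) \
    __asm__ volatile("\n#define " name " %c0" : : "i"(value))

/* Never meant to be called for its effect; it exists so that the
 * lines above show up in the generated .s file. */
void emit_offsets(void)
{
    EMIT_DEFINE("V4F_COUNT",  offsetof(struct vec4f, count));
    EMIT_DEFINE("V4F_STRIDE", offsetof(struct vec4f, stride));
}

/* Plain C view of the same constant, for comparison. */
size_t v4f_count_offset(void)
{
    return offsetof(struct vec4f, count);
}
```

Compiled with `gcc -S`, the function body carries the `#define` lines into the `.s` file, where a build script can extract them. As described above, adding `-flto` leaves that function body out of the assembly text, so the lines are missing unless `-ffat-lto-objects` is also passed.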

            Comment


            • #46
              Cool, thanks. So with -ffat-lto-objects it should work?

              By the way, intel anv vulkan in steam doesn't work with the same LTO settings from earlier either, at least in The Talos Principle.
              Code:
              Cannot set requested display mode 1920x1080: GfxAPI error: Dynamic module "libvulkan.so.1" not found!
              Presumably it's the same problem.

              Comment
