Torvalds Is Unconvinced By LTO'ing A Linux Kernel

  • Azrael5
    replied
    Originally posted by ryao
    SSE2 instructions are disabled in Linux builds via options sent to the compiler. This was a design decision in Linux to make context switches faster. Using them outside of special critical regions will cause bad things to happen. At best, you will panic the system. At worst, you will have information leaks into userland.

    Anyway, better algorithms always beat better interprocedural optimization. There is no reason why you cannot have both at once, but there is also no reason to rush into it. Quite honestly, I look forward to building the kernel with Clang much more than I look forward to better interprocedural optimizations.

    That being said, I just sent a patch off to the list that illustrates the effect of better algorithms. Reducing time spent in spin locks by relying on lock-free data structures does more for system performance than any amount of interprocedural optimization will ever do:

    https://lkml.org/lkml/2014/4/10/416
    Are you saying that Linux operating systems use the FPU to process algorithms? If that's true, it's an incredible joke. The FPU is for floating-point mathematical operations. SSE2 is the way to optimize, as well as making things reliable and giving speedups of up to 1000%. Only this evolution allows Linux to outperform every other system currently used on hardware platforms equipped with SSE2 processors. I don't believe Linux systems lack this feature. It's impossible.

  • ryao
    replied
    Originally posted by Azrael5
    It uses SSE2 instructions?
    SSE2 instructions are disabled in Linux builds via options sent to the compiler. This was a design decision in Linux to make context switches faster. Using them outside of special critical regions will cause bad things to happen. At best, you will panic the system. At worst, you will have information leaks into userland.

    Anyway, better algorithms always beat better interprocedural optimization. There is no reason why you cannot have both at once, but there is also no reason to rush into it. Quite honestly, I look forward to building the kernel with Clang much more than I look forward to better interprocedural optimizations.

    That being said, I just sent a patch off to the list that illustrates the effect of better algorithms. Reducing time spent in spin locks by relying on lock-free data structures does more for system performance than any amount of interprocedural optimization will ever do:
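As a rough illustration of the kind of change described above (replacing a spinlock-protected update with a lock-free one), here is a minimal userspace C11 sketch of a lock-free stack push (a Treiber stack) built on compare-and-swap. It is not the code from the linked patch; the struct and function names are hypothetical. Only push is shown, because a correct lock-free pop also has to handle the ABA problem and safe memory reclamation.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type for a lock-free LIFO (Treiber stack). */
struct node {
    int value;
    struct node *next;
};

/* The head pointer is only ever modified with atomic operations. */
struct stack {
    _Atomic(struct node *) head;
};

void stack_init(struct stack *s)
{
    atomic_init(&s->head, NULL);
}

/* Push without taking a lock. If another thread changed the head
 * between our load and our compare-and-swap, the CAS fails, 'old' is
 * refreshed with the current head, and we retry. Release ordering on
 * success publishes n->value and n->next to readers that load the
 * head with acquire ordering. */
void stack_push(struct stack *s, struct node *n)
{
    struct node *old = atomic_load_explicit(&s->head, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->head, &old, n,
                 memory_order_release, memory_order_relaxed));
}
```

Any number of threads can call stack_push() concurrently. A thread whose compare-and-swap fails just retries with the freshly observed head; some thread always makes progress, and nobody ends up waiting on a lock holder that has been preempted.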

  • Azrael5
    replied
    Originally posted by cb88
    From what you say in your first point there, it seems you don't actually understand LTO. It has nothing to do with what the binary links to after it's built... it has to do with being able to do optimisations across all source files that will be linked together to generate a single binary. Without LTO the scope of the optimiser is limited to a single source file, when most projects have a crap load of source files. LTO basically loads all the source files into the compiler, parses the entire thing and then hands it off to the optimiser, rather than doing the aforementioned steps individually for each file to generate individual binaries that are linked to form the entire binary.

    Also have some respect for people that believe different than you. Thanks.
    It uses SSE2 instructions?
    Last edited by Azrael5; 10 April 2014, 10:09 AM.

  • duby229
    replied
    Originally posted by caligula
    I understand that a 5% size reduction matters on 4 MB of flash storage. However, in desktop apps it doesn't matter at all. Hard drives are now 1 TB (SSD) and 4 TB (3.5" HDD). You can also set up RAID6 or ZFS. So you get tens of terabytes and it's very cheap. You shouldn't bother with binary sizes. In fact, there's plenty of room for more functionality in Firefox. Luckily they're working hard at implementing more new features with each release.
    This is wrong in so many ways...

    Just because HDDs are so big doesn't mean they should be filled up. The fact is that storage is the slowest part of most computers. You have a choice between waiting for storage or getting your work done.

  • Brane215
    replied
    Originally posted by cb88
    From what you say in your first point there, it seems you don't actually understand LTO. It has nothing to do with what the binary links to after it's built... it has to do with being able to do optimisations across all source files that will be linked together to generate a single binary. Without LTO the scope of the optimiser is limited to a single source file, when most projects have a crap load of source files. LTO basically loads all the source files into the compiler, parses the entire thing and then hands it off to the optimiser, rather than doing the aforementioned steps individually for each file to generate individual binaries that are linked to form the entire binary.

    Also have some respect for people that believe different than you. Thanks.
    1. Nope. AFAIK you are talking about -fwhole-program, which was just meant as an experiment or an overture to LTO.

    Loading all sources and processing them as one is precisely what LTO is trying to avoid, so that you can control the memory burden on the compiling system. LTO works by having the compiler emit, instead of a classic binary ELF object, an ELF object containing GIMPLE structures: functions, variables, etc., with their attributes.

    So later, when the linker starts merging those objects, it works with GIMPLE, and so it can "see" inside those objects just as well as the compiler that made them did.
    Then the linker redoes the optimisation step during linking, doing much of the same work the compiler did earlier on each object file, only this time it works with processed material instead of source, and it does it globally.

    Also, that "whole program optimisation" as you describe it wouldn't work across already compiled libraries, since for the compiler they'd be just a bunch of externally visible symbols.
    Here, the output can contain, besides the linkable binary, its corresponding GIMPLE (= "fat ELF"?), so that the compiler can later redo some optimisations on the same library when compiling another program.
    I don't know or understand all the details, and I can't see how this would work for dynamic linking, so I am assuming it applies only to static linking.

    2. "Also have some respect for people that believe different than you."

    My whole point was that not everyone is the same and that his "don't make a fuss, it works for ME" argument was selfish.

  • cb88
    replied
    Originally posted by Brane215
    1. The kernel, glibc and other non-standard packages are not insignificant. The kernel doesn't link to anything in userland (except perhaps that one link for getting accurate time, but that's insignificant), but it needs stability more than anything in userland does. And with the LTO option still in testing, I don't see how the result can be trusted atm for anything except testing purposes.

    Glibc and similar libraries are not insignificant. If they miscompile with LTO, they'll affect just about everything. And if you compile them without LTO, you are missing a big part of the point: big parts will end up being opaque islands to any LTO optimisation efforts.

    2. I had to do it blindly, because there simply was no one to turn to. The CFLAGS & LDFLAGS I used were given to me, IIRC, on the Gentoo forums. Someone said that it is only important to turn on optimisation and that the linker only does -O1 anyway. They also said that there is no need to repeat CFLAGS while linking and that this was needed only with early LTO versions of GCC.

    IOW, there is not much publicly known or accessible documentation online about GCC and related tools. Every now and then someone utters something, and it gets picked up by an ignorant crowd and praised without detailed knowledge of it.
    Had Christianity had such fu**ed up, obscure, hard-to-get, fragmented and contradictory documentation on the "Word of God", the whole thing would have ended with Stallman, I mean Christ, as initiator.
    From what you say in your first point there, it seems you don't actually understand LTO. It has nothing to do with what the binary links to after it's built... it has to do with being able to do optimisations across all source files that will be linked together to generate a single binary. Without LTO the scope of the optimiser is limited to a single source file, when most projects have a crap load of source files. LTO basically loads all the source files into the compiler, parses the entire thing and then hands it off to the optimiser, rather than doing the aforementioned steps individually for each file to generate individual binaries that are linked to form the entire binary.

    Also have some respect for people that believe different than you. Thanks.
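To make the point about optimiser scope concrete, here is a hypothetical two-file C program collapsed into one listing so it compiles on its own (the file names helper.c/main.c and the functions scale() and sum_scaled() are made up for illustration). In a real two-file build, main.c would see only the prototype of scale(). The GCC commands in the comment show an -flto build, in which each object file also carries GCC's intermediate representation, so the link-time optimiser can see both bodies and may inline the call; whether it actually does is up to GCC's heuristics.

```c
/* Hypothetical build as two separate files with GCC LTO:
 *   gcc -O2 -flto -c helper.c
 *   gcc -O2 -flto -c main.c
 *   gcc -O2 -flto helper.o main.o -o demo   (assuming a main() exists)
 * Without -flto, main.o is compiled knowing only the prototype of
 * scale(), so every loop iteration has to emit a real call. */

/* ---- helper.c (hypothetical) ---- */
int scale(int x)
{
    return 3 * x;
}

/* ---- main.c (hypothetical) ---- */
/* In the two-file build, this declaration is all main.c would see. */
int scale(int x);

/* With LTO, the link-time optimiser can inline scale() here and
 * optimise the loop as if both functions lived in one file. */
int sum_scaled(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += scale(v[i]);
    return total;
}
```

The program's behaviour is identical with or without -flto; only the optimiser's view changes, which is why LTO bugs tend to show up in code that relies on behaviour the language standard doesn't guarantee.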

  • Brane215
    replied
    Originally posted by hubicka
    For fully standard-compliant software, LTO is fully transparent, and you can just enable -flto and expect improvements. Many key packages (glibc, kernel, web browsers, ...), however, do a lot of non-standard things and need some care to work with LTO. So yes, a blind rebuild of your Gentoo with -flto is going to show interesting problems, but how quickly they disappear really depends on the upstream developers of these packages.

    LTO is new and has issues; this is a chicken-and-egg problem we need to solve. Because LTO now works well in tests and in our environment, the remaining problems won't be hammered out effectively without feedback. The more LTO users there are, the faster it will mature.
    1. The kernel, glibc and other non-standard packages are not insignificant. The kernel doesn't link to anything in userland (except perhaps that one link for getting accurate time, but that's insignificant), but it needs stability more than anything in userland does. And with the LTO option still in testing, I don't see how the result can be trusted atm for anything except testing purposes.

    Glibc and similar libraries are not insignificant. If they miscompile with LTO, they'll affect just about everything. And if you compile them without LTO, you are missing a big part of the point: big parts will end up being opaque islands to any LTO optimisation efforts.

    2. I had to do it blindly, because there simply was no one to turn to. The CFLAGS & LDFLAGS I used were given to me, IIRC, on the Gentoo forums. Someone said that it is only important to turn on optimisation and that the linker only does -O1 anyway. They also said that there is no need to repeat CFLAGS while linking and that this was needed only with early LTO versions of GCC.

    IOW, there is not much publicly known or accessible documentation online about GCC and related tools. Every now and then someone utters something, and it gets picked up by an ignorant crowd and praised without detailed knowledge of it.
    Had Christianity had such fu**ed up, obscure, hard-to-get, fragmented and contradictory documentation on the "Word of God", the whole thing would have ended with Stallman, I mean Christ, as initiator.

  • Brane215
    replied
    Originally posted by MWisBest
    To the people saying LTO is "unstable" and "breaks things" etc., I'm compiling Android with all but 2 sections of it having LTO enabled, not a single problem from it. LTO isn't simply "experimental" anymore.
    To all the people who have to avoid sugar and rely on periodic insulin shots (aka "diabetics"), I say that I just downed a large Coke over an enormous piece of pie. And now I feel just fine.

    Also, I could never understand all this talk about contraception.

    I never wore or used protection and still managed to walk away without getting pregnant. Which in my case would automatically mean a caesarean section, since my penis would pose an impossible bottleneck.


    Has it ever crossed your mind that we don't run the same hardware in comparable circumstances?

    And BTW, just because you haven't seen any problems _yet_, it doesn't mean they aren't lurking.
    That "hey, it works!" initial stage was my experience, too. Then problems kept cropping up, so after much head-scratching I moved to "There are problems, but I'm good with workarounds". And then I realized that I'd put many hours into it, with new problems constantly popping up and many things not working in consistent ways.
    So, after more than a year, I ended up at "F**k it, it simply ain't worth it."

    Give it time.

  • MWisBest
    replied
    To the people saying LTO is "unstable" and "breaks things" etc., I'm compiling Android with all but 2 sections of it having LTO enabled, not a single problem from it. LTO isn't simply "experimental" anymore.

  • Guest
    replied
    Originally posted by caligula
    But how much does it really matter? Firefox static initialization takes a while, especially if you have bookmarks, history and an old session. An SSD can read at 600 MB/s and the binary is smaller than 300 MB. To me the real bottleneck is elsewhere.
    But that really isn't the point of this discussion, is it?
    In the kernel's case, LTO probably only really makes sense for release+static builds (since the more stuff is compiled in via modules, the slower LTO compilation will be and the more any significant gains will be reduced).
    Either way, LTO on the desktop probably doesn't make much difference, while it could be crucial on embedded devices (routers, smartphones...).

    [OT]
    This is true: loading the Firefox profile from a hard drive is the most time-consuming operation (depending on disk speed, it can take up to 20 seconds, vs. a second or two for a warm start with a hot page cache).

    This is however true for pretty much any software so it shouldn't really bother us.
    One could make the argument that we need to invest resources into filesystem performance, for example btrfs with fast real-time compression algorithms such as LZ4 or Google's Snappy.

    Another fix is a similar userspace solution, UPX (upx.sourceforge.net).
    [/OT]
    Last edited by Guest; 09 April 2014, 05:50 PM.
