Torvalds Is Unconvinced By LTO'ing A Linux Kernel


  • MWisBest
    replied
    Originally posted by Brane215 View Post
    I've used the freshest gcc I could find. The last few gcc bumps in Gentoo were mine, since I couldn't wait for an official ebuild. I recompiled everything so many times.

    And even with gcc-4.8.2 and the freshest binutils, my list of problematic packages was quite long and not shrinking much.

    Worse yet, what compiled with LTO was not always repeatable. Some packages that compiled initially would all of a sudden fail to recompile after some time. It had something to do with the sequence in which a package was compiled, compared to its dependencies, and much of that effect was RECURSIVE.

    In theory, LTO is just great and practically functionally equal to the classic way.

    In reality, compile would often fail at the final link where either:

    - linker would spew out crap that even Google has never seen

    - it would say scary things like that it has XY different definitions for function W

    - that it can't find function W to link to

    - that function W it is looking for and the one found are not compatible


    And WRT your proof by numbers of a WHOLE 1000 users for 5 months, it's pathetic.

    Look at OpenSSL and the Heartbleed bug. How many people were using it? For how many years?
    Well I suppose the issue is more that LTO isn't something meant for using everywhere on a system where things are updated and changed constantly. For a case like that it's probably best to use it for individual things rather than on big dependencies. With something like Android everything is just compiled all at once and isn't updated in pieces etc.
    For the kernel though (what this thread was concerning) I don't see why there's any reason not to support LTO. Even if the gains aren't extreme, if we turned away every patch that only offered a 3% speed improvement or 4% size reduction we'd have an extremely slow and stagnated kernel in my opinion. Those 3 and 4 percent gains add up pretty fast.

    WRT Proof: http://forum.xda-developers.com/gala...-2014-t2427087
    I don't like bragging or boasting, the only reason I brought up the numbers there was because it was relevant to my case. There are even builds that have accumulated nearly 2000 downloads, however as of late the download numbers have dropped as the amount of time I've had for FML has dropped too, but I don't care, I actually have a good attitude towards it: "If they aren't happy with FML, I would prefer they give other ROMs a shot until they find something they enjoy, and/or give me some feedback as to what I could improve upon in FML."

    The OpenSSL bug was in the code itself, not in the toolchain; it did pop into my mind though.


  • Brane215
    replied
    Originally posted by MWisBest View Post
    That's a ridiculous comparison...

    You weren't using a recent enough toolchain. I said it isn't experimental ANYMORE. If you're using binutils 2.19 and GCC 4.7, you're not going to have as great of an experience with LTO as you would with binutils 2.23 and GCC 4.8.

    I've been using LTO like this for... jeez, 5 months or so now, with at least a new build every week, used by roughly 1000 other people. No problems have arisen from LTO, and if they do after 5 months, they can't be extremely significant, and it's not like it's impossible to fix bugs.
    I've used the freshest gcc I could find. The last few gcc bumps in Gentoo were mine, since I couldn't wait for an official ebuild. I recompiled everything so many times.

    And even with gcc-4.8.2 and the freshest binutils, my list of problematic packages was quite long and not shrinking much.

    Worse yet, what compiled with LTO was not always repeatable. Some packages that compiled initially would all of a sudden fail to recompile after some time. It had something to do with the sequence in which a package was compiled, compared to its dependencies, and much of that effect was RECURSIVE.

    In theory, LTO is just great and practically functionally equal to the classic way.

    In reality, compile would often fail at the final link where either:

    - linker would spew out crap that even Google has never seen

    - it would say scary things like that it has XY different definitions for function W

    - that it can't find function W to link to

    - that function W it is looking for and the one found are not compatible


    And WRT your proof by numbers of a WHOLE 1000 users for 5 months, it's pathetic.

    Look at OpenSSL and the Heartbleed bug. How many people were using it? For how many years?


  • MWisBest
    replied
    Originally posted by Brane215 View Post
    To all the people who have to avoid sugar and are on periodic insulin shots (aka "diabetics") I say that I just downed a large Coke over an enormous piece of pie. And now I feel just fine.

    Also, I could never understand all this talk about contraception.

    I never wore or used protection and still managed to walk away without getting pregnant. Which would in my case automatically mean caesarian section, since my penis would pose impossible bottleneck.


    Has it ever crossed your mind that we don't run the same hardware in comparable circumstances?

    And BTW, just because you haven't seen any problems _yet_, it doesn't mean that they are not lurking hidden.
    That "hey, it works!" initial stage was my experience, too. And then problems kept cropping up, so after much headscratching I went into "There are problems, but I'm good with workarounds". And then I figured out that I'd just put many hours into it with new problems still popping up and many things not working in consistent ways.
    So, after more than a year, I ended it with "F**k it, it simply ain't worth it."

    Give it time.
    That's a ridiculous comparison...

    You weren't using a recent enough toolchain. I said it isn't experimental ANYMORE. If you're using binutils 2.19 and GCC 4.7, you're not going to have as great of an experience with LTO as you would with binutils 2.23 and GCC 4.8.

    I've been using LTO like this for... jeez, 5 months or so now, with at least a new build every week, used by roughly 1000 other people. No problems have arisen from LTO, and if they do after 5 months, they can't be extremely significant, and it's not like it's impossible to fix bugs.


  • caligula
    replied
    Originally posted by Garp View Post
    I think this is being missed in all the fuss. This is such a basic and simple step in getting software from 'hobby code' to 'production worthy'. Given the scope of the changes, everything needs to be fully tested, and changes of this scope and potential complexity should be fully justified with compelling arguments rather than a vague and rather hand-wavey "it does stuff quicker/smaller".
    I'd say LTO is broken if it makes changes that affect program semantics. Maybe C / C++ are bad languages, maybe it's the LTO algorithms. Anyway, these kinds of optimizations need to be safe to be useful. Another thing I don't get is, if LTO support is just about modifying some Makefiles, why not support it partially if it doesn't conflict with non-LTO builds? You could still allow LTO while not actively supporting it.


  • caligula
    replied
    Originally posted by duby229 View Post
    This is wrong in so many ways....

    Just because HDDs are so big doesn't mean they should be filled up. The fact is that storage is the slowest part of most computers. You have a choice between waiting for storage or getting your work done.
    I also use Windows 8.1 for gaming and Mac OS X and can sure tell you that the rest of the world doesn't care. Mac apps are huge. Windows games are huge. The binaries are huge. Nothing is optimized well. Windows updater for Java doesn't delete old versions and the C: drive can be filled with 50 versions of Java 5 and 60 versions of Java 6. Nobody cares. On top of that they constantly run antivirus software which slows down the machine.


  • ryao
    replied
    Originally posted by ryao View Post
    You would need to implement it and prove to Linus that it is better to use it. The general consensus is that such a thing is less efficient.

    With that said, all in-kernel SSE is done with interrupts disabled. The registers are saved and restored by the critical section. This is by no means exclusive to Linux. Other kernels do it too. Illumos and FreeBSD come to mind here. The Linux kernel functions for doing this are kernel_fpu_begin()/kernel_fpu_end().
    It turns out this is not strictly correct. Older Linux kernels use the TS bit on x86 to try to avoid reloads as much as possible:

    http://stackoverflow.com/a/431323

    It is still basically what I described earlier though. The kernel simply does not need floating point arithmetic. Sometimes, vector instructions are useful, but in those times, we use critical sections. Anyway, few kernel developers ever use these functions.


  • ryao
    replied
    Originally posted by Brane215 View Post
    So when the RAID-4/5/6/etc. subsystem uses SSE to calculate the Q syndrome, this means that interrupts are disabled on that core for that whole time?

    IIRC x86 has some trick that enables it to save extra context only when needed. Something about some flag in special registers so that a task that tries to use SSE faults, and then FW code saves the registers, marks the SSE registers to be saved on the next context switch and continues.

    Couldn't such a trick also work within the kernel?
    You would need to implement it and prove to Linus that it is better to use it. The general consensus is that such a thing is less efficient.

    With that said, all in-kernel SSE is done with interrupts disabled. The registers are saved and restored by the critical section. This is by no means exclusive to Linux. Other kernels do it too. Illumos and FreeBSD come to mind here. The Linux kernel functions for doing this are kernel_fpu_begin()/kernel_fpu_end().
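
    For readers unfamiliar with the Q syndrome mentioned in the quoted question, here is a minimal sketch of the math in plain integer C, with no SSE at all. It assumes the usual RAID-6 construction: P is the XOR of the data bytes, and Q weights each data disk by a power of 2 in GF(2^8) with the reducing polynomial 0x11d. The names gf_mul2 and pq_syndrome_sketch are made up for illustration and are not the kernel's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply a GF(2^8) element by 2, reducing by the polynomial
 * x^8 + x^4 + x^3 + x^2 + 1 (0x11d). Hypothetical helper name. */
static uint8_t gf_mul2(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0x00));
}

/* Compute P (XOR parity) and Q (weighted syndrome) for one stripe.
 * data[d][i] is byte i of data disk d. Q is evaluated with Horner's
 * rule, so Q = D0 + 2*D1 + 4*D2 + ... in GF(2^8).
 * Hypothetical function name, integer operations only. */
static void pq_syndrome_sketch(size_t ndisks, size_t len,
                               const uint8_t *const *data,
                               uint8_t *p, uint8_t *q)
{
    assert(data != NULL && p != NULL && q != NULL);

    for (size_t i = 0; i < len; i++) {
        uint8_t wp = 0, wq = 0;

        /* Walk from the highest-numbered disk down to disk 0. */
        for (size_t d = ndisks; d-- > 0; ) {
            wq = (uint8_t)(gf_mul2(wq) ^ data[d][i]);
            wp ^= data[d][i];
        }
        p[i] = wp;
        q[i] = wq;
    }
}
```

    The same shift-and-XOR step is applied to every byte of the stripe independently, which is the kind of loop where vector instructions can help and where wrapping the work in a critical section may be worth the cost.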


  • Brane215
    replied
    Originally posted by smitty3268 View Post
    I'm not sure what your question there was, but it seems you didn't understand the initial point. Userspace apps can use SSE all they want, the kernel doesn't control that. The kernel itself just doesn't use SSE.

    So your pdf reader can do whatever it wants, but the File System code built into the kernel better just use integer math only. There are valid performance reasons to enforce that in the kernel, which don't apply to normal userspace code.
    So when the RAID-4/5/6/etc. subsystem uses SSE to calculate the Q syndrome, this means that interrupts are disabled on that core for that whole time?

    IIRC x86 has some trick that enables it to save extra context only when needed. Something about some flag in special registers so that a task that tries to use SSE faults, and then FW code saves the registers, marks the SSE registers to be saved on the next context switch and continues.

    Couldn't such a trick also work within the kernel?


  • smitty3268
    replied
    Originally posted by Azrael5 View Post
    OK, take a simple tool such as Sumatra (a PDF reader): version 3.2 features SSE2 instructions. Open a PDF file containing many images with 3.2 and with other non-SSE2 versions and see the difference when scrolling pages.

    The issue is that we have hardware, some of it obsolete, which is not valued as it deserves... because of software.
    I'm not sure what your question there was, but it seems you didn't understand the initial point. Userspace apps can use SSE all they want, the kernel doesn't control that. The kernel itself just doesn't use SSE.

    So your pdf reader can do whatever it wants, but the File System code built into the kernel better just use integer math only. There are valid performance reasons to enforce that in the kernel, which don't apply to normal userspace code.
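
    As an illustration of what "integer math only" can look like in practice, here is a small sketch of two common tricks: a rounded percentage and a Q16.16 fixed-point multiply. The helper names (percent_rounded, fix16_mul) and the Q16.16 format are assumptions chosen for the example, not anything taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: part/total as a whole percentage, rounded to
 * nearest, using only integer operations. Assumes total != 0 and that
 * part * 100 does not overflow 64 bits. */
static uint32_t percent_rounded(uint64_t part, uint64_t total)
{
    assert(total != 0);
    return (uint32_t)((part * 100 + total / 2) / total);
}

/* Hypothetical Q16.16 fixed-point format: the low 16 bits hold the
 * fraction, so 1.0 is represented as 1 << 16. */
#define FIX16_ONE (UINT32_C(1) << 16)

/* Multiply two unsigned Q16.16 values. The 64-bit intermediate keeps
 * the full product before the extra 16 fraction bits are shifted out. */
static uint32_t fix16_mul(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 16);
}
```

    For example, percent_rounded(2, 3) gives 67, and multiplying 1.5 by 2.0 in this format gives exactly 3.0, all without touching a floating-point register.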
    Last edited by smitty3268; 04-10-2014, 08:07 PM.


  • Azrael5
    replied
    Originally posted by ryao View Post
    The Linux kernel does not use a FPU in general. Everything internally is integer based. The only registers kernel developers are allowed to use in general are the integer registers. x87, MMX, SSE and any other fancy registers are only usable with interrupts disabled and it is only done when there is clear benefit. This is a design decision in the kernel and it works rather well.
    OK, take a simple tool such as Sumatra (a PDF reader): version 3.2 features SSE2 instructions. Open a PDF file containing many images with 3.2 and with other non-SSE2 versions and see the difference when scrolling pages.

    The issue is that we have hardware, some of it obsolete, which is not valued as it deserves... because of software.
    Last edited by Azrael5; 04-10-2014, 07:13 PM.
