Torvalds Is Unconvinced By LTO'ing A Linux Kernel

Brane215 replied

11 April 2014, 01:54 AM
Originally posted by MWisBest

That's a ridiculous comparison...

You weren't using a recent enough toolchain. I said it isn't experimental ANYMORE. If you're using binutils 2.19 and GCC 4.7, you're not going to have as great of an experience with LTO as you would with binutils 2.23 and GCC 4.8.

I've been using LTO like this for... jeez, 5 months or so now, with at least a new build every week, used by roughly 1000 other people. No problems have arisen from LTO, and if they do after 5 months, it can't be extremely significant and it's not like it's impossible to fix bugs.

I used the freshest gcc I could find. The last few gcc bumps in Gentoo were mine, since I couldn't wait for an official ebuild. I recompiled everything so many times.

And even with gcc-4.8.2 and the freshest binutils, my list of problematic packages was quite long and not shrinking much.

Worse yet, what compiled with LTO was not always repeatable. Some packages that compiled initially would all of a sudden fail to recompile after some time. It had something to do with the sequence in which a package was compiled, compared to its dependencies, and much of that effect was RECURSIVE.

In theory, LTO is just great and practically functionally equal to the classic way.

In reality, the compile would often fail at the final link, where the linker would:

- spew out crap that even Google has never seen

- say scary things, like that it has XY different definitions for function W

- say that it can't find function W to link to

- say that the function W it is looking for and the one it found are not compatible

And WRT your proof by numbers of a WHOLE 1000 users for 5 months, it's pathetic.

Look at OpenSSL and the Heartbleed bug. How many people were using it? For how many years?

MWisBest replied

11 April 2014, 01:08 AM
Originally posted by Brane215

To all the people who have to avoid sugar and are on periodic insulin shots (aka "diabetics"), I say that I just downed a large Coke over an enormous piece of pie. And now I feel just fine.

Also, I could never understand all this talk about contraception.

I never wore or used protection and still managed to walk away without getting pregnant. Which would in my case automatically mean a caesarean section, since my penis would pose an impossible bottleneck.

Has it ever crossed your mind that we don't run the same hardware in comparable circumstances?

And BTW, just because you haven't seen any problems _yet_, it doesn't mean that they are not lurking hidden.
That "hey, it works!" initial stage was my experience, too. Then problems kept cropping up, so after much headscratching I went into "There are problems, but I'm good with workarounds". And then I figured out that I had just put many hours into it, with new problems popping up all the time and many things not working in consistent ways.
So, after more than a year, I ended it with "F**k it, it simply ain't worth it."

Give it time.

That's a ridiculous comparison...

You weren't using a recent enough toolchain. I said it isn't experimental ANYMORE. If you're using binutils 2.19 and GCC 4.7, you're not going to have as great of an experience with LTO as you would with binutils 2.23 and GCC 4.8.

I've been using LTO like this for... jeez, 5 months or so now, with at least a new build every week, used by roughly 1000 other people. No problems have arisen from LTO, and if they do after 5 months, it can't be extremely significant and it's not like it's impossible to fix bugs.

caligula replied

10 April 2014, 09:37 PM
Originally posted by Garp

I think this is being missed in all the fuss. This is such a basic and simple step in getting software from 'hobby code' to 'production worthy'. Given the scope of the changes, everything needs to be fully tested, and changes of this scope and potential complexity should be fully justified with compelling arguments, rather than a vague and rather hand-wavy "it does stuff quicker/smaller".

I'd say LTO is broken if it makes changes that affect program semantics. Maybe C / C++ are bad languages, maybe it's the LTO algorithms. Anyway, these kinds of optimizations need to be safe to be useful. Another thing I don't get: if LTO support is just about modifying some Makefiles, why not support it partially, as long as it doesn't conflict with non-LTO builds? You could still allow LTO while not actively supporting it.

caligula replied

10 April 2014, 09:32 PM
Originally posted by duby229

This is wrong in so many ways....

Just because HDDs are so big doesn't mean they should be filled up. The fact is that storage is the slowest part of most computers. You have a choice between waiting for storage or getting your work done.

I also use Windows 8.1 for gaming and Mac OS X, and I can sure tell you that the rest of the world doesn't care. Mac apps are huge. Windows games are huge. The binaries are huge. Nothing is optimized well. The Windows updater for Java doesn't delete old versions, and the C: drive can be filled with 50 versions of Java 5 and 60 versions of Java 6. Nobody cares. On top of that, they constantly run antivirus software, which slows down the machine.

ryao replied

10 April 2014, 08:37 PM
Originally posted by ryao

You would need to implement it and prove to Linus that it is better to use it. The general consensus is that such a thing is less efficient.

With that said, all in-kernel SSE is done with interrupts disabled. The registers are saved and restored by the critical section. This is by no means exclusive to Linux. Other kernels do it too. Illumos and FreeBSD come to mind here. The Linux kernel functions for doing this are kernel_fpu_begin()/kernel_fpu_end().

It turns out this is not strictly correct. Older Linux kernels use the TS bit on x86 to try to avoid reloads as much as possible:

What are coding conventions for using floating-point in Linux device drivers?

http://stackoverflow.com/a/431323


It is still basically what I described earlier though. The kernel simply does not need floating point arithmetic. Sometimes vector instructions are useful, but in those cases we use critical sections. Anyway, few kernel developers ever use these functions.

ryao replied

10 April 2014, 08:30 PM
Originally posted by Brane215

So when the RAID-4/5/6/etc subsystem uses SSE to calculate the Q syndrome, this means that interrupts are disabled on that core for that whole time?

IIRC x86 has some trick that enables it to save extra context only when needed. Something about some flag in special registers, so that a task that tries to use SSE faults, and then FW code saves the registers, marks the SSE registers to be saved on the next context switch and continues.

Couldn't such a trick also work within the kernel?

You would need to implement it and prove to Linus that it is better to use it. The general consensus is that such a thing is less efficient.

With that said, all in-kernel SSE is done with interrupts disabled. The registers are saved and restored by the critical section. This is by no means exclusive to Linux. Other kernels do it too. Illumos and FreeBSD come to mind here. The Linux kernel functions for doing this are kernel_fpu_begin()/kernel_fpu_end().
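
For readers wondering what the Q syndrome from the question actually involves, here is a rough scalar sketch in Python. It is not the kernel's code. It assumes the GF(2^8) field polynomial 0x11d and generator 2 that Linux's RAID-6 implementation is generally documented as using, and the names gf_mul2 and pq_syndromes are made up for this illustration. The vectorized kernel variants apply the same steps to many bytes per instruction, which is the work that would sit between kernel_fpu_begin() and kernel_fpu_end().

```python
# Scalar, integer-only sketch of RAID-6 P/Q syndrome generation.
# Assumption: GF(2^8) with polynomial 0x11d and generator 2.
# Function names are hypothetical, chosen for this example only.

def gf_mul2(x: int) -> int:
    """Multiply one byte by 2 in GF(2^8), reducing by 0x11d on overflow."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x & 0xFF


def pq_syndromes(data_blocks: list[bytes]) -> tuple[bytes, bytes]:
    """Return (P, Q) for equally sized data blocks D_0 .. D_{n-1}.

    P is the XOR of all blocks. Q is the XOR of (2^i * D_i), evaluated
    with Horner's rule starting from the highest-numbered block.
    """
    size = len(data_blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    for block in reversed(data_blocks):
        for i, byte in enumerate(block):
            p[i] ^= byte
            q[i] = gf_mul2(q[i]) ^ byte
    return bytes(p), bytes(q)
```

Calling pq_syndromes([d0, d1, d2]) with three equally sized bytes objects returns the two parity blocks. Everything here is shifts and XORs on integers, so no floating point is involved; the SSE registers only come into play as a way to do many of these byte operations at once.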

Brane215 replied

10 April 2014, 08:18 PM
Originally posted by smitty3268

I'm not sure what your question there was, but it seems you didn't understand the initial point. Userspace apps can use SSE all they want, the kernel doesn't control that. The kernel itself just doesn't use SSE.

So your pdf reader can do whatever it wants, but the File System code built into the kernel better just use integer math only. There are valid performance reasons to enforce that in the kernel, which don't apply to normal userspace code.

So when the RAID-4/5/6/etc subsystem uses SSE to calculate the Q syndrome, this means that interrupts are disabled on that core for that whole time?

IIRC x86 has some trick that enables it to save extra context only when needed. Something about some flag in special registers, so that a task that tries to use SSE faults, and then FW code saves the registers, marks the SSE registers to be saved on the next context switch and continues.

Couldn't such a trick also work within the kernel?
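
The trick being recalled here sounds like lazy FPU switching built on the x86 CR0.TS flag and the "device not available" (#NM) fault. Below is a toy Python model of that idea, only to make the sequence of events concrete. It is a simulation under stated assumptions, not how any particular kernel implements it: the Cpu and Task classes, the saves counter and the single integer standing in for the register file are all invented for the example.

```python
# Toy model of lazy FPU context switching (all names are hypothetical).

class DeviceNotAvailable(Exception):
    """Stands in for the x86 #NM fault raised when FPU code runs with TS set."""


class Task:
    def __init__(self, name):
        self.name = name
        self.saved_fpu = 0      # this task's saved FPU/SSE state


class Cpu:
    def __init__(self):
        self.ts = False         # models the CR0.TS (task switched) flag
        self.fpu_regs = 0       # models the live FPU/SSE register contents
        self.fpu_owner = None   # task whose state is currently in the registers
        self.current = None
        self.saves = 0          # number of full register saves performed

    def context_switch(self, task):
        # Cheap path: leave the FPU registers alone and just arm the trap.
        self.current = task
        self.ts = True

    def _raw_fpu_op(self, value):
        if self.ts:
            raise DeviceNotAvailable
        self.fpu_regs = value

    def _handle_fault(self):
        # Clear TS, then swap register contents only if the owner changed.
        self.ts = False
        if self.fpu_owner is not self.current:
            if self.fpu_owner is not None:
                self.fpu_owner.saved_fpu = self.fpu_regs
                self.saves += 1
            self.fpu_regs = self.current.saved_fpu
            self.fpu_owner = self.current

    def fpu_op(self, value):
        try:
            self._raw_fpu_op(value)
        except DeviceNotAvailable:
            self._handle_fault()
            self._raw_fpu_op(value)   # retry the faulting instruction
```

In this model, switching from task a to task b and back to a costs no register save at all as long as b never touches the FPU; the save only happens when a different task actually issues an FPU operation. The price is a fault on the first FPU use after every switch, which is one reason the scheme can end up less efficient than simply saving eagerly.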

smitty3268 replied

10 April 2014, 08:03 PM
Originally posted by Azrael5

OK, take a simple tool such as Sumatra (a PDF reader): version 3.2 features SSE2 instructions. Open a PDF file containing many images with 3.2 and with other, non-SSE2 versions, and see the difference when scrolling pages.

The issue is that we have hardware, even obsolete hardware, which is not valued as it deserves... because of software.

I'm not sure what your question there was, but it seems you didn't understand the initial point. Userspace apps can use SSE all they want, the kernel doesn't control that. The kernel itself just doesn't use SSE.

So your pdf reader can do whatever it wants, but the File System code built into the kernel better just use integer math only. There are valid performance reasons to enforce that in the kernel, which don't apply to normal userspace code.

Last edited by smitty3268; 10 April 2014, 08:07 PM.

Azrael5 replied

10 April 2014, 07:10 PM
Originally posted by ryao

The Linux kernel does not use an FPU in general. Everything internally is integer based. The only registers kernel developers are allowed to use in general are the integer registers. x87, MMX, SSE and any other fancy registers are only usable with interrupts disabled, and that is only done when there is a clear benefit. This is a design decision in the kernel and it works rather well.

OK, take a simple tool such as Sumatra (a PDF reader): version 3.2 features SSE2 instructions. Open a PDF file containing many images with 3.2 and with other, non-SSE2 versions, and see the difference when scrolling pages.

The issue is that we have hardware, even obsolete hardware, which is not valued as it deserves... because of software.

Last edited by Azrael5; 10 April 2014, 07:13 PM.

ryao replied

10 April 2014, 06:53 PM
Originally posted by Azrael5

Are you stating that Linux operating systems use the FPU to process algorithms? If that's real, it's an incredible joke. The FPU is for mathematical operations in floating point. SSE2 is the way to optimize, as well as to make things reliable and give a boost in operation of up to 1000% in speed. Only this evolution allows Linux to outperform any system currently used on every hardware platform equipped with SSE2 processors. I don't believe Linux systems lack this feature. It's impossible.

The Linux kernel does not use an FPU in general. Everything internally is integer based. The only registers kernel developers are allowed to use in general are the integer registers. x87, MMX, SSE and any other fancy registers are only usable with interrupts disabled, and that is only done when there is a clear benefit. This is a design decision in the kernel and it works rather well.
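
To give a feel for what "integer based" can look like when fractions are needed, here is a small fixed-point sketch in Python. It is a generic illustration of the technique, not code from the kernel: the 11 fractional bits and the helper names to_fixed, fixed_mul and ewma are arbitrary choices made for this example.

```python
# Generic fixed-point arithmetic using integers only.
# Assumption: 11 fractional bits, picked arbitrarily for illustration.

FRAC_BITS = 11
ONE = 1 << FRAC_BITS            # fixed-point representation of 1.0


def to_fixed(numerator: int, denominator: int) -> int:
    """Represent numerator/denominator as a scaled integer."""
    return (numerator << FRAC_BITS) // denominator


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values and rescale the result."""
    return (a * b) >> FRAC_BITS


def ewma(old: int, sample: int, weight: int) -> int:
    """Exponentially weighted moving average, all values fixed-point."""
    return (old * weight + sample * (ONE - weight)) >> FRAC_BITS
```

With these helpers, one half is to_fixed(1, 2), and multiplying it by itself with fixed_mul gives the scaled integer for one quarter. No floating point register is touched at any step, which is the property the kernel cares about.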

Last edited by ryao; 10 April 2014, 06:57 PM.