I still don't get why distros don't create separate packages for each CPU generation and let the package manager fetch the best one for your hardware
Core i9 7900X vs. Threadripper 1950X On Ubuntu 17.10, Antergos, Clear Linux
-
Originally posted by FireBurn: I still don't get why distros don't create separate packages for each CPU generation and let the package manager fetch the best one for your hardware
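For what it's worth, the "fetch the best one for your hardware" step could look roughly like the sketch below. It is only an illustration: the variant names and the `VARIANTS` table are hypothetical (no distro package manager is being described here), and it assumes a Linux x86 system where `/proc/cpuinfo` has a `flags` line.

```python
from pathlib import Path

# Hypothetical package variants, ordered from most to least demanding.
# The names and required flag sets are assumptions made for illustration.
VARIANTS = [
    ("avx512", {"avx512f", "avx2", "fma"}),
    ("avx2-fma", {"avx2", "fma"}),
    ("baseline", set()),
]


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags reported by the kernel, or an empty set."""
    try:
        text = Path(path).read_text()
    except OSError:
        return set()
    for line in text.splitlines():
        # On x86 Linux the line looks like "flags : fpu vme ... avx2 ..."
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


def pick_variant(cpu_flags):
    """Pick the first variant whose required flags are all present."""
    for name, required in VARIANTS:
        if required <= cpu_flags:
            return name
    return "baseline"
```

Calling `pick_variant(read_cpu_flags())` returns the name of the most specialised variant the machine can run, and falls back to `baseline` when the flags cannot be read.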
-
Originally posted by geearf: Like I don't think the kernel would change a lot based on grayski's benchmarks, but maybe glibc as you mentioned or something else...
Originally posted by FireBurn: I still don't get why distros don't create separate packages for each CPU generation and let the package manager fetch the best one for your hardware
Originally posted by sdack: The distros need to wake up and pick up the pace, seeing how Intel with its Clear Linux is dominating in performance.
-
Apple's iOS App Store stores an intermediate representation of applications and compiles it to native code (presumably with caching) for the target architecture upon download. That covers a lot of applications, although a more limited set of target architectures. Possibly one day, once LLVM can compile everything in the Linux userspace, a Linux distro will provide the same sort of functionality for users.
It seems to me that an enthusiast Linux user with a recent CPU should definitely be looking at Clear Linux, or even Gentoo, to get the best performance, and the gains can be significant.
-
Originally posted by chithanh: My guess is that most of Clear Linux's performance advantage comes from dropping support for older hardware, plus a couple of performance-related patches and customizations, e.g. in glibc. The former is not really an option for most distros (dropping support for all hardware prior to Haswell/Westmere? The most recent Debian release just got rid of everything prior to Pentium Pro...). The newer glibc version will find its way into distros in due time.
Take Debian for example. It did actually perform quite well in the last comparison and was in second place behind Clear Linux. Then take a look at the work that's been done by the Debian project. They maintain a stable, a testing and an unstable distribution, provide ports for i386 (32-bit), amd64 (64-bit), 3 flavours of ARM (arm64, armel, armhf), 3 flavours of MIPS (mips, mipsel, mips64el), PowerPC and System Z. Not to mention further ports that are in progress such as Sparc and Motorola 68k.
I don't think it's impossible at all. It's more about picking up the pace and getting started than having to surpass limitations.
-
Originally posted by chithanh: My guess is that most of Clear Linux's performance advantage comes from dropping support for older hardware, plus a couple of performance-related patches and customizations, e.g. in glibc. The former is not really an option for most distros (dropping support for all hardware prior to Haswell/Westmere? The most recent Debian release just got rid of everything prior to Pentium Pro...). The newer glibc version will find its way into distros in due time.
1.) Latest GCC, latest glibc.
2.) Careful fine-tuning of the compile flags (and patches in some cases) to use the newer features in GCC such as LTO, Graphite, auto-vectorization, loop optimizations (like -fsplit-loops), etc.
3.) AutoFDO for profile feedback <-- big perf gains, but it is a lot harder to set up for mere mortals, which is why it is not that popular on most distros.
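To make the list above a bit more concrete, here is a rough sketch of how such a GCC command line might be assembled. The helper `gcc_command` and its defaults are hypothetical, and this is not Clear Linux's actual flag set, just the GCC options that correspond to the features named above. The Graphite flags only take effect if GCC was built with isl support, and the `-fauto-profile` input has to be produced separately from a perf recording.

```python
def gcc_command(source, output, march="haswell", profile=None):
    """Build a GCC invocation using the optimization features named above.

    This is an illustrative flag selection, not any distro's real build recipe.
    """
    cmd = [
        "gcc",
        "-O3",
        f"-march={march}",        # target a specific CPU generation
        "-flto",                  # link-time optimization
        "-ftree-vectorize",       # auto-vectorization
        "-fsplit-loops",          # one of the newer loop optimizations
        "-fgraphite-identity",    # Graphite loop framework (needs isl)
        "-floop-nest-optimize",   # Graphite-based loop nest optimizer
    ]
    if profile is not None:
        # AutoFDO: feed in a sampled profile converted to GCC's format.
        cmd.append(f"-fauto-profile={profile}")
    cmd += [source, "-o", output]
    return cmd
```

For example, `gcc_command("bench.c", "bench", profile="bench.afdo")` yields a list that can be passed to `subprocess.run`; the file names here are placeholders.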
-
Originally posted by geearf:
Do you guys have an idea of which packages matter the most in terms of compilation optimizations?
Maybe it's the whole OS, but I'm wondering if one could not recompile certain key packages to get about the same performance gain without having to go all source like Gentoo.
(On, say, Skylake CPUs like this Core i9, a floating point add takes 4 cycles, a floating point mul takes 4 cycles, but an FMA (multiply and add) ALSO takes 4 cycles. This means that code that does lots of multiplies and adds on FP can get a significant benefit.)
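As a back-of-the-envelope illustration of those numbers, the sketch below counts latency for a dependent chain, where each step computes `x = x * a + b` and has to wait for the previous one (Horner-style polynomial evaluation is a typical case). The function name is made up for this example, and the model deliberately ignores throughput, out-of-order execution and vector width; it only uses the 4-cycle latencies quoted above.

```python
# Latencies in cycles, as quoted above for Skylake-class cores.
ADD_LATENCY = 4
MUL_LATENCY = 4
FMA_LATENCY = 4


def chain_latency(steps, fused):
    """Cycles for a chain of dependent multiply-add steps.

    Without FMA each step is a multiply followed by a dependent add;
    with FMA each step is a single fused instruction.
    """
    per_step = FMA_LATENCY if fused else MUL_LATENCY + ADD_LATENCY
    return steps * per_step
```

Under this simplified model a 10-step chain costs 80 cycles with separate multiply and add, and 40 cycles with FMA, so the fused version halves the latency of the chain.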
-
Originally posted by sdack: I can understand the reasons for why this is, but it doesn't explain why it has to stay like this.
Take Debian for example. It did actually perform quite well in the last comparison and was in second place behind Clear Linux. Then take a look at the work that's been done by the Debian project. They maintain a stable, a testing and an unstable distribution, provide ports for i386 (32-bit), amd64 (64-bit), 3 flavours of ARM (arm64, armel, armhf), 3 flavours of MIPS (mips, mipsel, mips64el), PowerPC and System Z. Not to mention further ports that are in progress such as Sparc and Motorola 68k.
I don't think it's impossible at all. It's more about picking up the pace and getting started than having to surpass limitations.
The main difference is the compilation process and how much runtime debug apparatus the kernel has enabled (frame pointers, kprobes, etc.). I think Fedora enables lots of debug features in its default kernel, but I'm not sure.
Where you see those massive spikes between distros, it is mostly due to loops that are too complex to be optimized by certain compiler versions, especially at default settings. If a newer version of the compiler with certain extra flags finds a way to break those loops up and optimize them (vectorize them, parallelize them, etc.), the gain will be huge compared to the unoptimized version, which is basically the worst-case result.
-
Originally posted by arjan_intel:
It's mostly the math-heavy things (libm from glibc, the BLAS library of your choice, etc) where there is a real split in performance between generations. The AVX2+FMA line is a split where performance fundamentally changes.
(On, say, Skylake CPUs like this Core i9, a floating point add takes 4 cycles, a floating point mul takes 4 cycles, but an FMA (multiply and add) ALSO takes 4 cycles. This means that code that does lots of multiplies and adds on FP can get a significant benefit.)
Btw, have you guys tested how close the latest GCC (with your current compilation optimizations) is to ICC (where possible)?