GCC 4.8 Release Brings Improved C++11, Optimizations
MPlayer, for example, supports runtime CPU detection even when built with GCC. It is less optimal than building for a specific CPU, but it works.
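For anyone wondering what runtime CPU detection can look like with GCC, here is a minimal sketch. It is not MPlayer's actual code; the function names (`sum_generic`, `sum_avx2`, `sum_dispatch`) are made up for illustration. It assumes GCC 4.8 or newer, where `__builtin_cpu_init` and `__builtin_cpu_supports` are available on x86, and it falls back to the generic path on other architectures.

```c
#include <stddef.h>

/* Hypothetical kernel: sum an array of floats (illustrative names only). */
static float sum_generic(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if defined(__x86_64__) || defined(__i386__)
/* Same loop, but the compiler is allowed to use AVX2 in this one function. */
__attribute__((target("avx2")))
static float sum_avx2(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#endif

typedef float (*sum_fn)(const float *, size_t);

/* Pick the best code path for the CPU we are actually running on. */
static sum_fn pick_sum(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_generic;
}

/* Public entry point: resolve the implementation once, then reuse it. */
float sum_dispatch(const float *v, size_t n)
{
    static sum_fn impl;
    if (!impl)
        impl = pick_sum();
    return impl(v, n);
}
```

The cost is an indirect call per invocation, which is part of why this tends to be less optimal than building everything for one specific CPU.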
-
Not sure about the compiler adding it, but I have gotten a 10% increase in throughput by manually adding a select few __builtin_prefetch calls to my code.
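To illustrate the kind of manual prefetch meant here, a minimal sketch follows. It is not the poster's code: `gather_sum` and `PREFETCH_DISTANCE` are hypothetical, and the distance of 16 is an arbitrary starting point, not a recommendation. The idea is that the table accesses depend on data in `idx`, so the address needed a few iterations from now is known to the program before the load is issued.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many iterations ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/* Sum table[idx[i]] for i in [0, n). The index array is read sequentially,
   but the table accesses are data-dependent. */
long gather_sum(const long *table, const size_t *idx, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Hint that a future table entry will be read (rw = 0) with low
           temporal locality (1). The bounds check keeps idx in range. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&table[idx[i + PREFETCH_DISTANCE]], 0, 1);
        total += table[idx[i]];
    }
    return total;
}
```

Whether this helps at all depends on the hardware and the access pattern, so it should be measured with and without the prefetch.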
-
Originally posted by chithanh: Not necessarily, you can build several code paths and switch at runtime between them.
Originally posted by fenrus: There is absolutely room for real performance improvements using compiler flags (see the URL I posted way earlier in the thread).
prefetch/prefetchw is not part of that however.
With that said, it has been demonstrated that doing tricks with prefetching can improve performance in certain areas:
-
Originally posted by chithanh: Not necessarily, you can build several code paths and switch at runtime between them.
Also some Ubuntu users have recognized that you can get dramatic performance increases in certain situations by rebuilding specific packages optimized for their CPU with apt-build:
(Gentoo is running in a VM there so the results are not 100% comparable)
prefetch/prefetchw is not part of that however.
-
Originally posted by chithanh: Not necessarily, you can build several code paths and switch at runtime between them.
Also some Ubuntu users have recognized that you can get dramatic performance increases in certain situations by rebuilding specific packages optimized for their CPU with apt-build:
(Gentoo is running in a VM there so the results are not 100% comparable)
Oh, compiling for a new enough CPU is a huge gain at times (just see the graphs for my distro that I posted earlier in this thread).
prefetchw or prefetch are not part of that however.
-
Originally posted by ryao: the only people that will benefit from it are those building software themselves (i.e. Gentoo users).
Also some Ubuntu users have recognized that you can get dramatic performance increases in certain situations by rebuilding specific packages optimized for their CPU with apt-build:
(Gentoo is running in a VM there so the results are not 100% comparable)
-
Originally posted by fenrus: Eh, how?
If you think that the compiler can insert a better prefetchw than the hardware prefetchers, please speak up with an example...
(disclaimer: I work for Intel on Linux, and also have my own hobby OS that I build in my spare time... I look at compilers and compiler options a lot ;-) )
With that said, I doubt that it is possible for anyone outside of Intel to produce such code for unreleased products without assistance from Intel in the form of either an engineering sample of the chip or extremely accurate emulation software. You likely knew that though.
-
Originally posted by ryao: If you set -mtune=<architecture>, then prefetchw might actually be useful. With that said, the only people that will benefit from it are those building software themselves (i.e. Gentoo users).
Eh, how?
If you think that the compiler can insert a better prefetchw than the hardware prefetchers, please speak up with an example...
(disclaimer: I work for Intel on Linux, and also have my own hobby OS that I build in my spare time... I look at compilers and compiler options a lot ;-) )
-
Originally posted by fenrus: The prefetch instruction (and in this case "prefetchw") is almost always a loss. Hardware nowadays has pretty aggressive prefetchers that work on the actual access pattern, and those are very effective for most cases.
The problem is that the cases where they are not effective (e.g. the ones that are hard for a machine to figure out) are also the ones where the compiler will have a hard time adding its own prefetches. (There are some special cases in HPC and such where a human can know special things.)
It's branch prediction hints all over again in many ways, where broad use does damage because programmers don't know their own program as well as the CPU does ;-)
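For readers who have not run into branch prediction hints: assuming the comparison is to source-level hints such as GCC's `__builtin_expect` (commonly wrapped in `likely()`/`unlikely()` macros), a minimal sketch looks like this. The function `count_negative` and the belief that negative values are rare are made up for illustration.

```c
#include <stddef.h>

/* Common wrapper macros around GCC's __builtin_expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: count negative values, which the programmer
   believes (rightly or wrongly) to be rare. */
size_t count_negative(const int *v, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        /* The hint only tells the compiler which outcome to expect;
           the computed result is the same either way. */
        if (unlikely(v[i] < 0))
            count++;
    }
    return count;
}
```

If the programmer's guess is wrong for real inputs, the hint can make the code slower, which is the parallel being drawn with scattering prefetches everywhere.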