GCC To Receive Automatic Parallelization Support


  • #16
    Overly generalised. Just about all gentoo-ers know that you should not over-optimize.

    I use both Gentoo and Ubuntu, and for my dev-stations I always go for gentoo, since I don't get that library hell that is any other distribution...



    • #17
      How parallel

      Does "parallel" mean generating code that uses multiple threads (TFA talks about "better performance on multi-core systems", so I assume multiple threads must be involved), or getting better at detecting loops that can be computed in parallel using SIMD instructions?

      If the first case is true, does the compiler take into account the cost of creating and killing threads on a specific OS?
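
To make the distinction in this question concrete, here is a minimal sketch in C (the function names `scaled_add` and `prefix_sum` are made up for illustration). The first loop has fully independent iterations, which is the kind of loop that either approach could target: GCC's `-ftree-parallelize-loops=N` option splits such loops across threads, while `-ftree-vectorize` targets SIMD instructions. Whether a given GCC version actually transforms this particular loop depends on the version, the flags, and its internal cost heuristics, so treat this as an illustration rather than a guarantee.

```c
#include <stddef.h>

/* Candidate for parallelization: iteration i writes only dst[i] and reads
   only a[i] and b[i], so no iteration depends on another. `restrict`
   promises the compiler that the three arrays do not overlap. */
void scaled_add(double *restrict dst, const double *restrict a,
                const double *restrict b, double k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + k * b[i];
}

/* Counter-example: iteration i reads the value written by iteration i-1
   (a loop-carried dependence), so the iterations cannot simply be handed
   to different threads or SIMD lanes as they stand. */
void prefix_sum(double *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```

Something like `gcc -O2 -ftree-parallelize-loops=4 file.c` would request the threaded variant; the result is the same either way, only the execution strategy differs.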



      • #18
        Also, as a side note, it has been mentioned many times that Gentoo is not just for ricers.
        +1

        A distribution that allows you to dynamically alter package dependencies (while resolving dependencies) according to your needs is not something you come across every day. Its package manager also transparently handles binary, from-source, or development-snapshot (from git, svn, hg, darcs, cvs, etc.) packages in the same manner.



        • #19
          Originally posted by smitty3268 View Post
          march=core2 won't include anything past SSE2 by default. You have to add -mssse3, -msse4.1, and -msse4.2 to get the special instructions that won't work on your A64. I think the standard instructions on those chips are identical, unless you get into stuff like special hardware VM support.

          Really, SSE2+ support is only going to affect a few applications anyway, and they've probably got special flags and optimizations set up within the ebuild or program. I'm pretty sure MPlayer, for example, contains lots of manual assembly code, detects what CPU you have at runtime, and picks the fastest paths available.
          That was precisely the thing I was counting on: that by and large march=core2 implies generic x86_64 along with some SSE stuff (just like march=pentium4 is apparently equivalent to march=i686 plus mmmx, msse, msse2). What I was unsure about was to what extent. My Athlon 64 has up to SSE3 (pni) and, afaik, all Core 2 chips have SSSE3 (at least my Merom T5270 and Penryn P8700 do). Anyway, looks like I am good. What bothers me, though, is whether these optimizations change how the cache is used, and whether that affects performance by a lot. (For example, when choosing the processor family in the kernel config, between generic x86-64 and core2, some of the parameters changed are CONFIG_X86_L1_CACHE_BYTES, CONFIG_X86_INTERNODE_CACHE_BYTES, and CONFIG_X86_L1_CACHE_SHIFT. Interestingly, when changing between core2 and k8, the only difference in the config is CONFIG_X86_P6_NOP.)



          • #20
            Originally posted by FunkyRider View Post
            What happens when a Gentoo user wants to change his CPU+Motherboard?
            I have switched from a P4 to a Core2Quad, and I set up both boxes with Gentoo.
            The only tricky thing is finding the correct options for the kernel. Compiling a kernel with only the options you need is always a nightmare.
            The rest of the installation is really easy and, with a Core2Quad, really fast...

            I hope to see GCC 4.4 on Gentoo soon. Gentoo is one of the rare distros able to take advantage of that optimization, as other, generic distros are compiled for generic i586...



            • #21
              Another Gentoo user here. I have very conservative CFLAGS so yeah, it's not just for ricers. This sounds yummy but I'm wondering whether it will really be possible to just enable it globally. It might open up a whole can of bugs on various programs?



              • #22
                gentoo is where it's really at



                • #23
                  Originally posted by alec View Post
                  Premature optimization is the root of all evil.
                  You gentoo users don't test whether it actually gives you any gain...
                  Not true. I have a dual boot currently set up between Ubuntu 9.04 (from mid January) and an up-to-date -O3 -march=core2 compiled gentoo system built with a lot of attention to detail to USE flags.

                  The difference in performance I experienced in the same version of Wine running Guild Wars was pretty substantial. I can't quote exact numbers since it's been a while since I've even booted into the Ubuntu system, but if anyone is dying of curiosity I'll do a check.



                  • #24
                    Yet another Gentoo user here. I'm wondering if the x264 encoder won't be able to take advantage of this as it doesn't fully utilize both cores even with two threads specified.

                    CFLAGS="-O2 -march=native -pipe"



                    • #25
                      Originally posted by wswartzendruber View Post
                      Yet another Gentoo user here. I'm wondering if the x264 encoder won't be able to take advantage of this as it doesn't fully utilize both cores even with two threads specified.
                      Just renice it. By default, it doesn't use much (85% maybe) but if you renice it aggressively you'll get something closer to 97-99% (on 4 cores). Also consider running two or more encodes in parallel.



                      • #26
                        Originally posted by wswartzendruber View Post
                        Yet another Gentoo user here. I'm wondering if the x264 encoder won't be able to take advantage of this as it doesn't fully utilize both cores even with two threads specified.
                        I recently converted a DivX file to MPEG using ffmpeg (or mencoder; I don't remember exactly, sorry). I tried the -threads option of ffmpeg, hoping to use all the cores of my C2Q.
                        Using "top", I saw that the 4 cores were fully used when the "threads" option was set to "16".
                        However, with such a value, the resulting file was crappy: full of colored squares along the border of the screen.

                        I think the multithreading option in ffmpeg needs more tuning. Meanwhile, you'll just have to use your quad core as if it were a single core.



                        • #27
                          From the FAQ...

                          3.4 Why do I see a slight quality degradation with multithreaded MPEG* encoding?

                          For multithreaded MPEG* encoding, the encoded slices must be independent, otherwise thread n would practically have to wait for n-1 to finish, so it's quite logical that there is a small reduction of quality. This is not a bug.
                          It does sound like what you saw may be a bug though.



                          • #28
                            Originally posted by Saist View Post
                            They switch to Debian
                            This

                            Definitely a nice feature to have in the compiler. I'd also hope there's a cpu detection routine that can detect your CPU and adjust the optimizations on the fly when a program compiled with this new GCC starts up. This will then mean a program will run optimally on whatever CPU it finds itself running on, without any recompiles.
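
For what it's worth, the "detect the CPU at startup and pick a code path" idea can be sketched by hand in C. This is only an illustration under stated assumptions: the names `sum_generic`, `sum_ssse3_path`, `pick_sum`, and `sum` are hypothetical, the "fast" path is a placeholder that just calls the generic one, and the `__builtin_cpu_init` / `__builtin_cpu_supports` builtins come from newer GCC releases than the ones discussed in this thread (and only exist on x86 targets). It is not something the compiler does for you automatically.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

/* Baseline path that runs on any CPU. */
static long sum_generic(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Placeholder for a hand-tuned SSSE3 version. A real one would use
   intrinsics or assembly; here it just reuses the generic code. */
static long sum_ssse3_path(const int *v, size_t n)
{
    return sum_generic(v, n);
}

/* Choose an implementation based on what the running CPU reports. */
sum_fn pick_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                  /* populate the CPU feature data */
    if (__builtin_cpu_supports("ssse3"))
        return sum_ssse3_path;
#endif
    return sum_generic;                    /* safe fallback everywhere else */
}

long sum(const int *v, size_t n)
{
    return pick_sum()(v, n);
}
```

The cost is visible right in the sketch: every dispatched routine has to ship in more than one version, so the binary grows with each variant.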



                            • #29
                              I'd also hope there's a cpu detection routine that can detect your CPU and adjust the optimizations on the fly when a program compiled with this new GCC starts up. This will then mean a program will run optimally on whatever CPU it finds itself running on, without any recompiles.
                              It'd also mean that instead of ~5kb Hello World we'd have 20, 30, 40, or more..



                              • #30
                                The right answer is to have the specific CPU model encoded in the ELF program header and have the OS launch an object code recompiler the first time you try to execute a non-native binary. (Think about it, it's no harder than LLVM or java jit; easier because you can do whole program analysis instead of trying to discover dependencies J-I-T.)
