R800 3D mesa driver is shortly before release


  • #76
    You are naively wrong. Opteron doesn't work that way. A single die only has 6 cores. The rest are connected in an HT mesh, just like in multi-socket systems. Your non-linear scaling only goes up to 6 cores for now. Please, no more rocket-speed jokes.



    • #77
      Amdahl's law is valid independently of the machine architecture. The architecture can impose serialization in some cases, but what really matters is the algorithm. You can't turn an inherently serial process (e.g. a naive bytecode interpreter fetch/decode/execute loop) into a parallel one merely by changing the host CPU.
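      For illustration, a minimal sketch (in Python) of the kind of loop meant here, for a hypothetical three-instruction stack machine: each iteration needs the program counter and stack left behind by the previous one, which is exactly the serial dependence that resists parallelization.

# Minimal sketch of a naive bytecode interpreter (hypothetical 3-instruction stack machine).
# Each trip through the loop depends on the pc and stack produced by the previous trip,
# so the fetch/decode/execute iterations cannot simply be handed to different cores.
PUSH, ADD, HALT = 0, 1, 2

def run(program):
    pc, stack = 0, []
    while True:
        op = program[pc]                      # fetch
        if op == PUSH:                        # decode + execute
            stack.append(program[pc + 1])
            pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == HALT:
            return stack.pop()

print(run([PUSH, 2, PUSH, 3, ADD, HALT]))     # -> 5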



      • #78
        Originally posted by FunkyRider View Post
        You are naively wrong. Opteron doesn't work that way. A single die only has 6 cores. The rest are connected in an HT mesh, just like in multi-socket systems. Your non-linear scaling only goes up to 6 cores for now. Please, no more rocket-speed jokes.
        "You are naively wrong. Opteron doesn't work that way."

        LOL, you are just out of date.

        And yes, my old Opteron 2218 doesn't have an L3 cache to share.

        And yes, not all systems show super-linear speedup.

        "A single die only has 6 cores"

        Who cares?

        "The rest are connected in an HT mesh, just like in multi-socket systems."

        Oh wow... nothing new here. If the HT links are fast, you can get nearly the same speed as native cores ;-)

        "Your non-linear scaling only goes up to 6 cores for now."

        No! The non-linear speedup grows more and more as you put more and more CPUs in the same system.

        For Linux the limit is 4096 cores!

        The only feature needed is that the CPUs share the RAM channels and the L3 cache over the HT links.

        IBM's POWER7 does the same, and POWER7 scales at well over 100%!

        "Please, no more rocket-speed jokes"

        It's not a joke; there are some CPUs capable of super-linear speedup.

        To my knowledge these are the IBM POWER7, the Opteron 6000, and the newest Core i7-based Xeons.



        • #79
          Originally posted by Ex-Cyber View Post
          Amdahl's law is valid independently of the machine architecture. The architecture can impose serialization in some cases, but what really matters is the algorithm. You can't turn an inherently serial process (e.g. a naive bytecode interpreter fetch/decode/execute loop) into a parallel one merely by changing the host CPU.
          "Amdahl's law is valid independently of the machine architecture."

          No, it's not, because in practice you can build systems with super-linear speedup.

          " You can't turn an inherently serial process (e.g. a naive bytecode interpreter fetch/decode/execute loop) into a parallel one merely by changing the host CPU."

          From my point of view you can. Yes, you can do it.

          One example: the second core can do speculative calculations, and if you get a hit, you save time on the first core.

          http://en.wikipedia.org/wiki/Speculative_multithreading


          The second example: GCC 4.4/4.5 can do automatic parallelization of single-threaded code.

          http://en.wikipedia.org/wiki/Automatic_parallelization
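          A rough sketch of that speculation idea (hypothetical f, g and predictor; Python threads standing in for the two cores): while one worker computes g(x), the other speculatively computes f on a guessed value of g(x); on a hit, f(g(x)) is ready without waiting for g to finish first.

# Sketch of value speculation across two workers; f, g and predict_g are
# hypothetical stand-ins, and threads stand in for the two cores.
from concurrent.futures import ThreadPoolExecutor

def g(x):             # expensive producer
    return x * x

def f(y):             # consumer that needs g's result
    return y + 1

def predict_g(x):     # value predictor; here it simply guesses correctly
    return x * x

def speculative_compose(x):
    with ThreadPoolExecutor(max_workers=2) as pool:
        guess = predict_g(x)
        real_fut = pool.submit(g, x)        # core 1: the real g(x)
        spec_fut = pool.submit(f, guess)    # core 2: speculative f(guess)
        real = real_fut.result()
        if real == guess:                   # hit: the speculative work pays off
            return spec_fut.result()
        return f(real)                      # miss: fall back to the serial path

print(speculative_compose(7))               # -> 50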



          • #80
            Even Intel's superior QPI can't claim 'native speed'; accessing a neighboring die's L3/IMC via QPI is about as slow as using the FSB in older generations.

            It is you who needs to do some homework before talking out of your dreams. Two dies communicating via HT/QPI is about twice as slow as accessing the local L3/IMC, and that doesn't include L3 snooping.

            AMD can reduce snoop traffic by dedicating 1/2 MB of its 6 MB L3 as a directory cache, but that also comes at the price of reduced overall cache capacity.

            Overall, the dream of a 2-die x 6-core package performing the same as a native 12-core is not likely to happen soon.



            • #81
              Qaridarium: You're correct in saying that there have been great strides made in parallelization. What you're not getting is that there are sub-problems that cannot be computed in parallel (at least not by deterministic processes) by their very nature, not because the hardware just isn't clever enough. For example, consider function composition: f(g(x)). There's no way to calculate f(g(x)) and g(x) in parallel for all f() and g() and x. There are certainly some combinations for which the processes could run in parallel, but there are others for which f() will immediately require the result of g() before it can proceed. It's possible to speculatively execute multiple code paths, but no set of logic gates will let you prefetch data from the future.



              • #82
                Originally posted by Ex-Cyber View Post
                Qaridarium: You're correct in saying that there have been great strides made in parallelization. What you're not getting is that there are sub-problems that cannot be computed in parallel (at least not by deterministic processes) by their very nature, not because the hardware just isn't clever enough. For example, consider function composition: f(g(x)). There's no way to calculate f(g(x)) and g(x) in parallel for all f() and g() and x. There are certainly some combinations for which the processes could run in parallel, but there are others for which f() will immediately require the result of g() before it can proceed. It's possible to speculatively execute multiple code paths, but no set of logic gates will let you prefetch data from the future.
                You are wrong ;-) There is one way!

                Your problem is only valid if you are using imperative programming.

                Yes, in imperative programming there is no way out.

                But there is a second valid world, a mirror-image world...

                It's the so-called lambda calculus:

                http://de.wikipedia.org/wiki/Lambda-Kalk%C3%BCl

                You can do your calculation by defining it in the lambda calculus and then brute-forcing it across multiple cores!
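                One loose reading of that claim, sketched in Python (fib is a hypothetical stand-in for any pure, expensive subterm): because pure subexpressions have no side effects, independent ones can be reduced on separate cores and then combined. This is not a full lambda-calculus evaluator, only the independence argument.

# Loose sketch: independent pure subexpressions evaluated on separate cores.
# fib stands in for any pure, expensive subterm; the combination step stays serial.
from concurrent.futures import ProcessPoolExecutor

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def evaluate(term):
    fn, arg = term
    return fn(arg)

def parallel_apply(combine, *pure_subterms):
    # Each subterm is side-effect free, so the evaluation order doesn't matter.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(evaluate, pure_subterms))
    return combine(*results)

if __name__ == "__main__":
    # (fib 30) and (fib 31) are independent redexes; f(g(x)) itself stays serial.
    print(parallel_apply(lambda a, b: a + b, (fib, 30), (fib, 31)))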



                • #83
                  Originally posted by FunkyRider View Post
                  Even Intel's superior QPI can't claim 'native speed'; accessing a neighboring die's L3/IMC via QPI is about as slow as using the FSB in older generations.

                  It is you who needs to do some homework before talking out of your dreams. Two dies communicating via HT/QPI is about twice as slow as accessing the local L3/IMC, and that doesn't include L3 snooping.

                  AMD can reduce snoop traffic by dedicating 1/2 MB of its 6 MB L3 as a directory cache, but that also comes at the price of reduced overall cache capacity.

                  Overall, the dream of a 2-die x 6-core package performing the same as a native 12-core is not likely to happen soon.
                  Check my word 'fast': you talk about slow QPIs and slow HTs; I talk about 'FAST'.

                  Meaning fast as in 100% native :-)

                  OK ;-)

                  "Overall, the dream of a 2-die x 6-core package performing the same as a native 12-core is not likely to happen soon."

                  A 2-die x 6-core package can be faster per €/$, because one big die is much more expensive than two smaller dies.

                  Meaning that for the same €/$ the two dies can have MORE transistors, meaning more cache and more pipelines.

                  So yes, for the same €€€/$$$, the "dream of a 2-die x 6-core package performing" faster is true and valid...



                  • #84
                    It is thought that generic "unlimited" parallelization is impossible, in particular because it is thought that P != NC. No proof exists though, like for P != NP.

                    Emulating a machine and evaluating a circuit are canonical problems thought to be impossible to parallelize.



                    • #85
                      Originally posted by Agdr View Post
                      It is thought that generic "unlimited" parallelization is impossible, in particular because it is thought that P != NC. No proof exists though, like for P != NP.

                      Emulating a machine and evaluating a circuit are canonical problems thought to be impossible to parallelize.
                      Yes, it's true that you can't parallelize a basic imperative command without limit.

                      But why would you use basic imperative commands for multi-core programming?

                      Simple example: you can't multi-core f(x) = x + 2, because it's imperative.

                      But you can multi-core something like λx. x + 2 while the second core works on λy. y + 2.

                      So you have 3 cores in use: 1 for the imperative-style method and 2 for the lambda versions.

                      So a simple multi-core method is to compute the same problem with different methods, and the fastest result wins.

                      That works because the imperative style is not always the fastest.
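                      A small sketch of that "race the methods" idea (hypothetical solvers, Python threads standing in for cores): the same input goes to several implementations and whichever finishes first supplies the answer; the slower ones keep burning cycles, which is the inefficiency the next post points out.

# Sketch of "race several methods, fastest result wins".
# solve_iterative and solve_closed_form are hypothetical alternative methods
# for the same problem; the first to finish supplies the answer.
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def solve_iterative(n):        # method 1: straightforward loop
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def solve_closed_form(n):      # method 2: closed-form formula
    return n * (n + 1) // 2

def race(n, methods):
    pool = ThreadPoolExecutor(max_workers=len(methods))
    futures = [pool.submit(m, n) for m in methods]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # losers keep running; their work is wasted
    return done.pop().result()

print(race(10_000_000, [solve_iterative, solve_closed_form]))  # -> 50000005000000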



                      • #86
                        Q, the problem with that is that you can end up using 8 cores maxed out just to get the result 1% faster than you would have with a single core. Terribly inefficient, in fact it may even run slower than a single core version because you have the overhead of coordinating everything.

                        Generally, if something is slow enough that you need to worry about this, you're best off thinking for a minute about what the fastest way to do the calculation is, and then choosing that one (on a single core, or however many it takes). It's usually not difficult to pick the correct algorithm given the inputs to the problem.



                        • #87
                          Originally posted by smitty3268 View Post
                          Q, the problem with that is that you can end up using 8 cores maxed out just to get the result 1% faster than you would have with a single core. Terribly inefficient, in fact it may even run slower than a single core version because you have the overhead of coordinating everything.

                          Generally, if something is slow enough that you need to worry about this, you're best off thinking for a minute about what the fastest way to do the calculation is, and then choosing that one (on a single core, or however many it takes). It's usually not difficult to pick the correct algorithm given the inputs to the problem.
                          I know that ;-)

                          The lambda argument is just very theoretical, meaning it's very hard to get a speedup from it ;-)

                          And hey, your 1% argument:
                          a 1% gain on an absolutely single-core-only function is like an atomic bomb in your brain ;-)

                          It's like the speed of light = 100%, and now someone goes 1% faster than light... 101%... wow, that burns our world down...



                          • #88
                            Originally posted by Qaridarium View Post
                            And hey, your 1% argument:
                            a 1% gain on an absolutely single-core-only function is like an atomic bomb in your brain ;-)

                            It's like the speed of light = 100%, and now someone goes 1% faster than light... 101%... wow, that burns our world down...
                            Of course, modern AMD and Intel architectures are limited by power and will actually overclock when you're only taxing a few cores rather than all of them. So it may be a choice between 1-core running at 3.33Ghz versus 8 cores running at 2.67Ghz, and then that 1% speed boost ends up being way slower.
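                            A back-of-the-envelope check of that trade-off (the 3.33 GHz / 2.67 GHz figures above plugged into Amdahl's law, with p the parallelizable fraction of the work): at p = 1% the eight slower cores lose badly, and the break-even is only reached around p ≈ 0.23.

# Back-of-the-envelope Amdahl check for the 1-core-turbo vs 8-core trade-off,
# using the 3.33 GHz and 2.67 GHz figures above; p is the parallelizable fraction.
def relative_speed(p, cores, clock_ghz, base_clock_ghz=3.33):
    amdahl = 1.0 / ((1 - p) + p / cores)           # Amdahl's law speedup
    return amdahl * (clock_ghz / base_clock_ghz)   # scaled by the clock difference

for p in (0.01, 0.25, 0.50, 0.90):
    one_core = relative_speed(p, cores=1, clock_ghz=3.33)
    eight_cores = relative_speed(p, cores=8, clock_ghz=2.67)
    print(f"p={p:.2f}: 1 core = {one_core:.2f}x, 8 cores = {eight_cores:.2f}x")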



                            • #89
                              Originally posted by smitty3268 View Post
                              Of course, modern AMD and Intel architectures are limited by power and will actually overclock when you're only taxing a few cores rather than all of them. So it may be a choice between 1-core running at 3.33Ghz versus 8 cores running at 2.67Ghz, and then that 1% speed boost ends up being way slower.
                              The 1090T is targeted at 125 watts. Disabling half the cores gets you from 3.2 to 3.6 GHz while staying within 125 watts. Thermal loss is considerably less than on some older 125-watt chips, like my old X2-4800: same power, same cooling, yet it runs 20 degrees C hotter when crunching numbers all-out. I've heard of people overclocking the 1090T successfully (i.e. stable) with ALL cores well over 4 GHz on air alone. Obviously under those circumstances you'll be drawing well over 125 watts, but eh? It seems to handle it.



                              • #90
                                Originally posted by smitty3268 View Post
                                Of course, modern AMD and Intel architectures are limited by power and will actually overclock when you're only taxing a few cores rather than all of them. So it may be a choice between 1-core running at 3.33Ghz versus 8 cores running at 2.67Ghz, and then that 1% speed boost ends up being way slower.
                                Come on, '1%' as an argument... why not 0.1% or 0.0001%???

                                LOL

                                You can't speed up everything just in hardware by clocking it higher...

                                Sometimes you need to optimize your software too ;-)

