R800 3D Mesa driver is close to release


  • #81
    Qaridarium: You're correct in saying that there have been great strides made in parallelization. What you're not getting is that there are sub-problems that cannot be computed in parallel (at least not by deterministic processes) by their very nature, not because the hardware just isn't clever enough. For example, consider function composition: f(g(x)). There's no way to calculate f(g(x)) and g(x) in parallel for all f() and g() and x. There are certainly some combinations for which the processes could run in parallel, but there are others for which f() will immediately require the result of g() before it can proceed. It's possible to speculatively execute multiple code paths, but no set of logic gates will let you prefetch data from the future.
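
    Ex-Cyber's point can be made concrete. Below is a minimal Go sketch of the one loophole he concedes, speculative execution: if g's result can only be one of a few values, f can be evaluated on every candidate in parallel while g is still running, and the matching branch kept afterwards. The functions f and g and the candidate set are toy assumptions, not from any real workload; nothing here parallelizes the general case where g's range is large.

    ```go
    // Speculative evaluation of f(g(x)): toy sketch, not a real API.
    package main

    import (
    	"fmt"
    	"sync"
    )

    func g(x int) int { return x % 2 }  // inner function with a tiny range: {0, 1}
    func f(y int) int { return y * 10 } // outer function that depends on g's result

    func main() {
    	x := 7
    	candidates := []int{0, 1} // every value g could possibly return (assumption)

    	results := make(map[int]int)
    	var mu sync.Mutex
    	var wg sync.WaitGroup
    	for _, c := range candidates {
    		wg.Add(1)
    		go func(c int) { // speculate: compute f on each candidate concurrently
    			defer wg.Done()
    			r := f(c)
    			mu.Lock()
    			results[c] = r
    			mu.Unlock()
    		}(c)
    	}
    	actual := g(x) // the true dependency still has to be resolved serially...
    	wg.Wait()
    	fmt.Println(results[actual]) // ...but by then f's work is already done
    }
    ```

    The cost is that all but one of the speculative evaluations are thrown away, which is exactly why this does not scale to arbitrary f and g.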



    • #82
      Originally posted by Ex-Cyber View Post
      Qaridarium: You're correct in saying that there have been great strides made in parallelization. What you're not getting is that there are sub-problems that cannot be computed in parallel (at least not by deterministic processes) by their very nature, not because the hardware just isn't clever enough. For example, consider function composition: f(g(x)). There's no way to calculate f(g(x)) and g(x) in parallel for all f() and g() and x. There are certainly some combinations for which the processes could run in parallel, but there are others for which f() will immediately require the result of g() before it can proceed. It's possible to speculatively execute multiple code paths, but no set of logic gates will let you prefetch data from the future.
You are wrong ;-) there is one way!

Your problem only exists if you are using imperative programming.

Yes, in imperative programming there is no way out.

But there is a second valid world, a mirror-image world...

It's the so-called lambda calculus:

http://de.wikipedia.org/wiki/Lambda-Kalk%C3%BCl

You can do your calculation by defining it as a lambda-calculus expression and brute-forcing it across multiple cores!
      Phantom circuit Sequence Reducer Dyslexia
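
    For what it's worth, there is a grain of truth near the lambda-calculus argument, though not the one claimed: in a pure, side-effect-free setting, independent subexpressions can be reduced in parallel (the Church-Rosser property guarantees the result doesn't depend on reduction order). That helps with h(f(a), g(b)), where the arguments share no data; it does not help with the nested f(g(x)) from the previous post. A hedged Go sketch with placeholder functions:

    ```go
    // Parallel reduction of independent subexpressions: h(f(a), g(b)).
    // fn, gn, hn are toy stand-ins for pure (side-effect-free) functions.
    package main

    import "fmt"

    func fn(a int) int { return a + 2 }
    func gn(b int) int { return b * 3 }
    func hn(p, q int) int { return p + q }

    func main() {
    	fc := make(chan int)
    	gc := make(chan int)
    	go func() { fc <- fn(10) }() // f(a) and g(b) share no data,
    	go func() { gc <- gn(20) }() // so they can run concurrently
    	fmt.Println(hn(<-fc, <-gc))  // h still has to wait for both results
    }
    ```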



      • #83
        Originally posted by FunkyRider View Post
Even Intel's superior QPI can't claim 'native speed'; accessing a neighboring die's L3/IMC via QPI is about the same speed as using an FSB in an older generation.

It is you who needs to do some homework before you talk out of your dreams. Two dies communicating via HT/QPI is about twice as slow as accessing the local L3/IMC, and that doesn't include L3 snooping.

AMD can reduce snooping traffic by dedicating 1/2 MB of its 6 MB L3 as a directory cache, but that also comes at the price of reduced overall cache capacity.

Overall, dreaming that a 2-die x 6-core package performs the same as a native 12-core is not likely to come true soon
Check my word "fast": you talk about slow QPI and slow HT; I talk about 'FAST'.

Meaning fast like 100% native :-)

ok ;-)

"Overall, dreaming that a 2-die x 6-core package performs the same as a native 12-core is not likely to come true soon"

A 2-die x 6-core package can be faster per €/$, because one big die is much more expensive than two smaller dies.

That means for the same €/$ the two dies can have MORE transistors, meaning more cache and more pipelines.

So yes, for the same €/$, the "dream of a 2-die x 6-core package" performing faster is true and valid... (see the cost sketch below)
        Phantom circuit Sequence Reducer Dyslexia
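
    The cost claim, at least, has a standard back-of-the-envelope justification: under a Poisson defect model, die yield falls roughly exponentially with die area, so the cost per good die grows faster than linearly, and two small dies can come out cheaper than one big one. The defect density and per-cm² wafer cost below are illustrative assumptions, not real process data:

    ```go
    // Why two small dies can be cheaper than one big die, under the
    // classic yield model Y = exp(-D*A). All numbers are made up.
    package main

    import (
    	"fmt"
    	"math"
    )

    // costPerGoodDie amortizes silicon cost over the dies that actually work.
    func costPerGoodDie(areaCm2, defectsPerCm2, costPerCm2 float64) float64 {
    	yield := math.Exp(-defectsPerCm2 * areaCm2) // Poisson yield model
    	return areaCm2 * costPerCm2 / yield
    }

    func main() {
    	const d = 0.5   // defects per cm^2 (assumed)
    	const c = 100.0 // wafer cost per cm^2 (assumed, arbitrary units)
    	big := costPerGoodDie(3.0, d, c)       // one native 12-core die, 3 cm^2
    	small := 2 * costPerGoodDie(1.5, d, c) // two 6-core dies, 1.5 cm^2 each
    	fmt.Printf("one big die: %.0f  two small dies: %.0f\n", big, small)
    }
    ```

    With these made-up numbers the single large die costs roughly twice as much per working part, which is the "more transistors for the same money" effect the post is gesturing at; whether the multi-die package's interconnect penalty eats that advantage is exactly what the thread is arguing about.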



        • #84
It is thought that generic "unlimited" parallelization is impossible, in particular because it is believed that P != NC. No proof exists, though, just as there is none for P != NP.

          Emulating a machine and evaluating a circuit are canonical problems thought to be impossible to parallelize.
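
    For readers who want the standard framing (textbook definitions, not from the post): NC is the class of problems solvable by uniform circuits of polynomial size and polylogarithmic depth, i.e. in polylog time on polynomially many processors, and the Circuit Value Problem is P-complete, which is why circuit evaluation is the canonical "probably inherently sequential" problem:

    ```latex
    % NC^k: uniform circuits of polynomial size and O(log^k n) depth.
    \mathrm{NC} = \bigcup_{k \ge 1} \mathrm{NC}^k \subseteq \mathrm{P},
    \qquad
    \text{open question: } \mathrm{P} \overset{?}{=} \mathrm{NC}
    % If the P-complete Circuit Value Problem were in NC, then P = NC.
    ```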



          • #85
            Originally posted by Agdr View Post
It is thought that generic "unlimited" parallelization is impossible, in particular because it is believed that P != NC. No proof exists, though, just as there is none for P != NP.

            Emulating a machine and evaluating a circuit are canonical problems thought to be impossible to parallelize.
Yes, it's true that you can't parallelize an imperative basic operation without limit.

But why would you use an imperative basic operation for multi-core programming?

Simple example: you can't multi-core f(x) = x + 2, because it's imperative,

but you can run something like λx. x + 2 on one core while a second core works on λy. y + 2.

So you have three cores in use: one for the imperative-style method and two for the lambda versions.

So a simple multi-core method is to compute the same problem with different methods, and the fastest result gets the hit (see the race sketch below).

That works because the imperative style is not always the fastest.
            Phantom circuit Sequence Reducer Dyslexia
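
    Stripped of the lambda-calculus framing, what this post describes is portfolio racing: run several algorithms for the same problem concurrently and take whichever finishes first. A minimal Go sketch, assuming the strategies are pure; the two toy strategies here (a loop vs. a closed-form sum) stand in for genuinely different methods:

    ```go
    // Race two implementations of the same computation; first answer wins.
    package main

    import "fmt"

    func sumLoop(n int) int { // strategy 1: iterate (slow for large n)
    	s := 0
    	for i := 1; i <= n; i++ {
    		s += i
    	}
    	return s
    }

    func sumFormula(n int) int { // strategy 2: closed form n(n+1)/2
    	return n * (n + 1) / 2
    }

    func main() {
    	n := 1000000
    	result := make(chan int, 2) // buffered: the losing goroutine never blocks
    	go func() { result <- sumLoop(n) }()
    	go func() { result <- sumFormula(n) }()
    	fmt.Println(<-result) // whichever strategy finishes first wins the race
    }
    ```

    Note the obvious cost, which the next post raises: the losing core's work is pure waste, so racing only pays off when you genuinely can't predict the faster method in advance.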



            • #86
Q, the problem with that is that you can end up using 8 cores maxed out just to get the result 1% faster than you would have with a single core. Terribly inefficient; in fact, it may even run slower than a single-core version, because you have the overhead of coordinating everything.

              Generally, if something is slow enough that you need to worry about this, you're best off thinking for a minute about what the fastest way to do the calculation is, and then choosing that one (on a single core, or however many it takes). It's usually not difficult to pick the correct algorithm given the inputs to the problem.



              • #87
                Originally posted by smitty3268 View Post
Q, the problem with that is that you can end up using 8 cores maxed out just to get the result 1% faster than you would have with a single core. Terribly inefficient; in fact, it may even run slower than a single-core version, because you have the overhead of coordinating everything.

                Generally, if something is slow enough that you need to worry about this, you're best off thinking for a minute about what the fastest way to do the calculation is, and then choosing that one (on a single core, or however many it takes). It's usually not difficult to pick the correct algorithm given the inputs to the problem.
I know that ;-)

The lambda argument is just very theoretical; in practice it's very hard to get a speedup that way ;-)

And hey, your 1% argument:
1% on an absolutely single-core-only function is like an atomic bomb in your brain ;-)

It's like the speed of light = 100%, and now someone goes 1% faster than light... 101%... wow... this burns our world down...
                Phantom circuit Sequence Reducer Dyslexia



                • #88
                  Originally posted by Qaridarium View Post
And hey, your 1% argument:
1% on an absolutely single-core-only function is like an atomic bomb in your brain ;-)

It's like the speed of light = 100%, and now someone goes 1% faster than light... 101%... wow... this burns our world down...
Of course, modern AMD and Intel architectures are limited by power and will actually overclock when you're only taxing a few cores rather than all of them. So it may be a choice between one core running at 3.33 GHz versus 8 cores running at 2.67 GHz, and then that 1% speed boost ends up being way slower overall.
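
    The arithmetic behind that warning is worth spelling out. Using the example clocks from the post: the parallel version starts with a 2.67/3.33 ≈ 0.8 per-core clock handicap, so it needs roughly a 1.25x algorithmic speedup just to break even, and a 1% speedup is a large net loss:

    ```go
    // Break-even check for the 1-core turbo vs. all-core clocks in the post.
    package main

    import "fmt"

    func main() {
    	single := 3.33      // GHz, one busy core boosting (example from the post)
    	parallel := 2.67    // GHz, all cores busy
    	algoSpeedup := 1.01 // the 1% improvement discussed in the thread
    	net := algoSpeedup * parallel / single
    	fmt.Printf("net speedup: %.2fx\n", net) // prints ~0.81x: slower overall
    }
    ```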



                  • #89
                    Originally posted by smitty3268 View Post
Of course, modern AMD and Intel architectures are limited by power and will actually overclock when you're only taxing a few cores rather than all of them. So it may be a choice between one core running at 3.33 GHz versus 8 cores running at 2.67 GHz, and then that 1% speed boost ends up being way slower overall.
The 1090T is targeted at 125 watts. Disabling half the cores gets you from 3.2 to 3.6 GHz while maintaining 125 watts. Thermal loss is considerably less than on some older 125-watt chips, like my old X2-4800 -- same power, but it runs 20 degrees C hotter when crunching numbers all-out with the same cooling. I've heard of people OC'ing the 1090T successfully (i.e. stable) with ALL cores well over 4 GHz on AIR ALONE. Obviously under those circumstances you'll be drawing well over 125 watts, but eh? It seems to handle it.



                    • #90
                      Originally posted by smitty3268 View Post
Of course, modern AMD and Intel architectures are limited by power and will actually overclock when you're only taxing a few cores rather than all of them. So it may be a choice between one core running at 3.33 GHz versus 8 cores running at 2.67 GHz, and then that 1% speed boost ends up being way slower overall.
Come on, '1%' as an argument... why not 0.1% or 0.0001%?

LOL

You can't speed up all things just in hardware by clocking them higher.

Sometimes you need to optimize your software too ;-)
                      Phantom circuit Sequence Reducer Dyslexia

