Ampere Altra Announced - Offering Up To 80 Cores Per Socket


  • #11
    It is not always about the total performance in synthetic benchmarks. Sometimes many slower cores may be beneficial instead of fewer faster cores.

    http://www.dirtcellar.net



    • #12
      Originally posted by ms178 View Post
      It seems they don't support SVE or SVE2 extensions.
      Probably not. These seem to be targeted at scale-out cloud servers, not the HPC market, which makes a lot of sense for this iteration given market realities.



      • #13
        Originally posted by edwaleni View Post
        Now that Ampere has a "TM" notation all over their documents, maybe NVidia will stop using it.
        Tesla, Fermi, Kepler, Maxwell, Pascal, Ampere... see a pattern forming?

        I guess all those electronics textbook publishers better start paying royalties.



        • #14
          With 80 cores (even if the individual cores aren't that great), this could be a hell of a video editing chip, acting rather like Bulldozer does for that job. If you recall, Bulldozer sucked for most workloads, but was something like 2 1/2 times faster than a Phenom II X6 in a straight libx264 encode benchmark. For real-world 1080p video editing, I can render a finished 1080p file in kdenlive at realtime to about 1.3x realtime, with 8 threads on four very wide cores.

          With 80 individual cores that again are a bit slow, and a video editor written to ensure nothing runs single-threaded and throttles the rest, you would expect ten times the speed, minus something for ARM being slower than x86. That used to be a 4:1 loss, with a 1 GHz single-core ARM performing rather like a 233 MHz Pentium II, caused by in-order execution as I recall. On the other hand, I once tested turning off 2 Bulldozer modules vs. enabling "one thread per core" (disabling the second thread per module) and found that adding the second thread to a module only added about 30% more throughput on a multithreaded job, similar to Intel Hyper-Threading. You lose some to the slower per-thread architecture, but gain some back from each thread getting a whole core to itself: 80 unshared cores are 2x as fast as a 40-core setup, where 80 shared threads would only be 1.3x as fast, so an unshared core is about 2/1.3 ≈ 1.5x as fast as a paired one. If ARM is still 1/4 as fast per clock, per core as x86, that's 1.5 × 0.25 = 0.375x the per-thread speed of Bulldozer, assuming Bulldozer were as fast per core as other x86 chips, which it is not. Multiply by ten times the thread count and you get 3.75 times as fast, then a bit more for not losing as much per-core throughput relative to Bulldozer as to a "normal" x86 design.
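
          Here is that estimate as a quick Python sketch; every input is a guess from this thread (the 30% second-thread gain, the old 4:1 ARM-vs-x86 penalty, 80 vs 8 threads), not a measurement:

              # Back-of-the-envelope speedup estimate; all inputs are guesses
              # from the discussion above, not benchmarks.
              smt_gain = 1.3          # second thread per Bulldozer module: +30%
              arm_per_clock = 0.25    # old guess: ARM at 1/4 the speed of x86
              thread_ratio = 80 / 8   # 80 Altra cores vs my 8 Bulldozer threads

              unshared_vs_paired = 2 / smt_gain                      # ~1.54x
              arm_vs_bd_thread = unshared_vs_paired * arm_per_clock  # ~0.38x
              speedup = thread_ratio * arm_vs_bd_thread
              print(f"estimated speedup vs Bulldozer: {speedup:.2f}x")  # ~3.85x
              # rounding 1.54 down to 1.5 gives the 3.75x figure above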

          Thus, a cluster of 80 overclocked ARM cores that ran as fast as overclocked Bulldozer (4.3 GHz here) should end up at least 4x as fast real-world for a perfectly scaling multithreaded job. This would require that no one job force all the others to wait by going single-threaded while holding more than 1/80th of the total resources. If that worked, we would have realtime rendering of 4K video to H.264 (rejecting patent-troll favorite H.265, which is twice as CPU-intensive).

          Right now, this might be an expensive server core. Ten years from now, that same rack-mount server box with everything in it might sell at a computer show for a few hundred bucks if even that, as something even faster comes along. Assuming my bulldozer chip lives that long, this could make a replacement for it.



          • #15
            Originally posted by waxhead View Post
            It is not always about the total performance in synthetic benchmarks. Sometimes many slower cores may be beneficial instead of fewer faster cores.
            Unless you're renting out cores and charging the same for all of them, I don't know of a case where this would be true. You can always have one core do two jobs, but you can't always have two cores do one job faster. Given 2 cores at speed 1 or 1 core at speed 2, you're a fool to pick the 2 cores (toy numbers below). The only exceptions may be latency-sensitive jobs or realtime work, and that's not likely something you'd see a machine like this used for.
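
            A toy Amdahl's-law check in Python makes the point; the serial fractions are made up purely for illustration:

                # 2 cores at speed 1 vs 1 core at speed 2, for a job whose
                # serial (non-parallelizable) fraction is s. Illustration only.
                def runtime(s, cores, speed):
                    return (s + (1 - s) / cores) / speed

                for s in (0.0, 0.1, 0.5, 1.0):
                    print(f"serial {s:.0%}: two slow {runtime(s, 2, 1):.2f}, "
                          f"one fast {runtime(s, 1, 2):.2f}")
                # The fast core ties at 0% serial and wins everywhere else.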



            • #16
              Originally posted by Luke View Post
              Thus, a cluster of 80 overclocked ARM cores that ran as fast as overclocked Bulldozer (4.3 GHz here) should end up at least 4x as fast real-world for a perfectly scaling multithreaded job. This would require that no one job force all the others to wait by going single-threaded while holding more than 1/80th of the total resources. If that worked, we would have realtime rendering of 4K video to H.264 (rejecting patent-troll favorite H.265, which is twice as CPU-intensive).

              Right now, this might be an expensive server core. Ten years from now, that same rack-mount server box with everything in it might sell at a computer show for a few hundred bucks if even that, as something even faster comes along. Assuming my bulldozer chip lives that long, this could make a replacement for it.
              What are you rattling on about? This is a server chip designed for cloud workloads. This is not a desktop peecee. And Bulldozer, WTF? Kick that obsolete trash to the curb. No one doing any kind of real work is using that today, and no sane person has any desire to keep using it for another ten years, lmao.



              • #17
                Originally posted by phoronix View Post
                Phoronix: Ampere Altra Announced - Offering Up To 80 Cores Per Socket
                http://www.phoronix.com/vr.php?view=28933
                This article contains a typo:

                In the 4th paragraph, the 2nd sentence begins:

                On a power efficiency basis with SPEC int rate they claim 1.14x the perf-per-Watt
                However, the included chart shows a 1.41x perf-per-Watt improvement. Looks like the digits got transposed in the article body.



                • #18
                  Originally posted by torsionbar28 View Post
                  What are you rattling on about? This is a server chip designed for cloud workloads. This is not a desktop peecee. And Bulldozer, WTF? Kick that obsolete trash to the curb. No one doing any kind of real work is using that today, and no sane person has any desire to keep using it for another ten years, lmao.
                  A server chip may not be DESIGNED for other uses, but that doesn't mean it CAN'T be used for them, especially when it is old and being sold off as surplus.

                  I myself use Bulldozer to this day. I am not employed, so I'm not about to trash a machine that does exactly what it did in 2012, as well as it ever did, and I am not shooting 4K video, so I don't NEED more. I don't play closed-source, paid games, so video editing is my main high-performance workload. Compiling MATE, Compiz, GTK, etc. goes plenty fast on Bulldozer, and I'm not building kernels every day. Don't need more power. Speaking of power, the big Threadripper chips use even more than Bulldozer, and Bulldozer idles (e.g. sitting on a webpage with no JS running) at a theoretical 35 W and actually about 50 W at the proc. Ryzen would have to get down to less than 20 W at idle, for a chip with the same full-power performance, to even begin to pay back the electrical and materials cost of manufacturing a brand-new chip (rough numbers below). Same as buying a new car that saves gas but burns over 10,000 pounds of coal or fracked gas to smelt the metals, roll the sheet metal, cast the engine and transmission parts, etc.
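
                  As a rough Python sketch of that payback math, using my wattage figures above plus an assumed electricity price:

                      # Idle-power payback sketch; 50 W and 20 W are from above,
                      # the $/kWh price and 24/7 idling are assumptions.
                      old_idle_watts = 50
                      new_idle_watts = 20
                      price_per_kwh = 0.13    # assumed electricity price

                      kwh_per_year = (old_idle_watts - new_idle_watts) / 1000 * 24 * 365
                      print(f"~{kwh_per_year:.0f} kWh/yr, "
                            f"~${kwh_per_year * price_per_kwh:.0f}/yr saved")
                      # ~263 kWh/yr, ~$34/yr: many years to pay off a new
                      # CPU + board + RAM, before counting fabrication energy.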

                  By comparison, running my existing proc until it quits years down the road, then buying an old server (e.g. the one under discussion here, which is new today) at an auction or computer show doesn't use any new fabrication resources whatsoever.

                  Also, Bulldozer doesn't have the untrusted AMD PSP or Intel ME that can compromise security on an encrypted machine handling sensitive raw clips that must be carefully edited to use only the parts that can be publicly released. I once had to burn a grand jury subpoena for raw video clips after the big Aug 2018 counterprotest against Nazis in DC. They withdrew it, knowing I would never cooperate and that they could not defeat my encryption. I'm not about to pay money to add an additional potential back door to my encrypted disks.



                  • #19
                    I am antifa and proud of it. If you are a cop, it's no wonder you want people using chips with backdoors in them, while you no doubt rely on Intel's "quality assurance" IME switch to make it harder for serious operators to get into your servers and encrypt all your warrants with ransomware (which was done near Boston a few years ago).



                    • #20
                      Originally posted by Britoid View Post
                      But will it run Crysis?
                      Yes, but unfortunately the framerate won't be that good: llvmpipe can use that many cores, but it's still not that optimized.

                      I ran some benchmarks on a 64-core Zen 2 CPU with 16 DDR4 DIMMs, so theoretically it can push up to 410 GB/sec, but even a 5-year-old midrange GPU with much less bandwidth and compute power blows it out of the water...
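
                      That ~410 GB/sec figure checks out if the 16 DIMMs sit on 16 independent DDR4-3200 channels (e.g. a dual-socket board); a quick sanity check:

                          # Theoretical peak DDR4 bandwidth; assumes 16 independent
                          # DDR4-3200 channels, 8 bytes per transfer (64-bit bus).
                          channels = 16
                          transfers_per_sec = 3200e6   # DDR4-3200 = 3200 MT/s
                          bytes_per_transfer = 8
                          peak = channels * transfers_per_sec * bytes_per_transfer / 1e9
                          print(f"theoretical peak: {peak:.1f} GB/s")  # 409.6 GB/s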

                      I don't know where the bottleneck is, as the CPU load was still around 10% or less when running the benchmark. Probably some per-core bandwidth limitation or other issues.

                      BTW, Win 10 sucks hard at handling 128 threads: it can only handle 64 threads per "CPU", so an EPYC with hyper-threading active is split into two "CPU domains". By default a process in Win 10 only executes on one "CPU domain" (unless it explicitly spreads its threads across domains), so you can't really use all threads from one process, lol...

                      This gets massively stupid if you have a 72-core CPU with 4-way hyper-threading (dual-socket Xeon Phi), as Windows will split each CPU into 5 domains, the minimum domain count that yields clusters of at most 64 threads each (sketch below). They seriously fucked this up in their kernel...
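
                      The domain count is just a ceiling division against that 64-logical-processor cap; here is the arithmetic in Python (the real kernel also carves groups along NUMA boundaries):

                          # Sketch of Windows' 64-logical-processors-per-group cap;
                          # just the arithmetic, not the kernel's NUMA-aware logic.
                          import math

                          def groups(cores, threads_per_core, cap=64):
                              logical = cores * threads_per_core
                              return logical, math.ceil(logical / cap)

                          for cores, smt in ((64, 2), (72, 4)):  # EPYC, Xeon Phi
                              logical, n = groups(cores, smt)
                              print(f"{logical} logical processors -> {n} group(s)")
                          # 128 -> 2 groups; 288 -> 5 groups, as described above.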

                      Edit: Running Windows on machines with complex NUMA architectures or this many threads is stupid in the first place anyway, and no one who thinks straight would do it...
                      Last edited by Spacefish; 03 March 2020, 05:58 PM.

