CPUs From 2004 Against AMD's New 64-Core Threadripper 3990X + Tests Against FX-9590


  • #31
    Originally posted by Raka555 View Post
    vs Raspberry PI ?

    It is actually not that impressive that it is only 4x faster with compiling the kernel and only 2x faster encoding an mp3 than the AMD FX-9590 with its 64/128 against 8/8...

    To me, lots of cores only look good in benchmarks designed for that purpose.
    In the "real world" you still get bad diminishing returns ...

    Yesterday I ran a program that calculates prime numbers, and I was not impressed that my 2012-model i7-3770 (3.4GHz/3.9GHz) did it in 4s versus my "shiny new" Ryzen 7 3700X (3.6GHz/4.4GHz), which only managed 8s....

    Running bloatware is where the Ryzens shine with all their cache, but in pure calculations Intel still seems to be far ahead...
    An actual pi3 b+ on 20.04 arm64 gives this result
    real 0m18.248s
    user 0m18.193s
    sys 0m0.005s
    That way neither 3770 nor 3700x looks impressive.
    This isn't a meaningful benchmark at all.

    Comment


    • #32
      Originally posted by muncrief View Post
      I held on to my old FX 6300/990FX system all the way up until Zen2, when I purchased my current R7-3700X/X570 system.

      Needless to say the performance difference is astounding

      I'm so glad I stuck with AMD even though I had to live with lesser performance for quite a while. But I knew that if AMD went under we'd be at the mercy of the predatory Intel corporation forever.

      As an embedded systems engineer who designed a few simple custom microprocessors and microcontrollers back in the day, I realized what a monumental error Bulldozer was. And a year or so after its release I also sadly realized that the architecture couldn't be salvaged, and it would be a while before a new one could be developed. I didn't know it would be quite this long, but still the wait was worth it.
      Actually, as an embedded systems engineer you are a disgrace and you don't even know what you are talking about.... I expect more from people who supposedly know about CPU design. Not that simple microcontrollers are a feat; universities have students design them in first year these days, but still....

      Bulldozer was a great architecture and was a step towards Fusion. AMD's grand plan was to eliminate FPU and SIMD from the cpu cores completely, eventually, and move those calculations on the iGPU. This makes a metric ton of sense, since cpu cores only rarely calculate floating point math. And those calculations are better suited for gpgpu, which is only hindered these days by pcie latency. AMD Fusion was the best idea for cpus in 2 decades. But AMD didn't have the software and marketing grunt to push for such change, and Intel realising they would lose if AMD went that road, doubled up on AVX and their floating point calculations, especially per thread.

      These days, on 7nm, CPU cores are TINY, even with all those SIMD parts. It would have made a lot more sense to have even tinier CPU cores by removing the floating point units (which cost a LOT of silicon), adding tons of cache and a beefy iGPU, and moving those calculations there. It would have performed far better. It would allow the CPU cores to stop bothering with things they are not best at, and let the iGPU do what it is best suited for... But this failed to evolve because idiots thought Bulldozer was a failure just because video games still relied on single and dual cores and, as we all know, gaming is the most important thing in computing.... Even today intel sells a ton of cpus because it has slightly better per core performance and this matters to gaming. People are cretins. Now all AMD is doing is copying Intel's design but selling it at a far lower profit margin.... Yay.

      Comment


      • #33
        Originally posted by mlau View Post

        Yes, this performs over twice as fast on intel hardware. Run it with perf on Zen:

        Code:
        # perf stat ./rand
        664580
        
        Performance counter stats for './rand':
        
        7.479,07 msec task-clock:u # 0,999 CPUs utilized
        0 context-switches:u # 0,000 K/sec
        0 cpu-migrations:u # 0,000 K/sec
        52 page-faults:u # 0,007 K/sec
        35.198.688.120 cycles:u # 4,706 GHz
        15.034.444 stalled-cycles-frontend:u # 0,04% frontend cycles idle
        33.164.471.465 stalled-cycles-backend:u # 94,22% backend cycles idle
        17.446.305.050 instructions:u # 0,50 insn per cycle
        # 1,90 stalled cycles per insn
        3.508.141.911 branches:u # 469,061 M/sec
        4.877.261 branch-misses:u # 0,14% of all branches
        I guess integer division is not a strong point of Zen.
        How can you tell that it is integer division that is so slow?

        Comment


        • #34
          Originally posted by TemplarGR View Post
          Bulldozer was a great architecture and was a step towards Fusion. AMD's grand plan was to eliminate FPU and SIMD from the cpu cores completely, eventually, and move those calculations on the iGPU.
          The fun fact is that this is what is happening just now. Intel is slowly embedding every sort of coprocessor in its CPUs (starting with FPGAs). They are working to build an API to supersede OpenCL, so software can take advantage of iGPUs instead of relying on AVX.

          But I don't see a real point in stripping the vector processors from CPU cores. Yes, you can, but you just put them into the iGPU. AMD APUs with HSA were just a bunch of CPU cores + iGPU, with the floating point part developed much more on the iGPU than on the CPUs. This simplifies CPU design, of course. And iGPUs must implement vector processors nonetheless. So, at least, you don't waste transistors and energy on powering SIMD processors in the CPU.

          Originally posted by TemplarGR View Post
          Even today intel sells a ton of cpus because it has slightly better per core performance and this matters to gaming.
          Not even this. In the benchmarks, Ryzens are very often ahead of Intel CPUs in single-threaded performance.

          Comment


          • #35
            Originally posted by Raka555 View Post

            How can you tell that it is integer division that is so slow?
            Run the program with "perf record", and look at the data with "perf annotate".
            The test for the remainder being zero (test edx, edx) accounts for 90% of the time spent,
            at least on my system. The code generated is identical for Haswell and Zen.
            Maybe it's also a code scheduling issue in gcc? AMD is far behind Intel in compiler optimizations.

            Comment


            • #36
              Good old FX doesn't look half bad in these comparisons, with only 4 full cores (each with two halves) on a chip so old, against the cutting-edge, over-the-top model of the newest generation. Excavator wasn't that bad for tasks that could be spread among those threads and run on optimized code...

              Comment


              • #37
                Originally posted by Raka555 View Post

                If you mean what I get when timing it:

                r7-3700x:
                real 0m8.138s

                i7-3770:
                real 0m4.083s

                i7-4600u:
                real 0m4.883s
                Xeon E-2286M (fastest currently available mobile CPU)
                fastest result
                Code:
                real 0m2,882s
                slowest result
                Code:
                real 0m3,012s

                Code:
                $ perf stat ./prime_gcc
                664580
                
                Performance counter stats for './prime_gcc':
                
                          2.914,30 msec task-clock:u              #    1,000 CPUs utilized          
                                 0      context-switches:u        #    0,000 K/sec                  
                                 0      cpu-migrations:u          #    0,000 K/sec                  
                                56      page-faults:u             #    0,019 K/sec                  
                    14.151.736.249      cycles:u                  #    4,856 GHz                    
                    17.446.316.395      instructions:u            #    1,23  insn per cycle         
                     3.508.140.058      branches:u                # 1203,766 M/sec                  
                         4.733.558      branch-misses:u           #    0,13% of all branches        
                
                       2,914653614 seconds time elapsed
                
                       2,911537000 seconds user
                       0,000999000 seconds sys
                (n = 10 / gcc 9.2.1 / Clear Linux / governor set to performance / AC plugged in)
                Last edited by CochainComplex; 10 February 2020, 05:06 AM.

                Comment


                • #38
                  I think it would be interesting to see RISC-V cores done by AMD around the old Excavator concepts, i.e. a few lean cores clustered in one fat core, sharing resources.
                  Excavator had 2; given that this could be done now on an advanced 5nm process, let's go with 4 per fat core and 16 fat cores per CCX, perhaps with much less L3 and more cores, optimized for CPU bottlenecks etc., and done on the SP3 socket with 8-channel DDR4...

                  Comment


                  • #39
                    You can find some old benchmarks on the PassMark site. The first CPU they have is some VIA thing, so no original Pentium, 386 or 486, unfortunately.

                    What I could find for the CPUs you were interested in, fastest CPU from each generation taken:
                    CPU                                                     Overall  Single Thread
                    AMD-K6-III 550 MHz                                      108      NA
                    Intel Pentium III Mobile 750MHz                         103      NA
                    Intel Pentium 4 3.73GHz                                 486      701
                    Intel Core2 Duo E8600 @ 3.33GHz                         2398     1370
                    Intel Core i7-3970X @ 3.50GHz (Sandy Bridge)            12691    2017
                    Intel Core i7-6700K @ 4.00GHz (Sky Lake)                11108    2354
                    Intel Core i7-1065G7 @ 1.30GHz (Ice Lake) (mobile CPU)  10496    2523

                    This table should have some AMD in it. Athlons, 64 bit Athlons, Bulldozer, then big gap until Ryzen... Maybe someone else can chip in

                    Originally posted by birdie View Post
                    Gains in single threaded performance for the past 10 years have still been minuscule in comparison to what we had from 1981 to the end of the 00's where performance increased 100 fold or maybe more.

                    It would be great if someone managed to compare the following CPUs:
                    • Intel 386
                    • Intel 486
                    • Pentium Pro
                    • Pentium 2
                    • Pentium III
                    • Pentium 4
                    • Core 2 Duo
                    • Sandy Bridge
                    • Sky Lake
                    • Ice Lake
                    Intel 8086 and 286 are out of the question since they lack 32bit support.

                    Comment


                    • #40
                      Originally posted by TemplarGR View Post
                      Even today intel sells a ton of cpus because it has slightly better per core performance and this matters to gaming.
                      AMD completely outsells Intel in the DIY market for performance reasons. Intel outsells AMD in the OEM market for business reasons that have nothing to do with CPU performance. Unfortunately, the DIY market is tiny compared to the OEM one.

                      Originally posted by mlau View Post
                      Maybe it's also a code scheduling issue in gcc? amd is far behind intel in compiler optimizations.
                      Could also be different uarch optimizations. For example, Zen has the same IDIV performance for int32 and int64, while Skylake is faster for int32 but considerably slower for int64.

                      Anyone with an Intel CPU willing to retest changing int to int64_t?

                      Code:
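                      #include <stdbool.h>
                      #include <stdint.h>
                      #include <stdio.h>

                      /* Trial division using 64-bit operands, so the remainder is a 64-bit IDIV. */
                      bool IsPrime(int64_t test)
                      {
                          for (int64_t i = 2; i * i <= test; i++) {
                              if (test % i == 0) {
                                  return false;
                              }
                          }
                          return true;
                      }

                      int main(void)
                      {
                          /* Note: 1 is counted too, so this prints 664580
                             (664579 primes below 10^7, plus 1). */
                          int count = 0;
                          for (int64_t i = 1; i < 1000 * 1000 * 10; ++i) {
                              if (IsPrime(i)) {
                                  count++;
                              }
                          }
                          printf("%d\n", count);
                          return 0;
                      }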

                      Comment
