CPUs From 2004 Against AMD's New 64-Core Threadripper 3990X + Tests Against FX-9590


  • #41
    Originally posted by Raka555 View Post

    If you mean what I get when timing it:

    r7-3700x:
    real 0m8.138s

    i7-3770:
    real 0m4.083s

    i7-4600u:
    real 0m4.883s
    Xeon E-2286M (fastest currently available mobile CPU)
    fastest result
    Code:
    real 0m2,882s
    slowest result
    Code:
    real 0m3,012s

    Code:
    $ perf stat ./prime_gcc
    664580
    
    Performance counter stats for './prime_gcc':
    
              2.914,30 msec task-clock:u              #    1,000 CPUs utilized          
                     0      context-switches:u        #    0,000 K/sec                  
                     0      cpu-migrations:u          #    0,000 K/sec                  
                    56      page-faults:u             #    0,019 K/sec                  
        14.151.736.249      cycles:u                  #    4,856 GHz                    
        17.446.316.395      instructions:u            #    1,23  insn per cycle         
         3.508.140.058      branches:u                # 1203,766 M/sec                  
             4.733.558      branch-misses:u           #    0,13% of all branches        
    
           2,914653614 seconds time elapsed
    
           2,911537000 seconds user
           0,000999000 seconds sys
    (n = 10 runs / gcc 9.2.1 / Clear Linux / governor set to performance / AC plugged in)
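As a sanity check on the derived columns, the "insn per cycle" and GHz figures in the perf output above can be recomputed from the raw counters. This is only a sketch; the helper names `insn_per_cycle` and `avg_ghz` are made up for illustration and are not part of perf.

```c
#include <assert.h>
#include <stdint.h>

/* Instructions retired per clock cycle, from the two raw perf counters. */
static double insn_per_cycle(uint64_t instructions, uint64_t cycles)
{
    assert(cycles > 0);
    return (double)instructions / (double)cycles;
}

/* Average clock in GHz: cycles per nanosecond of task-clock time.
   task_clock_msec is the "msec task-clock" value (1 msec = 1e6 ns). */
static double avg_ghz(uint64_t cycles, double task_clock_msec)
{
    assert(task_clock_msec > 0.0);
    return (double)cycles / (task_clock_msec * 1e6);
}
```

With the counters from this run, `insn_per_cycle(17446316395ULL, 14151736249ULL)` comes out at roughly 1.23 and `avg_ghz(14151736249ULL, 2914.30)` at roughly 4.86, matching what perf printed.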
    Last edited by CochainComplex; 02-10-2020, 05:06 AM.

    Comment


    • #42
      I think it would be interesting to see RISC-V cores done by AMD around the old Excavator concept, i.e. a few lean cores clustered in one fat core, sharing resources.
      Excavator had 2; given that this could now be done on an advanced 5nm process, let's go with 4 per fat core and 16 fat cores per CCX. Perhaps with much less L3 and more cores, optimized for CPU bottlenecks etc., and done on the SP3 socket with 8-channel DDR4...

      Comment


      • #43
        You can find some old benchmarks on the PassMark site. The first CPU they have is some VIA thing, so no original Pentium, 386 or 486, unfortunately.

        Here is what I could find for the CPUs you were interested in, taking the fastest CPU from each generation:
        CPU                                                    | Overall | Single Thread
        AMD-K6-III 550 MHz                                     |     108 | NA
        Intel Pentium III Mobile 750MHz                        |     103 | NA
        Intel Pentium 4 3.73GHz                                |     486 | 701
        Intel Core2 Duo E8600 @ 3.33GHz                        |    2398 | 1370
        Intel Core i7-3970X @ 3.50GHz (Sandy Bridge)           |   12691 | 2017
        Intel Core i7-6700K @ 4.00GHz (Sky Lake)               |   11108 | 2354
        Intel Core i7-1065G7 @ 1.30GHz (Ice Lake) (mobile CPU) |   10496 | 2523

        This table should have more AMD in it: Athlons, 64-bit Athlons, Bulldozer, then a big gap until Ryzen... Maybe someone else can chip in.

        Originally posted by birdie View Post
        Gains in single threaded performance for the past 10 years have still been minuscule in comparison to what we had from 1981 to the end of the 00's where performance increased 100 fold or maybe more.

        It would be great if someone managed to compare the following CPUs:
        • Intel 386
        • Intel 486
        • Pentium Pro
        • Pentium 2
        • Pentium III
        • Pentium 4
        • Core 2 Duo
        • Sandy Bridge
        • Sky Lake
        • Ice Lake
        Intel 8086 and 286 are out of the question since they lack 32bit support.

        Comment


        • #44
          Originally posted by TemplarGR View Post
          Even today intel sells a ton of cpus because it has slightly better per core performance and this matters to gaming.
          AMD completely outsells Intel in the DIY market for performance reasons. Intel outsells AMD in the OEM market for business reasons that have nothing to do with CPU performance. Unfortunately the DIY market is tiny compared to the OEM one.

          Originally posted by mlau View Post
          Maybe it's also a code scheduling issue in gcc? amd is far behind intel in compiler optimizations.
          Could also be different uarch optimizations. For example, Zen has the same IDIV performance for int32 and int64, while Skylake is faster for int32 but considerably slower for int64.

          Anyone with an Intel CPU willing to retest changing int to int64_t?

          Code:
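          #include <stdbool.h>
          #include <stdint.h>
          #include <stdio.h>

          /* Trial division. Note that 1 is counted as prime, as in the original test. */
          bool IsPrime(int64_t test)
          {
              for (int64_t i = 2; i * i <= test; i++) {
                  if (test % i == 0) {
                      return false;
                  }
              }
              return true;
          }

          int main(void)
          {
              int count = 0;
              for (int64_t i = 1; i < 1000 * 1000 * 10; ++i) {
                  if (IsPrime(i)) {
                      count++;
                  }
              }
              printf("%d\n", count);
              return 0;
          }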
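To make the int vs. int64_t comparison easier to repeat, here is a rough sketch that puts both operand widths in one file along with a small timing helper. The function names (`count_primes_i32`, `count_primes_i64`, `time_seconds`) are my own and the use of `clock()` is an assumption; the thread itself used `time` and `perf stat` on separate binaries. It is also worth checking the disassembly, since a compiler is free to pick a different divide instruction than expected.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Same trial division as the thread's test, on 32-bit operands.
   Counts 1 as prime, like the original. Assumes limit is far below
   2^31 (the thread uses 10^7), so i * i cannot overflow. */
static int count_primes_i32(int32_t limit)
{
    assert(limit <= 1000000000);
    int count = 0;
    for (int32_t n = 1; n < limit; ++n) {
        int is_prime = 1;
        for (int32_t i = 2; i * i <= n; ++i) {
            if (n % i == 0) { is_prime = 0; break; }
        }
        count += is_prime;
    }
    return count;
}

/* Identical loop, but every operand is 64 bits wide. */
static int count_primes_i64(int32_t limit)
{
    int count = 0;
    for (int64_t n = 1; n < limit; ++n) {
        int is_prime = 1;
        for (int64_t i = 2; i * i <= n; ++i) {
            if (n % i == 0) { is_prime = 0; break; }
        }
        count += is_prime;
    }
    return count;
}

/* CPU time in seconds spent in fn(limit); the result goes to *count. */
static double time_seconds(int (*fn)(int32_t), int32_t limit, int *count)
{
    clock_t start = clock();
    *count = fn(limit);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_seconds(count_primes_i32, 10 * 1000 * 1000, &count)` and the same with `count_primes_i64` should give the two timings side by side, with `count` ending up at 664580 in both cases, as in the perf output posted in this thread.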

          Comment


          • #45
            Originally posted by boitano View Post
            Anyone with an Intel CPU willing to retest changing int to int64_t?
            Code:
            real    0m10,153s
            Code:
            $ perf stat ./prime_gcc_int64
            664580
            
             Performance counter stats for './prime_gcc_int64':
            
                     10.184,94 msec task-clock:u              #    1,000 CPUs utilized          
                             0      context-switches:u        #    0,000 K/sec                  
                             0      cpu-migrations:u          #    0,000 K/sec                  
                            54      page-faults:u             #    0,005 K/sec                  
                47.981.199.959      cycles:u                  #    4,711 GHz                    
                17.446.323.647      instructions:u            #    0,36  insn per cycle        
                 3.508.147.325      branches:u                #  344,445 M/sec                  
                     4.406.190      branch-misses:u           #    0,13% of all branches        
            
                  10,185381274 seconds time elapsed
            
                  10,175968000 seconds user
                   0,000999000 seconds sys
            Xeon E-2286M
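Both perf runs retire almost the same number of instructions (about 17.446 billion), so the extra cycles in the int64_t run come from a lower IPC rather than from more work. One way to put a number on that is to count how many times the loop actually executes `test % i` and divide the measured cycles by it. The sketch below does the counting; `count_divisions` and `cycles_per_division` are hypothetical helper names, and the result is only an upper bound because it charges all loop overhead to the division.

```c
#include <assert.h>
#include <stdint.h>

/* Count how many times the thread's IsPrime loop evaluates test % i
   over all candidates 1 <= n < limit. */
static uint64_t count_divisions(int64_t limit)
{
    uint64_t divisions = 0;
    for (int64_t n = 1; n < limit; ++n) {
        for (int64_t i = 2; i * i <= n; ++i) {
            ++divisions;          /* one modulo per loop iteration */
            if (n % i == 0) {
                break;            /* composite: IsPrime returns here */
            }
        }
    }
    return divisions;
}

/* Rough upper bound: attributes every measured cycle to a division. */
static double cycles_per_division(uint64_t cycles, uint64_t divisions)
{
    assert(divisions > 0);
    return (double)cycles / (double)divisions;
}
```

Feeding `count_divisions(10 * 1000 * 1000)` and the `cycles:u` value from each perf run into `cycles_per_division` gives a per-division cost that can be compared between the 32-bit and 64-bit builds.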
            Last edited by CochainComplex; 02-10-2020, 10:05 AM.

            Comment


            • #46
              Originally posted by CochainComplex View Post

              Code:
              real 0m10,153s
              So from ~3 seconds in 32bit to ~10 seconds in 64bit on the same Xeon CPU. Very interesting. So Intel CPUs are much slower with 64bit integers in this particular test.

              On my Ryzen 9 3900x it's ~8 seconds in both cases.

              The modulo and test part (test%i==0) is compiled with
              Code:
                idivl    %ecx
                testl    %edx, %edx
              According to Agner Fog's instruction tables, the reciprocal throughput of "IDIV r32" on Ryzen is 14-30 cycles (depending on the operand values), with a latency of 14-30 cycles as well. Intel's Skylake has a very low reciprocal throughput of 6 cycles and a latency of 26 cycles. So Intel's CPUs are generally faster than AMD's for 32bit integer division.

              On the other hand, Zen seems to be faster for "IDIV r64", at 14-45 cycles for both reciprocal throughput and latency, while Skylake's reciprocal throughput is 24-90 cycles with a latency of 42-95.

              So it appears AMD has favored 64bit code in their uarch.

              Of course there are other factors as well, but uarch optimizations seem to play a role in this instance.
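If 64-bit division really is that much slower on a given CPU, one possible workaround is to fall back to a 32-bit divide whenever both operands happen to fit in 32 bits. The following is only a sketch of that idea; `mod_narrowed` is a made-up name, and whether the extra branch pays off depends on the CPU and on how predictable the operand sizes are. Some compilers may already apply a similar transformation for certain targets, so it is worth measuring before using it.

```c
#include <stdint.h>

/* Remainder of a / b with the same result as a % b (b must be non-zero).
   If neither operand has any of its upper 32 bits set, both are
   non-negative and fit in uint32_t, so a 32-bit unsigned divide
   gives the same remainder. */
static int64_t mod_narrowed(int64_t a, int64_t b)
{
    if ((((uint64_t)a | (uint64_t)b) >> 32) == 0) {
        return (int64_t)((uint32_t)a % (uint32_t)b);
    }
    return a % b;  /* negative or large operands: full 64-bit divide */
}
```

In the prime test from this thread, every operand stays below 10^7, so a check like this would always take the 32-bit path.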
              Last edited by boitano; 02-10-2020, 10:55 AM.

              Comment


              • #47
                Originally posted by boitano View Post

                AMD completely outsells Intel in the DIY market for performance reasons. Intel outsells AMD in the OEM market for business reasons that have nothing to do with CPU performance. Unfortunately the DIY market is tiny compared to the OEM one.

                Could also be different uarch optimizations. For example, Zen has the same IDIV performance for int32 and int64, while Skylake is faster for int32 but considerably slower for int64.

                Anyone with an Intel CPU willing to retest changing int to int64_t?

                Code:
                bool IsPrime(int64_t test)
                {
                    for(int64_t i= 2; i * i <= test; i++){
                        if(test%i==0){
                            return false;
                        }
                    }
                    return true;
                }

                int main()
                {
                    int count=0;
                    for (int64_t i=1;i<1000*1000*10;++i) {
                        if (IsPrime(i)) {
                            count++;
                        }
                    }
                    printf("%d\n",count);
                }
                With int64_t (times shown as int64_t vs. int):

                r7-3700x
                real 0m8.138s vs 0m8.138s

                i7-3770:
                real 0m13.938s vs 0m4.083s

                i7-4600u:
                real 0m15.719s vs 0m4.883s

                This is a nice find. So 64bit integers are hurting Intel big time, while they are doing very well with 32bit integers. No wonder they were pushing x32 so hard.
                I wonder what happened to the x32 efforts...

                Something similar seems to be happening on the RPIs as well. "lilunxm12" reported the following on their pi3b+ on 64bit Ubuntu 20.04:
                real 0m18.248s
                user 0m18.193s
                sys 0m0.005s

                I am getting the following on my rpi3b+ with Raspbian 10 32bit:
                real 1m26.870s
                user 1m26.827s
                sys 0m0.022s

                That is a huge difference. But my RPI might be throttling as I don't have active cooling on it.
                Last edited by Raka555; 02-10-2020, 12:14 PM.

                Comment


                • #48
                  Originally posted by Raka555 View Post

                  This is a nice find. So 64bit integers are hurting Intel big time, while they are doing very well with 32bit integers.
                  I would say Intel suffers greatly with 64bit signed integer division (IDIV instruction). I don't think you can generalize saying Zen is always faster than Skylake in 64bit integer arithmetic (if I had an Intel CPU at hand I would investigate though).

                  This example also tells us why synthetic benchmarks are useless.

                  Comment


                  • #49
                    Originally posted by Raka555 View Post
                    For me lots of cores only looks good in benchmarks designed for that purpose.
                    In the "real world" you still get bad diminishing returns ...
                    As with everything, it depends entirely on what your "real world" consists of. If Solitaire and Twitter are your use case, then no, you won't see much improvement, this part is not for you. If you're running something CPU bound that scales to 128 threads, then yes, you will see a massive improvement, with returns that scale linearly with the core count, as the benchmarks illustrate. This is not a one-size-fits-all part. Nobody is recommending it to grandma so she can check her AOL email.
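Both views can be squared with Amdahl's law, which is my addition here and not something the benchmarks in the article state: the speedup on n threads depends on the fraction p of the work that can actually run in parallel. A minimal sketch, with `amdahl_speedup` as a made-up helper name:

```c
#include <assert.h>

/* Amdahl's law: ideal speedup on n threads when a fraction p
   (0.0 to 1.0) of the work is parallel and the rest is serial. */
static double amdahl_speedup(double p, int n)
{
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

Under this model, `amdahl_speedup(1.0, 128)` is 128, i.e. linear scaling, while `amdahl_speedup(0.95, 128)` is only about 17.4. So a workload that is 95% parallel already shows the "bad diminishing returns", and only something very close to fully parallel gets the linear scaling.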
                    Last edited by torsionbar28; 02-10-2020, 12:54 PM.

                    Comment


                    • #50
                      Originally posted by torsionbar28 View Post
                      As with everything, it depends entirely on what your "real world" consists of. If Solitaire and Twitter are your use case, then no, you won't see much improvement, this part is not for you. If you're running something CPU bound that scales to 128 threads, then yes, you will see a massive improvement, with returns that scale linearly with the core count, as the benchmarks illustrate. This is not a one-size-fits-all part. Nobody is recommending it to grandma so she can check her AOL email.
                      Nobody is recommending a 128-thread CPU to anybody, except maybe Hollywood VFX artists and AAA game programmers.
                      Maybe I'm wrong, but it seems to me AMD brought a 64-core workstation CPU to market more because they can than because there's a sizeable market for it. It feels more like a trollish move against Intel.

                      Comment
