CPUs From 2004 Against AMD's New 64-Core Threadripper 3990X + Tests Against FX-9590


  • #41
    Originally posted by boitano
    Anyone with an Intel CPU willing to retest changing int to int64_t?
    Code:
    real    0m10,153s
    Code:
    $ perf stat ./prime_gcc_int64
    664580
    
     Performance counter stats for './prime_gcc_int64':
    
             10.184,94 msec task-clock:u              #    1,000 CPUs utilized          
                     0      context-switches:u        #    0,000 K/sec                  
                     0      cpu-migrations:u          #    0,000 K/sec                  
                    54      page-faults:u             #    0,005 K/sec                  
        47.981.199.959      cycles:u                  #    4,711 GHz                    
        17.446.323.647      instructions:u            #    0,36  insn per cycle        
         3.508.147.325      branches:u                #  344,445 M/sec                  
             4.406.190      branch-misses:u           #    0,13% of all branches        
    
          10,185381274 seconds time elapsed
    
          10,175968000 seconds user
           0,000999000 seconds sys
    Xeon E-2286M
    Last edited by CochainComplex; 10 February 2020, 10:05 AM.



    • #42
      Originally posted by CochainComplex

      Code:
      real 0m10,153s
      So it went from ~3 seconds with 32-bit int to ~10 seconds with int64_t on the same Xeon CPU. Very interesting. So Intel CPUs are much slower with 64-bit integers in this particular test.

      On my Ryzen 9 3900x it's ~8 seconds in both cases.

      In the 32-bit int version, the modulo and test (test%i==0) compile to:
      Code:
        idivl    %ecx
        testl    %edx, %edx
      According to Agner Fog's instruction tables, "IDIV r32" on Ryzen has a reciprocal throughput of 14-30 cycles (depending on the operand values) and a latency of 14-30 cycles as well. Intel's Skylake has a much lower reciprocal throughput of 6 cycles and a latency of 26 cycles. So Intel's CPUs are generally faster than AMD's at 32-bit integer division.

      On the other hand, Zen seems to be faster for "IDIV r64", at 14-45 cycles for both reciprocal throughput and latency, while Skylake's reciprocal throughput is 24-90 cycles with a latency of 42-95 cycles.

      So it appears AMD has favored 64-bit code in its uarch.

      Of course there are other factors as well, but uarch optimizations seem to play a role in this instance.
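To reproduce the 32-bit vs. 64-bit comparison in a single binary, here is a minimal C sketch. It assumes the thread's trial-division test; the function names (count_primes_i32, count_primes_i64, compare_widths) are made up for illustration, and the idivl/idivq remarks describe what x86-64 GCC typically emits, not a guarantee for every compiler or flag set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* 32-bit operands: on x86-64, GCC typically compiles the '%' to idivl. */
static bool is_prime_i32(int32_t test)
{
    for (int32_t i = 2; i * i <= test; i++) {
        if (test % i == 0)
            return false;
    }
    return true; /* like the thread's version, 1 is also counted */
}

/* Same logic with 64-bit operands: the '%' typically becomes idivq. */
static bool is_prime_i64(int64_t test)
{
    for (int64_t i = 2; i * i <= test; i++) {
        if (test % i == 0)
            return false;
    }
    return true;
}

long count_primes_i32(int32_t limit)
{
    long count = 0;
    for (int32_t i = 1; i < limit; ++i) {
        if (is_prime_i32(i))
            count++;
    }
    return count;
}

long count_primes_i64(int64_t limit)
{
    long count = 0;
    for (int64_t i = 1; i < limit; ++i) {
        if (is_prime_i64(i))
            count++;
    }
    return count;
}

/* Runs both variants on the same range and prints the CPU time of each.
   clock() measures processor time, which for this single-threaded,
   CPU-bound loop should track wall-clock time closely. */
void compare_widths(int32_t limit)
{
    clock_t t0 = clock();
    long c32 = count_primes_i32(limit);
    clock_t t1 = clock();
    long c64 = count_primes_i64(limit);
    clock_t t2 = clock();

    printf("int32_t: %ld, %.3f s\n", c32, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("int64_t: %ld, %.3f s\n", c64, (double)(t2 - t1) / CLOCKS_PER_SEC);
}
```

Calling compare_widths(10 * 1000 * 1000) from a main() built with gcc -O2 should report the same count for both widths (664580, matching the perf run above); the timings will depend on the CPU.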
      Last edited by boitano; 10 February 2020, 10:55 AM.



      • #43
        Originally posted by boitano

        AMD completely outsells Intel in the DIY market for performance reasons. Intel outsells AMD in the OEM market for business reasons that have nothing to do with CPU performance. Unfortunately, the DIY market is tiny compared to the OEM one.



        Could also be different uarch optimizations. For example, Zen has the same IDIV performance for int32 and int64, while Skylake is faster for int32 but considerably slower for int64.

        Anyone with an Intel CPU willing to retest changing int to int64_t?

        Code:
        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        bool IsPrime(int64_t test)
        {
            for (int64_t i = 2; i * i <= test; i++) {
                if (test % i == 0) {
                    return false;
                }
            }
            return true;
        }

        int main()
        {
            int count = 0;
            for (int64_t i = 1; i < 1000 * 1000 * 10; ++i) {
                if (IsPrime(i)) {
                    count++;
                }
            }
            printf("%d\n", count);
            return 0;
        }
        With int64_t vs. the original int:

        R7 3700X:
        real 0m8.138s vs 0m8.138s

        i7-3770:
        real 0m13.938s vs 0m4.083s

        i7-4600U:
        real 0m15.719s vs 0m4.883s

        This is a nice find. So 64bit integers are hurting Intel big time, while they are doing very well with 32bit integers. No wonder they were pushing x32 so hard.
        I wonder what happened to the x32 efforts...
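For context on x32: as far as I know, on x86-64 Linux (LP64) int is 32 bits while long and pointers are 64 bits, and the x32 ABI (gcc -mx32) shrinks long and pointers to 32 bits but leaves int64_t at 64 bits, still handled in 64-bit registers. So x32 on its own would probably not change the int64_t division in this test. The small C sketch below (the helper names are hypothetical) prints these widths so the two ABIs can be compared; building with -mx32 needs a toolchain and kernel with x32 support.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Width in bits of a few types. Which type '%' operates on decides whether
   x86-64 compilers emit a 32-bit (idivl) or a 64-bit (idivq) division. */
int bits_int(void)     { return (int)(sizeof(int) * CHAR_BIT); }
int bits_long(void)    { return (int)(sizeof(long) * CHAR_BIT); }
int bits_pointer(void) { return (int)(sizeof(void *) * CHAR_BIT); }
int bits_int64(void)   { return (int)(sizeof(int64_t) * CHAR_BIT); }

/* Prints the data model of the current build (e.g. compare -m64 vs -mx32). */
void print_data_model(void)
{
    printf("int: %d, long: %d, pointer: %d, int64_t: %d bits\n",
           bits_int(), bits_long(), bits_pointer(), bits_int64());
}
```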

        Something similar seems to be happening on the RPis as well. "lilunxm12" reported the following on their Pi 3B+ running 64-bit Ubuntu 20.04:
        real 0m18.248s
        user 0m18.193s
        sys 0m0.005s

        I am getting the following on my RPi 3B+ with 32-bit Raspbian 10:
        real 1m26.870s
        user 1m26.827s
        sys 0m0.022s

        That is a huge difference. But my RPi might be throttling, as I don't have active cooling on it.
        Last edited by Raka555; 10 February 2020, 12:14 PM.



        • #44
          Originally posted by Raka555

          This is a nice find. So 64bit integers are hurting Intel big time, while they are doing very well with 32bit integers.
          I would say Intel suffers greatly with 64-bit signed integer division (the IDIV instruction). I don't think you can generalize by saying Zen is always faster than Skylake at 64-bit integer arithmetic (if I had an Intel CPU at hand I would investigate, though).

          This example also shows why synthetic benchmarks are useless.



          • #45
            Originally posted by Raka555
            For me, lots of cores only look good in benchmarks designed for that purpose.
            In the "real world" you still get bad diminishing returns ...
            As with everything, it depends entirely on what your "real world" consists of. If Solitaire and Twitter are your use case, then no, you won't see much improvement; this part is not for you. If you're running something CPU-bound that scales to 128 threads, then yes, you will see a massive improvement, with returns that scale linearly with the core count, as the benchmarks illustrate. This is not a one-size-fits-all part. Nobody is recommending it to grandma so she can check her AOL email.
            Last edited by torsionbar28; 10 February 2020, 12:54 PM.
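As an illustration of the kind of CPU-bound work that does spread across many cores, the thread's prime count can be parallelized, since every candidate is tested independently. This is a sketch assuming OpenMP (gcc -fopenmp); count_primes_parallel is a hypothetical name, and how close the speedup gets to linear will depend on the machine.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_prime(int64_t test)
{
    for (int64_t i = 2; i * i <= test; i++) {
        if (test % i == 0)
            return false;
    }
    return true; /* 1 is counted too, matching the thread's program */
}

/* Each candidate is tested independently, so the range can be split across
   threads. Built with -fopenmp, the pragma distributes iterations over the
   available threads (by default usually one per logical CPU); without it the
   pragma is ignored and the loop runs serially with the same result.
   Dynamic scheduling is used because larger candidates take longer. */
long count_primes_parallel(int64_t limit)
{
    long count = 0;
#pragma omp parallel for reduction(+ : count) schedule(dynamic, 4096)
    for (int64_t i = 1; i < limit; ++i) {
        if (is_prime(i))
            count++;
    }
    return count;
}
```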



            • #46
              Originally posted by torsionbar28
              As with everything, it depends entirely on what your "real world" consists of. If Solitaire and Twitter are your use case, then no, you won't see much improvement; this part is not for you. If you're running something CPU-bound that scales to 128 threads, then yes, you will see a massive improvement, with returns that scale linearly with the core count, as the benchmarks illustrate. This is not a one-size-fits-all part. Nobody is recommending it to grandma so she can check her AOL email.
              Nobody is recommending a 128-thread CPU to anybody, except maybe Hollywood VFX artists and AAA game programmers.
              Maybe I'm wrong, but it seems to me AMD brought a 64-core workstation CPU to market more because they could than because there's a sizeable market for it. Feels more like a trollish move against Intel.



              • #47
                Originally posted by boitano
                AMD completely outsells Intel in the DIY market for performance reasons. Intel outsells AMD in the OEM market for business reasons that have nothing to do with CPU performance. Unfortunately, the DIY market is tiny compared to the OEM one.
                Quite true, and unfortunately history has shown us that the "reasons" in the OEM market are due to anti-competitive behavior from Intel. The Zen product line has been so well received, however, that OEMs may be forced to go against Intel's wishes. It will be very interesting indeed if Apple goes AMD in the next iteration of their laptops or desktops...



                • #48
                  Originally posted by Raka555

                  With int64_t:

                  r7-3700x
                  real 0m8.138s vs 0m8.138s

                  i7-3770:
                  real 0m13.938s vs 0m4.083s

                  i7-4600u:
                  real 0m15.719s vs 0m4.883s
                  Oh cool, Intel has an optimized division for operands narrower than 64 bits. Probably very useful for binary-only benchmark programs conceived in the Windows XP era.
                  That aside, the test program shows that even for a plain "int", current compilers emit a 32-bit division, so maybe this case is common enough to be worth optimizing the hardware for.
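A software analogue of that idea is to check the operand sizes yourself and fall back to a 32-bit division when the values fit. The sketch below is hypothetical (mod_i64_narrowed and is_prime_i64_narrowed are made-up names); it keeps int64_t in the interface and only narrows the division itself, and whether it actually speeds anything up depends on the CPU and compiler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a % b for 64-bit operands that switches to an
   unsigned 32-bit division when both values are non-negative and fit in
   32 bits. For non-negative operands the result is identical to '%'.
   Same preconditions as '%': b != 0, and not INT64_MIN % -1. */
static inline int64_t mod_i64_narrowed(int64_t a, int64_t b)
{
    if (a >= 0 && b > 0 && ((uint64_t)a | (uint64_t)b) <= UINT32_MAX)
        return (int64_t)((uint32_t)a % (uint32_t)b);
    return a % b; /* general case: full 64-bit division */
}

/* The thread's int64_t prime test, routed through the helper above. */
bool is_prime_i64_narrowed(int64_t test)
{
    for (int64_t i = 2; i * i <= test; i++) {
        if (mod_i64_narrowed(test, i) == 0)
            return false;
    }
    return true;
}
```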
                  Last edited by mlau; 10 February 2020, 03:17 PM.



                  • #49
                    Originally posted by boitano
                    Maybe I'm wrong, but it seems to me AMD brought a 64-core workstation CPU to market more because they could than because there's a sizeable market for it. Feels more like a trollish move against Intel.
                    I guess Intel also launched the $10,000 Xeon Platinum 8280 just to troll AMD, not to actually sell product because there's a market for it?

                    It's true there isn't a ton of software yet that can take advantage of that many threads at once. But those who have that kind of workload already own the required software and have pockets deep enough to buy these flagship chips. Think MATLAB, Maya, 4K video editing, or finite element analysis (structural stress, etc.). Outside of these specialized segments, threading has always been a chicken-and-egg scenario: the hardware doesn't exist because there's no software to take advantage of it, and the software doesn't exist because why put in the effort when there's no hardware to run it on.

                    AMD is doing something bold here, and they're walking the walk when it comes to delivering massive performance in a single socket. This is the future. IPC has not increased dramatically over the years, as the benchmarks clearly show. If it were up to Intel, we'd all still be running 4-core chips based on Sandy Bridge and 14nm++++. Or maybe even 32-bit x86 chips, because Itanium.
                    Last edited by torsionbar28; 10 February 2020, 04:56 PM.



                    • #50
                      Originally posted by Raka555

                      With int64_t:

                      r7-3700x
                      real 0m8.138s vs 0m8.138s

                      i7-3770:
                      real 0m13.938s vs 0m4.083s

                      i7-4600u:
                      real 0m15.719s vs 0m4.883s

                      This is a nice find. So 64bit integers are hurting Intel big time, while they are doing very well with 32bit integers. No wonder they were pushing x32 so hard.
                      I wonder what happened to the x32 efforts...

                      Something similar seems to be happening on the RPIs as well. "lilunxm12" reported the following on their pi3b+ on 64bit Ubuntu 20.04:
                      real 0m18.248s
                      user 0m18.193s
                      sys 0m0.005s

                      I am getting the following on my rpi3b+ with Raspbian 10 32bit:
                      real 1m26.870s
                      user 1m26.827s
                      sys 0m0.022s

                      That is a huge difference. But my RPI might be throttling as I don't have active cooling on it.
                      On Ubuntu 20.04 arm64, using int64_t gives:
                      real 0m33.329s
                      user 0m33.280s
                      sys 0m0.009s
                      You just shouldn't do that on a 32-bit system, as a single 32-bit register can't hold a 64-bit operand.
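Without a native 64-bit divide instruction, the compiler calls a runtime helper instead (on 32-bit ARM EABI targets, for example, __aeabi_ldivmod from libgcc or compiler-rt). The C sketch below is not that helper; it is a plain shift-and-subtract (restoring) division, shown only to illustrate why a 64-bit division done in software takes many steps. The function name udiv64_restoring is hypothetical.

```c
#include <stdint.h>

/* Shift-and-subtract (restoring) division of unsigned 64-bit values:
   one quotient bit per iteration, 64 iterations in total.
   Precondition: d != 0. */
uint64_t udiv64_restoring(uint64_t n, uint64_t d, uint64_t *rem)
{
    uint64_t q = 0, r = 0;
    for (int bit = 63; bit >= 0; --bit) {
        uint64_t carry = r >> 63;          /* bit about to be shifted out of r */
        r = (r << 1) | ((n >> bit) & 1u);  /* bring down the next bit of n */
        if (carry || r >= d) {             /* partial remainder >= divisor */
            r -= d;                        /* wraps correctly when carry was set */
            q |= (uint64_t)1 << bit;
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```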
