
A Look At The Open-Source Talos II POWER9 Performance Against x86_64 Server CPUs


  • #31
    Actually, the performance of this RISC architecture isn't that bad. A reference Linux distribution with the same goals as Clear Linux has for Intel/AMD64 would be an oxygen balloon (or helium, for rising fast vertically) for POWER9 (and RISC-V or AArch64), and the rest of the distributions could simply copy its bare-metal optimisations, following close behind the leader.

    • #32
      Hi Michael,

      It would be very interesting to see what difference hardware quad-precision (128-bit) floating point would make. Your C-ray benchmark can be trivially updated by just changing the definition of "struct vec3" around line 56 in c-ray-mt.c to use __float128 instead of double.

      For POWER9, I suggest compiling with the -mfloat128-hardware option, as it may not be enabled by default.


      Regards,
      Peter B

      • #33
        Oops, I should have done some more testing before my previous post...
        Two more lines need changing: the lines around 587 and 600 need their type changed as well. They should now read:

        ~587
        Code:
        *((__float128*)&pos.x + i) = atof(ptr);
        ~600
        Code:
        *((__float128*)&col.x + i) = atof(ptr);
        c-ray-mt.c is now slowed down even more (with software float), but rendering still works! It is about 30 times slower than double on x86.

        Regards,
        Peter B

        • #34
          Originally posted by V1tol

          Yes they are. The first one is single-threaded server-side JavaScript. The second one is a kind of raytracing benchmark that has single-threaded C and WebGL versions.
          The real problem here (and with all Michael's benchmarks) is that he throws out this random hodgepodge of stuff with no organization.
          A system like this can be judged by at least three sorts of criteria, each important for different users:
          - single-threaded performance
          - INDEPENDENT multi-threaded performance (which tells us something about the last-level cache, the memory bandwidth, and the network-on-chip)
          - STRONGLY DEPENDENT multi-threaded performance (something like databases, or certain types of HPC code with a lot of communication, which tells us something about the performance of locking, atomics, and CPU-to-CPU communication)

          What I see here is almost entirely benchmarks in the second category, one or two apparently in the first, and nothing in the third. But it may well be (I don't know) that the third category (i.e. as a database engine) is primarily where IBM targets this design, and where it excels...
