Benchmarking A 10-Core Tyan/IBM POWER Server For ~$300 USD


  • #31
    I got 69k on my 10-core (set to SMT4, p7zip 16.02 from my Void Linux builds). I'm not running PTS, but I ran the test the same way PTS does.



    • #32
      I've just tried again in SMT4 mode, but again got 64k. Perhaps the difference is due to the compilers/compiler flags that went into generating the tested binaries.



      • #33
        Originally posted by illuhad
        I've just tried again in SMT4 mode, but again got 64k. Perhaps the difference is due to the compilers/compiler flags that went into generating the tested binaries.
        Could also be my cooling being able to sustain maximum frequency consistently, since the watercooling is a lot more efficient (I've never seen the CPU go over 60°C). I don't have any special compiler flags; it does use -O2 and -maltivec, but I'd expect the defaults for p7zip to also include those. Compiler version could be making a difference; I use gcc 8.3.



        • #34
          Originally posted by q66_

          Could also be my cooling being able to sustain maximum frequency consistently, since the watercooling is a lot more efficient (I've never seen the CPU go over 60°C). I don't have any special compiler flags; it does use -O2 and -maltivec, but I'd expect the defaults for p7zip to also include those. Compiler version could be making a difference; I use gcc 8.3.
          Thanks for the hints! I found out that if I use the regular Ubuntu 7zip (also version 16.02) and just run
          Code:
          7za b
          just like PTS does according to the test profile, I also get results of around 70k. Is this what you did as well? I tried reinstalling the PTS test and explicitly setting CFLAGS and CXXFLAGS, but this also had no impact... It seems the PTS test behaves differently in some way which I don't yet quite understand. I'm on Ubuntu 18.10, so compiler is gcc 8.2 and so far I've also never seen my CPU go below the turbo speed of 3.49 GHz under load.
          Unlike just running 7za b, PTS does run the test several times though, so thermal throttling would be an appealing explanation. I'll double-check that the CPU stays at 3.5 GHz.

          EDIT: The CPU stays at 3.5 GHz during the benchmark, and setting the scaling governor to performance also has no impact.
          Last edited by illuhad; 22 March 2019, 08:39 PM.
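For anyone wanting to repeat the frequency and governor check from the EDIT above, here is a minimal sketch. It assumes the machine exposes the standard Linux cpufreq sysfs files (scaling_cur_freq, scaling_governor); the function name cpufreq_report is made up for this example and is not part of PTS.

```shell
#!/bin/sh
# Print "<cpu> <governor> <MHz>" for every CPU that exposes cpufreq data.
# The sysfs root is a parameter so the function can be pointed at another
# tree; the default is the usual Linux location (an assumption here).
cpufreq_report() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without readable cpufreq data (also covers a glob
        # that matched nothing).
        [ -r "$dir/scaling_cur_freq" ] || continue
        cpu="${dir%/cpufreq}"
        cpu="${cpu##*/}"
        khz=$(cat "$dir/scaling_cur_freq")
        gov=$(cat "$dir/scaling_governor" 2>/dev/null || echo unknown)
        # scaling_cur_freq is reported in kHz; convert to MHz.
        echo "$cpu $gov $((khz / 1000))"
    done
}

# Example: while the benchmark runs in another terminal, print the
# slowest CPU once per second:
#   while sleep 1; do cpufreq_report | sort -k3 -n | head -n 1; done
```

If the lowest reported frequency never drops during the repeated PTS runs, throttling is an unlikely explanation for the gap.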



          • #35
            Originally posted by illuhad
            Thanks for the hints! I found out that if I use the regular Ubuntu 7zip (also version 16.02) and just run
            Code:
            7za b
            just like PTS does according to the test profile, I also get results of around 70k. Is this what you did as well? I tried reinstalling the PTS test and explicitly setting CFLAGS and CXXFLAGS, but this also had no impact... It seems the PTS test behaves differently in some way which I don't yet quite understand. I'm on Ubuntu 18.10, so compiler is gcc 8.2 and so far I've also never seen my CPU go below the turbo speed of 3.49 GHz under load.
            Unlike just running 7za b, PTS does run the test several times though, so thermal throttling would be an appealing explanation. I'll double-check that the CPU stays at 3.5 GHz.
            Yeah, that's what I did. It could just as well be that the default 7zip CFLAGS don't include any optimizations (I don't know, I haven't checked), since the PTS test profile seems to compile its own copy, whereas distros always compile with optimizations. But who knows.



            • #36
              Originally posted by q66_

              Yeah, that's what I did. It could just as well be that the default 7zip CFLAGS don't include any optimizations (I don't know, I haven't checked), since the PTS test profile seems to compile its own copy, whereas distros always compile with optimizations. But who knows.
              I think that's what is happening. PTS just enters the source directory and executes make -j $NUM_PROCS, which, if I understand 7zip's build system correctly, just compiles with -O instead of a higher optimization level.
              I have now compiled it myself: with -O3 -mcpu=power8 -mtune=power8 you get a score of 72k.
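To check which optimization level a p7zip source tree would build with, something like the sketch below could help. It assumes the flags live in an OPTFLAGS assignment in a makefile fragment such as makefile.machine; that variable and file name are assumptions about the p7zip 16.02 layout and are not confirmed in this thread. The helper name makefile_opt_level is made up.

```shell
#!/bin/sh
# Print every -O* flag found on lines that assign OPTFLAGS in the given
# makefile fragment. OPTFLAGS is an assumed variable name (see above).
makefile_opt_level() {
    grep -E '^[[:space:]]*OPTFLAGS[[:space:]]*[:+]?=' "$1" |
        sed 's/^[^=]*=//' |        # drop everything up to the '='
        tr -s ' \t' '\n\n' |       # one flag per line
        grep -E -- '^-O'           # keep only optimization-level flags
}

# Example, run from the top of the unpacked source tree:
#   makefile_opt_level makefile.machine
#
# Overriding the variable on the make command line is one possible way
# to rebuild with the flags from the post above (assumed, untested here):
#   make -j "$(nproc)" OPTFLAGS="-O3 -mcpu=power8 -mtune=power8"
```

A result of plain -O would match the explanation given above for the lower PTS score.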



              • #37
                Originally posted by illuhad
                I have now compiled it myself: with -O3 -mcpu=power8 -mtune=power8 you get a score of 72k.
                By the way, -mcpu=powerpc64le is identical to power8 (power8 is the first to support LE reliably), and you don't need to specify -mtune once you have specified -mcpu (-mcpu already sets the tuning that -mtune controls). -O3 is what makes the real difference, as on a properly configured compiler, -mcpu should already be set correctly for the power8 baseline. For Void, we specify -mcpu=powerpc64le (i.e. power8) but -mtune=power9, as well as -maltivec.



                • #38
                  Originally posted by q66_

                  By the way, -mcpu=powerpc64le is identical to power8 (power8 is the first to support LE reliably), and you don't need to specify -mtune once you have specified -mcpu (-mcpu already sets the tuning that -mtune controls). -O3 is what makes the real difference, as on a properly configured compiler, -mcpu should already be set correctly for the power8 baseline. For Void, we specify -mcpu=powerpc64le (i.e. power8) but -mtune=power9, as well as -maltivec.
                  You're right of course, it's a bad habit of mine to specify both -mcpu and -mtune. I also work with several lesser-known compilers at work, and when switching compilers I've grown a habit of just setting any architecture options I can find.
                  Would -maltivec not be automatically activated when compiling for Power8/9 with higher optimization levels?



                  • #39
                    I could be wrong, but I don't think it's enabled by default.
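Rather than guessing, the compiler can be asked directly: gcc -Q --help=target prints the state of each target option for a given set of flags. The filter below is only a sketch. The function name show_target_opts is made up, and the option list is limited to the flags discussed in this thread plus -mvsx.

```shell
#!/bin/sh
# Read `gcc -Q --help=target` output on stdin and keep only the lines
# for -maltivec, -mvsx, -mcpu= and -mtune=.
show_target_opts() {
    grep -E -- '^[[:space:]]*-m(altivec|vsx|cpu=|tune=)([[:space:]]|$)'
}

# Example, to be run on the POWER machine itself:
#   gcc -O3 -mcpu=power8 -Q --help=target | show_target_opts
#   gcc -O3 -Q --help=target | show_target_opts    # compiler defaults
```

Comparing the two invocations shows whether -maltivec is already enabled for a given -mcpu, and what -mtune ends up being when only -mcpu is given.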



                    • #40
                      PTS clearly needs support for reporting the SMT mode on POWER, in the same place where it reports the scaling governor and Spectre mitigations. I sent a pull request yesterday fixing the CPU temp sensor reporting; it's now merged, just like the DCMI power reporting was before the article. Michael really responds fast. Maybe I'll add the SMT reporting later; anyone else is also welcome to do so.
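As a rough idea of how the SMT mode could be detected, here is a shell sketch (not PTS code). It assumes that /sys/devices/system/cpu/cpu0/topology/thread_siblings_list lists only the online hardware threads of cpu0's core, so its length equals the active SMT mode; that assumption would need checking on a real POWER box. Both function names are made up. Where powerpc-utils is installed, ppc64_cpu --smt reports the mode directly.

```shell
#!/bin/sh
# Count the CPUs named in a sysfs CPU list such as "0-3" or "0,2,4-5".
count_cpu_list() {
    total=0
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        case "$part" in
            # A range "a-b" contributes b - a + 1 CPUs.
            *-*) total=$((total + ${part#*-} - ${part%-*} + 1)) ;;
            # A single CPU number contributes one.
            *)   total=$((total + 1)) ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

# Print e.g. "SMT4", based on the threads sharing a core with cpu0.
smt_mode() {
    list=$(cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list) || return 1
    echo "SMT$(count_cpu_list "$list")"
}

# Example:
#   smt_mode
```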
                      Originally posted by kgardas
                      thanks for the article, but it contains several issues/errors but those are quite fine
                      Please list them.
                      Originally posted by curaga
                      The updated PNOR didn't work, failed to boot, so I restored the original 1.0.
                      Yesterday I flashed 1.01 again, then pulled power. Now there are memory configuration errors for all four DIMMs. The Centaurs are really undocumented, shame on IBM. Anyway, on a hunch I checked the components for what changed between the two PNOR versions, and saw that in habanero_xml they enabled the RAM interleave mode. I then spent half an hour searching for any docs at all on Centaur, and finally found an OpenPower porting guide explaining what that bit does: it improves RAM performance but requires at least four DIMMs in a specific config, in specific Centaurs. This is the opposite of my setup, and the opposite of the way Tyan's manual tells you to install DIMMs. Bah. Anyway, I ran out of time at that point; I will try the OpenPower DIMM arrangement later and see if it boots fine.

