Benchmarking A 10-Core Tyan/IBM POWER Server For ~$300 USD


  • #61
    @q66_:



    • #62
      https://github.com/open-power/habane...dade2c7731f2bd This commit, included in 1.01 but not in 1.0, mentions 32x32gb dimms, but doesn't say what other changes are/were needed. So at least I can gather 32x32 did not work in v1.00 PNOR.



      • #63
        Oh, for anyone doing PTS benchmarks, if you use PTS from git you can do "MONITOR=cpu.temp,sys.power phoronix-test-suite ..." to have those sensors graphed.



        • #64
          Originally posted by curaga
          @q66_:
          The latest version of the manual says something else



          • #65
            Next time I would prefer a more comparable benchmark with a common configuration, for example:

            For Intel, AMD, POWER8 and POWER9 the same configuration:
            Only 4 cores enabled
            SMT2
            Clock frequency: 3.8GHz fixed

            On all machines the same amount of RAM, same NVMe and same graphics card.

            Then the performance of the CPU and the power consumption would be comparable.



            • #66
              Originally posted by curaga
              https://github.com/open-power/habane...dade2c7731f2bd This commit, included in 1.01 but not in 1.0, mentions 32x32gb dimms, but doesn't say what other changes are/were needed. So at least I can gather 32x32 did not work in v1.00 PNOR.
              Speaking of the devil: has anyone succeeded in compiling Habanero images from the op-build project? I've compiled them, but had no luck using them so far. habanero_update.pnor even seems to be corrupted. Generally speaking, my experience is very similar to this: https://github.com/open-power/op-build/issues/848 I can't read the whole flash content either; I also get error 15 while reading at 99%, etc. The OP claims his system may have been damaged in transport, but I'm not so sure this isn't common behaviour on Habanero. In addition to his issue, I can't even grab info from the generated habanero_update FW file. pflash claims the file is corrupted, but it's able to grab info from the whole habanero FW file.
              So I'm curious what the general experience with it is. Thanks!



              • #67
                Originally posted by crystall
                Just one note about the Mellanox card: while it's true that it doesn't need to load a binary blob to run it's because Mellanox cards have their own on-board firmware (which can be flashed and is strictly closed-source).
                I'd also like to note that it's getting quite hot, and this is probably also due to its power consumption. When the box is idle I see ~40W on the CPU, ~40W on the PCIe bridge and, guess what, ~40W on the Mellanox. The idle box consumes around ~170W, so throwing away the Mellanox may also be a nice recipe to save some power. Well, that's at least my interpretation of what I see in the BMC web interface:
                Code:
                 Sensor            Reading
                 Mem Proc0 Pwr     8 Watts
                 Mem Proc1 Pwr     4 Watts
                 Mem Proc2 Pwr     6 Watts
                 Mem Proc3 Pwr     5 Watts
                 Proc0 Power       41 Watts
                 PCIE Proc0 Pwr    41 Watts
                 IO A Power        40 Watts
                 IO B Power        4 Watts
                 Fan Power A       0 Watts
                 Storage Power A   3 Watts
                 Storage Power B   3 Watts
                 Mem Cache Power   13 Watts
                 GPU Power         2 Watts
                Another interpretation is that PCIE Proc0 Pwr is the PCIe interface of the main CPU, but nothing is connected to it anyway, at least not in my system.
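
                As a sanity check, the per-sensor readings can be added up and compared with the ~170W idle figure. This is only a sketch: the dictionary is transcribed by hand from the BMC table in this post, and the names `bmc_watts`, `total_watts` and `share` are made up for illustration.

```python
# Per-sensor idle power in watts, transcribed by hand from the BMC table.
bmc_watts = {
    "Mem Proc0 Pwr": 8,
    "Mem Proc1 Pwr": 4,
    "Mem Proc2 Pwr": 6,
    "Mem Proc3 Pwr": 5,
    "Proc0 Power": 41,
    "PCIE Proc0 Pwr": 41,
    "IO A Power": 40,
    "IO B Power": 4,
    "Fan Power A": 0,
    "Storage Power A": 3,
    "Storage Power B": 3,
    "Mem Cache Power": 13,
    "GPU Power": 2,
}

# Sum of everything the BMC reports.
total_watts = sum(bmc_watts.values())


def share(sensor: str) -> float:
    """Fraction of the summed power attributed to one sensor."""
    return bmc_watts[sensor] / total_watts


if __name__ == "__main__":
    print(f"sum of sensors: {total_watts} W")
    for name in ("Proc0 Power", "PCIE Proc0 Pwr", "IO A Power"):
        print(f"{name}: {bmc_watts[name]} W ({share(name):.0%})")
```

                If the transcription is right, the sensors add up to the idle figure, and the three ~40W readings together account for 122 of those watts. Which physical device sits behind each sensor name is still a guess.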



                • #68
                  Hi, guys.

                  I was considering buying Raptor hardware, but I do not approve of some of their decisions: the choice of SAS controller; not mentioning CPU core frequencies, even approximately; horrendous international shipping prices; unknown vendor and frequencies of the bundled DIMM modules at the time of shipping; and butchering Blackbird a bit too much. It has only two memory channels, with two others wasted, and only two PCIe slots, with a lot of PCIe lanes going to waste, so I can't even plug in a GPU, a PCIe SSD and 10Gbps Ethernet. Plus there is the nonsensical inclusion of audio instead of, you know, 6 USB3 and USB2 ports at the back!?

                  I lost patience and built a Threadripper 2950X + 128GB + 2x U.2 drives as my new system, and I am super happy.

                  However, I still want to use Power, and this looks like an awesome cheap server to fill a few remaining spots in my lab rack.


                  Quick question about memory for this TN71-BP012, because I am always worried about memory, and a lot of motherboards and systems are picky about modules.

                  In the service manual I found that it operates better with 8 DIMMs. With 16 DIMMs it will operate at the lower 1066MHz instead of 1333MHz on each Centaur channel, and the connection from the Centaurs to the CPU will also be slower, so the total available peak bandwidth will be lower as well. Similarly, 32GB DIMMs (which are basically two DIMMs on one stick and operate as quad rank) will only operate at 1066MHz, and possibly even lower if you have two of them on each channel.

                  Just a reminder to all: the Power8 CPU communicates with each of the four Centaurs over a separate channel (the CPU can actually communicate with up to 8 Centaurs, depending on model), using DMI at 9.6 GT/s (~28.8GB/s), not 9.6Gbps as the manual claims. Beyond caching and queueing, the Centaurs also drive four DDR3 channels. However, the 4 channels are not fully independent and share some functionality in pairs, and there is a minor penalty when you populate both channels that use this shared functionality. I am not sure why the frequency drops to 1066MHz in that case though, as it is not the same as using two or three DIMMs per channel on normal server or desktop mobos (possibly EMI). Each DIMM has a dedicated connection to its Centaur, not shared with other DIMMs in any way.
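
                  A back-of-the-envelope check of these numbers, as a sketch only. The assumptions are mine, not the manual's: each DMI channel moves 2 bytes per transfer toward the CPU and 1 byte away from it (which is how ~28.8GB/s comes out of 9.6 GT/s), each DDR3 channel carries 8 bytes of data per transfer, and the "1333MHz" / "1066MHz" speeds are treated as MT/s. All names in the code are made up for illustration.

```python
DMI_GT_PER_S = 9.6           # DMI transfer rate, in billions of transfers/s
DMI_READ_BYTES = 2           # assumed bytes per transfer, Centaur -> CPU
DMI_WRITE_BYTES = 1          # assumed bytes per transfer, CPU -> Centaur
DDR3_BUS_BYTES = 8           # 64 data bits per DDR3 channel, ECC excluded
DDR3_CHANNELS_PER_CENTAUR = 4


def dmi_gb_per_s() -> float:
    """Peak combined read+write bandwidth of one DMI channel in GB/s."""
    return DMI_GT_PER_S * (DMI_READ_BYTES + DMI_WRITE_BYTES)


def ddr3_channel_gb_per_s(mt_per_s: float) -> float:
    """Peak bandwidth of one DDR3 channel in GB/s at a given transfer rate."""
    return mt_per_s * DDR3_BUS_BYTES / 1000


def ddr3_behind_centaur_gb_per_s(mt_per_s: float) -> float:
    """Peak DDR3 bandwidth behind one Centaur with all channels populated."""
    return ddr3_channel_gb_per_s(mt_per_s) * DDR3_CHANNELS_PER_CENTAUR


if __name__ == "__main__":
    print(f"DMI channel:        {dmi_gb_per_s():.1f} GB/s")
    for rate in (1333, 1066):
        print(f"DDR3 @ {rate} MT/s: {ddr3_channel_gb_per_s(rate):.2f} GB/s per channel, "
              f"{ddr3_behind_centaur_gb_per_s(rate):.2f} GB/s behind one Centaur")
```

                  Under these assumptions, dropping from 1333 to 1066 costs roughly 20% of the peak DDR3 bandwidth per channel. How much of that is visible to the CPU depends on the DMI link and the Centaur cache, which this sketch does not model.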


                  https://en.wikichip.org/wiki/ibm/centaur and ftp://ftp.tyan.com/doc/TN71-BP012_UG...or_Channel.pdf (page 37-38 and 74-74) are a good read. Plus some presentation slides from around the net.

                  So I want to go with 8x 8GB, so that each Centaur has two DIMMs without sharing.

                  So, I know it should be DDR3L-1333 (1333MT/s), aka PC3L-10600R. Centaurs can't drive 1.5V DIMMs.

                  However, manual ( ftp://ftp.tyan.com/doc/TN71-BP012_UG...or_Channel.pdf ) says:


                  Five 1.35V DDR3 RDIMMs are supported. [...]
                  • 4GB SR x8 with 4GB DRAM
                  • 8GB SR x4 with 4GB DRAM
                  • 8GB DR x8 with 4GB DRAM
                  • 16GB DR x4 with 4GB DRAM
                  • 32GB QR x4 with 4GB DRAM
                  Plus it says:


                  The most preferred choice of 4Gbit DRAM die rev. from the following vendors are Micron v90B 25nm, Hynix Polaris 25nm & Samsung RevD 25nm.
                  So, I was looking at this module: M393B1K70DH0-YH9 ( https://www.samsung.com/semiconducto...3B1K70DH0-YH9/ , https://www.samsung.com/semiconducto...mm_rev12-2.pdf ), which is available both from piospartslap cheaply (but in low quantity) and on my local eBay-like site at good prices and in 50+ quantities.

                  However, according to various sites and Samsung, this module is 2Rx4, which is not on the list. And I guess it is a bit older than the manual and is probably based on 2Gbit chips?
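
                  The organisation strings can be checked with a little arithmetic: each rank has a 64-bit data path, so a rank of xN chips holds 64/N data chips, and capacity is ranks x chips per rank x die density. The sketch below assumes one die per data chip position per rank and ignores the extra ECC chips; the function names are made up for illustration. It reproduces the five configurations from the manual if "with 4GB DRAM" is read as 4Gbit dies, and it suggests that an 8GB 2Rx4 module would be built from 2Gbit dies.

```python
DATA_BITS_PER_RANK = 64  # data width of one rank, ECC bits excluded


def dimm_capacity_gb(ranks: int, chip_width: int, die_gbit: int) -> int:
    """Module capacity in GB from rank count, chip width (x4/x8) and die density."""
    chips_per_rank = DATA_BITS_PER_RANK // chip_width
    return ranks * chips_per_rank * die_gbit // 8  # 8 Gbit per GB


def implied_die_gbit(capacity_gb: int, ranks: int, chip_width: int) -> int:
    """Die density in Gbit implied by a module's capacity and organisation."""
    data_chips = ranks * (DATA_BITS_PER_RANK // chip_width)
    return capacity_gb * 8 // data_chips


# (capacity in GB, ranks, chip width) for the five supported configurations.
SUPPORTED = [(4, 1, 8), (8, 1, 4), (8, 2, 8), (16, 2, 4), (32, 4, 4)]

if __name__ == "__main__":
    for capacity, ranks, width in SUPPORTED:
        computed = dimm_capacity_gb(ranks, width, die_gbit=4)
        print(f"{ranks}Rx{width} with 4Gbit dies: {computed} GB (manual: {capacity} GB)")
    print(f"8GB 2Rx4 implies {implied_die_gbit(8, 2, 4)}Gbit dies")
```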

                  Similarly, the SK Hynix HMT31GR7CFR4A-H9 ( https://www.skhynix.com/eolproducts....k=20&rc=module , https://www.skhynix.com/product/file...ad.do?seq=3147 ) appears to have a similar issue, but says "based on 2Gb C-die". The -H9 parts are the ones clocked at 1333 9-9-9.

                  These modules are from 2010-2011.

                  And the TN71 is from about 2015/2016?

                  So I suspect these are not Hynix Polaris or Samsung 25nm, but an older node, probably 32nm. The manual is probably referring to M393B1G70EB0-YK0 (1x4, 4Gbit E-die, 2014/2015) or M393B1G70EB0-CK0 (1x4, almost certainly 4Gbit, but no date or die info and no datasheet). I couldn't find Rev D from Samsung.

                  Would it be better to find more modern modules with 4Gbit dies, or is sticking to modules with 2Gbit dies going to be okay-ish and offer full performance? I do not think I care about the power consumption of the memory itself. AFAIK newer modules will still run at 1333MHz, and timing will still be the same at CL9.



                  • #69
                    If anyone is having problems upgrading to 1.01, I found a way that worked for me. Update using MegaRAC, then reboot the machine after logging in again, then pull the power cables from the server and pull out both power supplies. After that, place one PSU in the top slot, which is PSU 2, and start up the machine. Once it boots, check the logs and see if the FW is booting; if it is, slide in the second PSU and everything will work fine after that.

                    Also, has anyone managed to get networking working with Ubuntu 18.04? I noticed that on my machine all Ethernet connections are down. Is there a way to configure them myself, or should it be automatic?



                    • #70
                      Originally posted by kgardas

                      Speaking of the devil: has anyone succeeded in compiling Habanero images from the op-build project? I've compiled them, but had no luck using them so far. habanero_update.pnor even seems to be corrupted. Generally speaking, my experience is very similar to this: https://github.com/open-power/op-build/issues/848 I can't read the whole flash content either; I also get error 15 while reading at 99%, etc. The OP claims his system may have been damaged in transport, but I'm not so sure this isn't common behaviour on Habanero. In addition to his issue, I can't even grab info from the generated habanero_update FW file. pflash claims the file is corrupted, but it's able to grab info from the whole habanero FW file.
                      So I'm curious what the general experience with it is. Thanks!
                      I just managed to compile op-build and successfully flashed it on my Tyan. Everything seems to be working fine with it as well.
                      Here is proof of it: https://imgur.com/OG85zmr

