
Benchmarking A 10-Core Tyan/IBM POWER Server For ~$300 USD


  • #91
    The web page says 16 available.



    • #92
      Originally posted by illuhad View Post

      Great to hear that! Have you also tried OMP_PROC_BIND=spread? This also gave me some significant speedups, as it forces the threads to be pinned and distributed over the cores.
      IIRC this also helps, though I'm no longer sure by how much. Thanks, Karel
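For anyone launching benchmarks from a script, the pinning setting discussed here can be applied per run instead of being exported globally. This is only a minimal sketch, assuming a Python launcher: `OMP_PROC_BIND=spread` is the setting from the quoted post, while `OMP_PLACES=cores` and the helper names `openmp_env` and `run_pinned` are illustrative assumptions, not something reported in this thread.

```python
import os
import subprocess


def openmp_env(bind="spread", places="cores", base=None):
    """Return a copy of the environment with OpenMP pinning variables set."""
    env = dict(os.environ if base is None else base)
    env["OMP_PROC_BIND"] = bind   # setting quoted in the thread
    env["OMP_PLACES"] = places    # assumption: one place per physical core
    return env


def run_pinned(cmd, bind="spread", places="cores"):
    """Run cmd (a list of arguments) with the pinning variables applied."""
    return subprocess.run(cmd, env=openmp_env(bind, places), check=True)


if __name__ == "__main__":
    # Show the two variables a child process would inherit.
    env = openmp_env()
    print(env["OMP_PROC_BIND"], env["OMP_PLACES"])
```

A call such as `run_pinned(["./benchmark"])` would then start a (hypothetical) benchmark binary with both variables set, leaving the parent shell untouched.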



      • #93
        Hi, I got one of the machines too. Some points:
        • 8 sticks of Samsung 8GB (1Rx4 PC3L-12800R M393B1G70QH0-YK0) work fine.
        • The newest self-compiled PNOR from Git works (after a power reset), but with one issue: apparently it is not compatible with the BMC firmware. The BMC is no longer accessible (website etc.), and the IPMI utilities show only a very reduced set of sensors (processor only). To those who tried this: did you notice the same issue?
        • Like another user, I cannot get the 10G network to work. The driver seems to be installed already (Ubuntu 19.04), but I cannot bring the interface up. Any hints?
        • Has anyone tried overclocking? Raptor has instructions for their POWER9 machine: https://wiki.raptorcs.com/wiki/POWER9/Overclocking
        • One note: if you want to reduce power consumption (and do not need the performance right now), you can disable cores. Idle power consumption of the processor itself drops quite dramatically (50 W with 10 cores -> 15 W with 2 cores): sudo ppc64_cpu --cores-on=2
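To put the idle figures from the last point into perspective, here is a small back-of-the-envelope sketch. The 50 W and 15 W values are the ones reported above. The function names and the assumption that the machine idles around the clock are illustrative only.

```python
def idle_savings(full_watts, reduced_watts):
    """Return (watts saved, percent saved) when idle draw drops."""
    saved = full_watts - reduced_watts
    return saved, 100.0 * saved / full_watts


def yearly_kwh(watts, hours_per_day=24.0):
    """Energy in kWh per year for a constant draw (assumes 365 days)."""
    return watts * hours_per_day * 365 / 1000.0


if __name__ == "__main__":
    # Figures from the post: 50 W with 10 cores, 15 W with 2 cores.
    saved_w, saved_pct = idle_savings(50, 15)
    print(f"{saved_w} W saved ({saved_pct:.0f}% lower idle draw)")
    print(f"{yearly_kwh(saved_w):.1f} kWh per year if idle around the clock")
```

With the reported numbers this works out to 35 W, or 70% of the processor's idle draw. The yearly figure scales down accordingly if the box is idle only part of the day.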



        • #94
          Hi,

          if somebody bought this server and finds it unusable, I would like to buy it. Please contact me.



          • #95
            Managed to snag one; I wanted two, but they were already sold out.
            So best performance is with 16 sticks installed in the blue and white slots, right?



            • #96
              Originally posted by musengy View Post
              Hi, I got one of the machines too. Some points:
              • 8 sticks of Samsung 8GB (1Rx4 PC3L-12800R M393B1G70QH0-YK0) work fine.
              • The newest self-compiled PNOR from Git works (after a power reset), but with one issue: apparently it is not compatible with the BMC firmware. The BMC is no longer accessible (website etc.), and the IPMI utilities show only a very reduced set of sensors (processor only). To those who tried this: did you notice the same issue?
              • Like another user, I cannot get the 10G network to work. The driver seems to be installed already (Ubuntu 19.04), but I cannot bring the interface up. Any hints?
              • Has anyone tried overclocking? Raptor has instructions for their POWER9 machine: https://wiki.raptorcs.com/wiki/POWER9/Overclocking
              • One note: if you want to reduce power consumption (and do not need the performance right now), you can disable cores. Idle power consumption of the processor itself drops quite dramatically (50 W with 10 cores -> 15 W with 2 cores): sudo ppc64_cpu --cores-on=2
              I installed CentOS 7, and it recognized the card and enabled DHCP automatically. I received an 8-core model with a Mellanox 10G Ethernet card.
              Is there any reason to compile your own PNOR? I just flashed 1.01 and it works fine. Also, my BMC came with some strange version of the Megarac+AMI software. I force-flashed the latest BMC hpm from Tyan, and now the fans drop to 6000 rpm at idle; before, they never went below 11000 rpm.

              Also, has anyone added HDD caddies to this? I'd like to know which ones are compatible.
              I found this https://www.ebay.com/itm/Adaptec-ASR...cAAOSwPmVcDkUT, looks like it should work fine for the disk backplane.



              • #97
                The main reason to compile a new PNOR would be to boot from something the older one does not support (maybe some new M.2 model, etc.). Otherwise, I don't think the 1.01 version has any critical bugs.



                • #98
                  Originally posted by niroin00 View Post
                  Also, has anyone added HDD caddies to this? I'd like to know which ones are compatible.
                  I couldn't find the Tyan part numbers anywhere, but the equivalent IBM server (S812LC) uses front drive carriers part number 01AF246. They should be the same, apart from the front-side plastic levers.

                  Originally posted by niroin00 View Post
                  I found this https://www.ebay.com/itm/Adaptec-ASR...cAAOSwPmVcDkUT, looks like it should work fine for the disk backplane.
                  It should – unless you want to boot from it, as the Tyan-made servers may not have firmware support. The original part from Tyan is Tyan Storage Mezzanine P3260-9235-12I SATA 6G Marvell 88SE9235. The original part from IBM (which possibly requires different firmware) is either PMC Adaptec RAID 71605E, part number 00WV552, also called FC EC3Y, or (with battery-backed cache) PMC Adaptec RAID 81605Z, part number 00WV554, also called FC EC3S.



                  • #99
                    I can't get my Habanero to boot – can anyone help me troubleshoot the issue? It booted before, I didn't change anything (as far as I know) and now it doesn't start the hostboot process – the BMC starts fine (as far as I can see), but it doesn't even get to loading the PNOR before stopping.

                    Also, while trying to fix it, I reflashed the BMC to the newest version, and now I can't log in to the primary-side BMC ROM; I have to use the golden side. See Part 2 below for that.

                    Warning, really long post ahead. Sorry! :-)


                    Part 1, booting issue

                    Visible symptoms

                    When I press the power button, the fans start running and ramp up to full speed. On the newest BMC, the fans stop after ~5 seconds; on the original golden-side BMC, the fans continue running. The Fault LED blinks briefly and the power button starts blinking again, as if the system were off. The system displays no other activity apart from the BMC logs: an attached monitor doesn't light up, and there is silence on the serial port.

                    The BMC seems to otherwise run fine, it responds to IPMI commands and the web interface works.


                    BMC logs

                    (available only for the golden side BMC, not for the new one)

                    Event log
                    Code:
                    145 06/23/2019 09:43:35 Extended SEL Extended SEL OEM timestamped
                    144 06/23/2019 09:43:24 PSU Fault 2 Power Supply Presence Detected - Asserted
                    143 06/23/2019 09:43:22 Boot Count OEM Invalid Offset for this SensorType - Asserted
                    142 06/23/2019 09:37:50 All PGood Power Unit Power Off / Power Down - Asserted
                    Audit log
                    Code:
                    1 Jun 23 09:40:49 AMIxxxxxxxxxxxx lighttpd[1462]: [1462 INFO]http Login from IP:192.168.0.1 user:ADMIN
                    2 Jun 23 09:40:50 AMIxxxxxxxxxxxx lighttpd[1462]: [1462 INFO]HTTP logout from IP:192.168.0.1 user:ADMIN
                    3 Jun 23 09:40:55 AMIxxxxxxxxxxxx lighttpd[1462]: [1462 INFO]http Login from IP:192.168.0.1 user:ADMIN
                    4 Jun 23 09:43:35 AMIxxxxxxxxxxxx IPMIMain: [649 INFO][LEDACTION] Turning ON Fault LED. VOUT Failure Detected from Power Sequencer. OEM SEL Event will be logged.
                    System log – Critical
                    Code:
                    1 Jun 23 09:38:06 AMIxxxxxxxxxxxx kernel: [KERNCRITICAL] [/home/davidw/IBM_082415/habanero/Build/.build/bt_hw-2.5.0-ARM-AST-src/data/bthw_mod.c:305]B_BUST bit is already set
                    2 Jun 23 09:38:32 AMIxxxxxxxxxxxx compmanager: [1416 CRITICAL][compmngrhostboot.c:920]OEM extension loaded successfully#012
                    3 Jun 23 09:40:50 AMIxxxxxxxxxxxx lighttpd[1462]: [1462 CRITICAL][web_sessions.c:686]Web Session inactivity timeout for Session Id 0!!
                    4 Jun 23 09:40:50 AMIxxxxxxxxxxxx lighttpd[1462]: [1462 CRITICAL][security.c:441]Session inactive
                    5 Jun 23 09:40:57 AMIxxxxxxxxxxxx lighttpd[1462]: [1462 CRITICAL][webifc_XportDevice.c:422] g_corefeatures.ipv6_compliance_support==1
                    6 Jun 23 09:43:27 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][OEMCmdsHelper.c:229]Power ON from Normal side IOCTL Failed
                    7 Jun 23 09:43:27 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][PDKHW.c:260]Power ON IOCTL Failed
                    8 Jun 23 09:43:32 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][OEMPowerSequencer.c:451]CRITICAL: BMC did not power off the host, yet CFAM_REFER - GPIOC7 is Low. Investigating now...
                    9 Jun 23 09:43:32 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][OEMPowerSequencer.c:467]CRITICAL: STATUS_WORD Register indicates VOUT Fault has occured in the system.
                    10 Jun 23 09:43:35 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][OEMPowerSequencer.c:515]CRITICAL: VOUT Failure has been CONFIRMED for Rail 7
                    System log – Warning
                    (Some bootup information omitted.)
                    Code:
                    76 Jun 23 09:43:22 AMIxxxxxxxxxxxx kernel: Performing the Normal side Full Power ON + Boot Sequence
                    77 Jun 23 09:43:27 AMIxxxxxxxxxxxx kernel: .....ERROR: PGood is stuck at LOW...
                    78 Jun 23 09:43:35 AMIxxxxxxxxxxxx kernel: Performing the Power OFF Sequence
                    79 Jun 23 09:43:35 AMIxxxxxxxxxxxx kernel: Performing the power OFF routine
                    System log – Information
                    (Some bootup information omitted.)
                    Code:
                    91 Jun 23 09:43:22 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]INFO: Power Button Press Detected : Short Press (1 seconds)
                    92 Jun 23 09:43:22 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]INFO: Clearing the OEM SEL Event Buffer...
                    93 Jun 23 09:43:22 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]INFO: Event Message Buffer is already Empty...
                    94 Jun 23 09:43:22 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]POWER ON CHASSIS
                    95 Jun 23 09:43:27 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]OCC Monitor thread started successfully...
                    96 Jun 23 09:43:27 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]Power Sequencer thread started..
                    97 Jun 23 09:43:35 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]INFO: Clearing the OEM SEL Event Buffer...
                    98 Jun 23 09:43:35 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]INFO: Event Message Buffer is already Empty...
                    Error and Emergency sections of the system log are empty, Alert, Notification and Debug don't seem to contain anything useful.
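If the logs are exported as plain text, the relevant entries can be filtered mechanically rather than by eye. Below is a minimal sketch under stated assumptions: the sample lines are copied verbatim from the "System log – Critical" excerpt above, and the regular expressions and helper names (`parse_entries`, `failed_rails`) are my own guesses at the line layout, not an official AMI/Tyan log format description. Kernel lines use a different layout and are simply skipped.

```python
import re

# Three lines copied verbatim from the "System log - Critical" excerpt.
SAMPLE = """\
8 Jun 23 09:43:32 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][OEMPowerSequencer.c:451]CRITICAL: BMC did not power off the host, yet CFAM_REFER - GPIOC7 is Low. Investigating now...
9 Jun 23 09:43:32 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][OEMPowerSequencer.c:467]CRITICAL: STATUS_WORD Register indicates VOUT Fault has occured in the system.
10 Jun 23 09:43:35 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][OEMPowerSequencer.c:515]CRITICAL: VOUT Failure has been CONFIRMED for Rail 7
"""

# Assumed layout: index, timestamp, host, process, [pid LEVEL], optional [source], message.
ENTRY = re.compile(
    r"^\s*(?P<idx>\d+)\s+(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"(?P<proc>[^:\[]+)(?:\[\d+\])?:\s+\[(?P<pid>\d+)\s+(?P<level>\w+)\]"
    r"(?:\[(?P<src>[^\]]+)\])?(?P<msg>.*)$"
)
RAIL = re.compile(r"VOUT Failure has been CONFIRMED for Rail (\d+)")


def parse_entries(log_text):
    """Return one dict per line that matches the assumed layout."""
    entries = []
    for line in log_text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries.append(match.groupdict())
    return entries


def failed_rails(log_text):
    """Return the sorted rail numbers with a confirmed VOUT failure."""
    return sorted({int(m.group(1)) for m in RAIL.finditer(log_text)})


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry["stamp"], entry["level"], entry["src"], entry["msg"])
    print("Confirmed failed rails:", failed_rails(SAMPLE))
```

Run over a longer export, `failed_rails` shows whether the same rail is reported on every power-on attempt or whether it varies. That could be useful when comparing boots with different hardware removed.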


                    What I tried
                    • As someone suggested above, I tried powering the system up using a single PSU – the top one or the bottom one (the logs above were recorded with a single PSU).
                    • I tried getting rid of unnecessary hardware – the Mellanox LAN card, the SATA controller, the disk backplane with the fans (I plugged other fans into the board headers instead), the RAM (all of it), the CPU (!)… nothing changes the symptoms at all, including the missing CPU.
                    • Mashing the power button really fast. :-)
                    My questions are:
                    1. What is this Rail 7? I assume it is a part of the power distribution on the motherboard. It seems this is a hardware problem, but I see no damage to the board. Where is the Rail 7 physically routed, though? Maybe I missed a tiny blown cap or something.
                    2. Or maybe it is a BMC issue? Can it be worked around?
                    3. Is there something I can do to get more information? Setting some jumpers on the board to enable debug info, maybe? The BMC_DEBUG jumpers are already in the DEBUG state, as far as I can tell.
                    4. Is there some extended documentation on this available? The IBM guides are the closest thing I've found, but they all end with “send your board in for exchange”, which is not an option for me.
                    5. Is there maybe a better forum to ask those questions? This one does seem to have many interested people, but it is not a hardware troubleshooting group.


                    Part 2, I locked myself out of the BMC

                    As I was trying to resolve the issue, I thought that maybe updating to the newest BMC version would be a good idea. It wasn't. I downloaded v2.00 (P0123002.hpm) from the Tyan website and flashed it using IPMI according to this IBM guide – including the
                    Code:
                    raw 0x32 0xba 0x18 0x00
                    command, which may be the cause of the problem.

                    The update went smoothly, but after rebooting the BMC, I can't log in anymore, as the ADMIN/admin credentials don't work. I tried several other user/password combinations, but none seem to match what the system expects. Or perhaps it is just broken? Because of this, I've physically swapped the primary and golden sides of the BMC ROM to use the old golden side instead of the new primary one. But I need to reflash the primary ROM again, which is not easily possible without logging into it.

                    So my question is: Does anyone know
                    1. whether there is a way of booting the golden BMC without swapping them, or
                    2. whether it is possible to flash the “golden” side while leaving the “primary” one (… so I can reflash the broken new version by physically swapping the chips, booting the golden one as if it was primary and flashing the primary one as if it was golden)?
                    3. Or, alternatively, what the correct credentials are?



                    • The Tyan BMC firmware uses root:superuser as the default user:pass. After flashing, reset the settings in the web interface.
