The web page says 16 available.
Benchmarking A 10-Core Tyan/IBM POWER Server For ~$300 USD
-
Originally posted by illuhad View Post
Great to hear that! Have you also tried OMP_PROC_BIND=spread? This also gave me some significant speedups, as it forces the threads to be pinned and distributed over the cores.
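A minimal sketch of applying that setting from a launcher script, in case it helps anyone compare runs. Only OMP_PROC_BIND=spread comes from the post above; OMP_PLACES=cores, the thread count, and the benchmark command are assumptions for illustration.

```python
import os
import subprocess


def pinned_env(num_threads, base_env=None):
    """Return a copy of the environment with OpenMP thread pinning enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_PROC_BIND"] = "spread"  # pin threads and spread them over the places
    env["OMP_PLACES"] = "cores"      # assumption: one place per physical core
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


def run_pinned(command, num_threads):
    """Run an OpenMP program with the pinned environment.

    `command` is a list such as ["./my_benchmark"], which is a placeholder
    name and not a program mentioned in this thread.
    """
    return subprocess.run(command, env=pinned_env(num_threads), check=True)
```

Calling `run_pinned(["./my_benchmark"], 10)` would start the placeholder program with ten pinned threads; the environment of the calling shell is left untouched.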
-
Hi, I got one of the machines too. Some points:
- 8 sticks of Samsung 8GB (1Rx4 PC3L-12800R M393B1G70QH0-YK0) work fine.
- The newest self-compiled PNOR from Git works (after a power reset), but with one issue: apparently it is not compatible with the BMC firmware. The BMC is no longer accessible (website etc.), and the IPMI utils show only a very reduced set of sensors (processor only). To those who tried this: did you notice the same issue?
- Like another user, I cannot get the 10G network to work. The driver seems to be installed already (Ubuntu 19.04), but I cannot get the interface up. Any hints?
- Has anyone tried overclocking? There are instructions from Raptor for their machine (Power9): https://wiki.raptorcs.com/wiki/POWER9/Overclocking
- One note: if you want to reduce power consumption (and do not need performance right now), you can disable cores. Idle power consumption of the processor itself drops quite dramatically (50W (10C) -> 15W (2C)): sudo ppc64_cpu --cores-on=2
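To put the idle figures quoted above (50 W with 10 cores, 15 W with 2 cores) into perspective, here is a small arithmetic sketch. It assumes the machine idles around the clock, which is an assumption for illustration only; the two wattages are the only numbers taken from the post.

```python
def idle_savings(watts_before, watts_after, hours=24 * 365):
    """Return (watts saved, fraction saved, kWh saved over `hours`)."""
    saved_watts = watts_before - watts_after
    fraction = saved_watts / watts_before
    kwh = saved_watts * hours / 1000
    return saved_watts, fraction, kwh


# Figures from the post: 50 W idle with 10 cores, 15 W idle with 2 cores.
saved, fraction, kwh = idle_savings(50, 15)
print(f"{saved} W saved ({fraction:.0%}), roughly {kwh:.0f} kWh per year at constant idle")
```

That is a 35 W or 70% reduction for the processor alone; the rest of the system (fans, PSUs, BMC) is not covered by these numbers.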
-
Originally posted by musengy View Post
Hi, I got one of the machines too. Some points:
- 8 sticks of Samsung 8GB (1Rx4 PC3L-12800R M393B1G70QH0-YK0) work fine.
- The newest self-compiled PNOR from Git works (after a power reset), but with one issue: apparently it is not compatible with the BMC firmware. The BMC is no longer accessible (website etc.), and the IPMI utils show only a very reduced set of sensors (processor only). To those who tried this: did you notice the same issue?
- Like another user, I cannot get the 10G network to work. The driver seems to be installed already (Ubuntu 19.04), but I cannot get the interface up. Any hints?
- Has anyone tried overclocking? There are instructions from Raptor for their machine (Power9): https://wiki.raptorcs.com/wiki/POWER9/Overclocking
- One note: if you want to reduce power consumption (and do not need performance right now), you can disable cores. Idle power consumption of the processor itself drops quite dramatically (50W (10C) -> 15W (2C)): sudo ppc64_cpu --cores-on=2
Is there any reason to compile your own PNOR? I just flashed 1.01 and it works fine. Also, my BMC came with some strange version of the Megarac+AMI software. I force-flashed the latest BMC hpm from Tyan, and now the fans drop to 6000 rpm at idle; before, they were at 11000 rpm at minimum.
Also, has anyone added HDD caddies to this? I'd like to know which ones are compatible.
I found this https://www.ebay.com/itm/Adaptec-ASR...cAAOSwPmVcDkUT, looks like it should work fine for the disk backplane.
-
Originally posted by niroin00 View Post
Also, has anyone added HDD caddies to this? I'd like to know which ones are compatible.
Originally posted by niroin00 View Post
I found this https://www.ebay.com/itm/Adaptec-ASR...cAAOSwPmVcDkUT, looks like it should work fine for the disk backplane.
-
I can't get my Habanero to boot – can anyone help me troubleshoot the issue? It booted before, I didn't change anything (as far as I know) and now it doesn't start the hostboot process – the BMC starts fine (as far as I can see), but it doesn't even get to loading the PNOR before stopping.
Also, while trying to fix it, I reflashed the BMC to the newest version and now I can't log in to the primary side BMC ROM, I have to use the golden side. See Part 2 below for that.
Warning, really long post ahead. Sorry! :-)
Part 1, booting issue
Visible symptoms
When I press the power button, the fans start running and ramp up to full speed. On the newest BMC, the fans stop after ~5 seconds, on the original golden-side BMC, the fans continue running. The Fault LED blinks briefly and the power button starts blinking again, as if the system was off. The system displays no other activity apart from the BMC logs – an attached monitor doesn't light up, there is silence on the serial port.
The BMC seems to otherwise run fine, it responds to IPMI commands and the web interface works.
BMC logs
(available only for the golden side BMC, not for the new one)
Event log
Code:
145 06/23/2019 09:43:35 Extended SEL Extended SEL OEM timestamped
144 06/23/2019 09:43:24 PSU Fault 2 Power Supply Presence Detected - Asserted
143 06/23/2019 09:43:22 Boot Count OEM Invalid Offset for this SensorType - Asserted
142 06/23/2019 09:37:50 All PGood Power Unit Power Off / Power Down - Asserted
Code:
Jun 23 09:40:49 AMIxxxxxxxxxxxx lighttpd[1462]: [1462 INFO]http Login from IP:192.168.0.1 user:ADMIN
Jun 23 09:40:50 AMIxxxxxxxxxxxx lighttpd[1462]: [1462 INFO]HTTP logout from IP:192.168.0.1 user:ADMIN
Jun 23 09:40:55 AMIxxxxxxxxxxxx lighttpd[1462]: [1462 INFO]http Login from IP:192.168.0.1 user:ADMIN
Jun 23 09:43:35 AMIxxxxxxxxxxxx IPMIMain: [649 INFO][LEDACTION] Turning ON Fault LED. VOUT Failure Detected from Power Sequencer. OEM SEL Event will be logged.
Code:
Jun 23 09:38:06 AMIxxxxxxxxxxxx kernel: [KERNCRITICAL] [/home/davidw/IBM_082415/habanero/Build/.build/bt_hw-2.5.0-ARM-AST-src/data/bthw_mod.c:305]B_BUST bit is already set
Jun 23 09:38:32 AMIxxxxxxxxxxxx compmanager: [1416 CRITICAL][compmngrhostboot.c:920]OEM extension loaded successfully#012
Jun 23 09:40:50 AMIxxxxxxxxxxxx lighttpd[1462]: [1462 CRITICAL][web_sessions.c:686]Web Session inactivity timeout for Session Id 0!!
Jun 23 09:40:50 AMIxxxxxxxxxxxx lighttpd[1462]: [1462 CRITICAL][security.c:441]Session inactive
Jun 23 09:40:57 AMIxxxxxxxxxxxx lighttpd[1462]: [1462 CRITICAL][webifc_XportDevice.c:422] g_corefeatures.ipv6_compliance_support==1
Jun 23 09:43:27 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][OEMCmdsHelper.c:229]Power ON from Normal side IOCTL Failed
Jun 23 09:43:27 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][PDKHW.c:260]Power ON IOCTL Failed
Jun 23 09:43:32 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][OEMPowerSequencer.c:451]CRITICAL: BMC did not power off the host, yet CFAM_REFER - GPIOC7 is Low. Investigating now...
Jun 23 09:43:32 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][OEMPowerSequencer.c:467]CRITICAL: STATUS_WORD Register indicates VOUT Fault has occured in the system.
Jun 23 09:43:35 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][OEMPowerSequencer.c:515]CRITICAL: VOUT Failure has been CONFIRMED for Rail 7
(Some bootup information omitted.)
Code:
Jun 23 09:43:22 AMIxxxxxxxxxxxx kernel: Performing the Normal side Full Power ON + Boot Sequence
Jun 23 09:43:27 AMIxxxxxxxxxxxx kernel: .....ERROR: PGood is stuck at LOW...
Jun 23 09:43:35 AMIxxxxxxxxxxxx kernel: Performing the Power OFF Sequence
Jun 23 09:43:35 AMIxxxxxxxxxxxx kernel: Performing the power OFF routine
(Some bootup information omitted.)
Code:
Jun 23 09:43:22 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]INFO: Power Button Press Detected : Short Press (1 seconds)
Jun 23 09:43:22 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]INFO: Clearing the OEM SEL Event Buffer...
Jun 23 09:43:22 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]INFO: Event Message Buffer is already Empty...
Jun 23 09:43:22 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]POWER ON CHASSIS
Jun 23 09:43:27 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]OCC Monitor thread started successfully...
Jun 23 09:43:27 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]Power Sequencer thread started..
Jun 23 09:43:35 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]INFO: Clearing the OEM SEL Event Buffer...
Jun 23 09:43:35 AMIxxxxxxxxxxxx IPMIMain: [649 INFO]INFO: Event Message Buffer is already Empty...
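For anyone who wants to sift through a longer dump of these BMC logs, a rough filtering sketch follows. The regular expressions are guesses based only on the line format visible in the excerpts above, not on any documented AMI log format, and the function names are made up for illustration.

```python
import re

# Matches entries such as "[649 CRITICAL][OEMPowerSequencer.c:515]message".
# Kernel lines tagged "[KERNCRITICAL]" use a different layout and are skipped.
CRITICAL_RE = re.compile(r"\[(\d+) CRITICAL\]\[([\w.]+):(\d+)\](.*)")
RAIL_RE = re.compile(r"VOUT Failure has been CONFIRMED for Rail (\d+)")

# Two lines copied from the log excerpt in this post.
SAMPLE_LINES = [
    "Jun 23 09:43:27 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][PDKHW.c:260]Power ON IOCTL Failed",
    "Jun 23 09:43:35 AMIxxxxxxxxxxxx IPMIMain: [649 CRITICAL][OEMPowerSequencer.c:515]"
    "CRITICAL: VOUT Failure has been CONFIRMED for Rail 7",
]


def critical_entries(lines):
    """Yield (source_file, source_line, message) for each CRITICAL entry."""
    for line in lines:
        match = CRITICAL_RE.search(line)
        if match:
            yield match.group(2), int(match.group(3)), match.group(4).strip()


def failed_rails(lines):
    """Return the sorted rail numbers with a confirmed VOUT failure."""
    rails = set()
    for line in lines:
        match = RAIL_RE.search(line)
        if match:
            rails.add(int(match.group(1)))
    return sorted(rails)


for source_file, source_line, message in critical_entries(SAMPLE_LINES):
    print(f"{source_file}:{source_line}: {message}")
print("Rails with confirmed VOUT failure:", failed_rails(SAMPLE_LINES))
```

This only narrows down which entries matter; it says nothing about where a given rail sits on the board.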
What I tried
- As someone suggested above, I tried powering the system up using a single PSU – the top one or the bottom one (the logs above were recorded with a single PSU).
- I tried getting rid of unnecessary hardware – the Mellanox LAN card, the SATA controller, the disk backplane with the fans (I plugged other fans into the board headers instead), the RAM (all of it), the CPU (!)… nothing changes the symptoms at all, including the missing CPU.
- Mashing the power button really fast. :-)
- What is this Rail 7? I assume it is a part of the power distribution on the motherboard. It seems this is a hardware problem, but I see no damage to the board. Where is the Rail 7 physically routed, though? Maybe I missed a tiny blown cap or something.
- Or maybe it is a BMC issue? Can it be worked around?
- Is there something I can do to get more information? Setting some jumpers on the board to enable debug info, maybe? The BMC_DEBUG jumpers are already in the DEBUG state, as far as I can tell.
- Is there some extended documentation on this available? The IBM guides are the closest thing I've found, but they all end with “send your board in for exchange”, which is not an option for me.
- Is there maybe a better forum to ask those questions? This one does seem to have many interested people, but it is not a hardware troubleshooting group.
Part 2, I locked myself out of the BMC
As I was trying to resolve the issue, I thought that maybe updating to the newest BMC version would be a good idea. It wasn't. I downloaded v2.00 (P0123002.hpm) from the Tyan website and flashed it using IPMI according to this IBM guide – including the
Code:
raw 0x32 0xba 0x18 0x00
The update went smoothly, but after rebooting the BMC, I can't log in anymore, as the ADMIN/admin credentials don't work. I tried several other user/password combinations, but none seem to match what the system expects. Or perhaps it is just broken? Because of this, I've physically swapped the primary and golden sides of the BMC ROM to use the old golden side instead of the new primary one. But I need to reflash the primary ROM again, which is not easily possible without logging into it.
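For reference, the flashing steps described above could be assembled roughly as in the sketch below. The raw bytes and the file name are the ones quoted in this post; everything else is an assumption: the lanplus interface, the placeholder host address, the order of the two steps, and the `hpm upgrade ... force` invocation, which is a standard ipmitool subcommand but may not match the IBM guide exactly. The sketch only builds and prints the command lines; it does not run them.

```python
import shlex


def ipmitool_cmd(host, user, password, *args):
    """Build an ipmitool command line for a remote BMC (assumes lanplus)."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password, *args]


# 192.0.2.10 is a documentation placeholder address, not a real BMC.
BMC = ("192.0.2.10", "ADMIN", "admin")

# Raw command bytes exactly as quoted in the post.
raw_step = ipmitool_cmd(*BMC, "raw", "0x32", "0xba", "0x18", "0x00")

# Assumed flashing step; check the guide you follow before using it.
flash_step = ipmitool_cmd(*BMC, "hpm", "upgrade", "P0123002.hpm", "force")

for step in (raw_step, flash_step):
    print(shlex.join(step))
```

Printing the commands first makes it easy to review them before pasting them into a shell, which seems prudent given how this update turned out.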
So my question is: Does anyone know
- whether there is a way of booting the golden BMC without swapping them, or
- whether it is possible to flash the “golden” side while leaving the “primary” one (… so I can reflash the broken new version by physically swapping the chips, booting the golden one as if it was primary and flashing the primary one as if it was golden)?
- Or, alternatively, what the correct credentials are?