**chuckula** · 01 September 2021, 06:18 PM

Making matters worse is that with the 64TB of system memory and IBM Power dividing them into 256MB local blocks exposed via sysfs, a heck of a lot of sysfs nodes are being created.

256,000 [edit: or 262,144 FOR YOU NERDS] sysfs nodes oughta be enough for anybody.
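As a rough check on the two figures above, here is a minimal sketch of the arithmetic. It assumes binary units (64 TiB of RAM, 256 MiB blocks) and one sysfs node per memory block; the function and variable names are made up for illustration. The 256,000 figure is assumed to come from mixing units (4 blocks per GB, 1,000 GB per TB).

```python
# Assumption: binary units, and one sysfs memory-block node per 256 MiB block.
TIB = 1024 ** 4
MIB = 1024 ** 2


def memory_block_count(total_bytes: int, block_bytes: int) -> int:
    """Number of fixed-size blocks needed to cover total_bytes, rounded up."""
    return -(-total_bytes // block_bytes)


# Exact count with binary units: 2**46 / 2**28 = 2**18.
blocks = memory_block_count(64 * TIB, 256 * MIB)

# Mixed-unit estimate: 4 blocks per GB, 1,000 GB per TB, 64 TB.
mixed_unit_estimate = 4 * 1000 * 64

print(blocks)               # 262144
print(mixed_unit_estimate)  # 256000
```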

But seriously, why do RAM chunks even need to be referenced via sysfs in the first place?

**tuxd3v** · 01 September 2021, 06:38 PM

Only in shared pools did we have systems with more than 1,000 cores, on POWER6 and POWER7 CPUs.
AIX shines there, and of course with tons and tons of memory too.

**Space Heater** · 01 September 2021, 06:43 PM

Originally posted by chuckula:

But seriously, why do RAM chunks even need to be referenced via sysfs in the first place?

It's probably to expose RAS features for main memory like error reporting and hotplug.

**indepe** · 01 September 2021, 06:45 PM

I'd guess being able to boot a mainframe with "several hundred CPUs and 64TB of RAM" in under 5 minutes is quite an achievement, though. (Without knowing how long other OSes would take...)

**linner** · 01 September 2021, 07:12 PM

Yet another problem solved with semaphores. Something the C standard idiots eschewed, giving us only awful condition variables, which are not the same thing as semaphores. I guess everybody is still using pthreads anyway because the standard is so lacking in features. I'm not sure why they bothered to put anything thread-related in there.
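To illustrate the difference being drawn here: a condition variable carries no state of its own, so a counting semaphore has to be built from a mutex, a counter, and a condition variable. The following is only a sketch of that construction, written in Python rather than C; `CountingSemaphore` and `demo` are hypothetical names, and this is not how the kernel patch or any particular C library implements it.

```python
import threading
import time


class CountingSemaphore:
    """A counting semaphore built from a condition variable plus a counter."""

    def __init__(self, initial: int = 1) -> None:
        if initial < 0:
            raise ValueError("initial count must be non-negative")
        self._count = initial
        self._cond = threading.Condition()  # bundles a lock with the condition

    def acquire(self) -> None:
        with self._cond:
            # Loop guards against wakeups where another thread took the slot.
            while self._count == 0:
                self._cond.wait()
            self._count -= 1

    def release(self) -> None:
        with self._cond:
            self._count += 1
            self._cond.notify()  # wake one waiter, if any


def demo(workers: int = 8, limit: int = 2) -> int:
    """Run `workers` threads and return the peak number inside the section."""
    sem = CountingSemaphore(limit)
    lock = threading.Lock()
    active = 0
    peak = 0

    def work() -> None:
        nonlocal active, peak
        sem.acquire()
        try:
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # simulate work while holding a slot
            with lock:
                active -= 1
        finally:
            sem.release()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    print(demo())  # never exceeds the limit passed to the semaphore
```

The counter is what makes this a semaphore: a `release()` that happens before any thread waits is remembered, whereas a bare condition-variable signal with no waiter is simply lost.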

**MadeUpName** · 01 September 2021, 08:06 PM

IBM isn't the only company in the world that makes big-ass servers. Until you show me another vendor with similar problems, I blame IBM.

**pipe13** · 01 September 2021, 10:34 PM

Originally posted by MadeUpName:

IBM isn't the only company in the world that makes big-ass servers. Until you show me another vendor with similar problems, I blame IBM.

First show me another vendor with 5 nines availability.
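For scale, "5 nines" means 99.999% availability. A minimal sketch of what that allows in downtime per year, assuming an average year of 365.25 days (the function name is made up for illustration):

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumption: average year length


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget, in minutes, for an availability of N nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
```

Five nines works out to a little over five minutes of downtime per year.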

**stormcrow** · 01 September 2021, 11:07 PM

Originally posted by pipe13:

First show me another vendor with 5 nines availability.

At this point I'd say any vendor claiming such a thing has to prove it to me. Many of them, including IBM, have been caught exaggerating reliability figures in the past.

I figure the subdivision is about isolating hardware segments for fine-grained error reporting. Long-lived processes will eventually get tripped up by hardware errors even if the software itself isn't to blame. The error reporting may allow the software processes to self-heal around the errors more easily. This doesn't necessarily lead to extremes of reliability in all cases, however.

**torsionbar28** · 01 September 2021, 11:10 PM

Originally posted by pipe13:

First show me another vendor with 5 nines availability.

Try the Tandem/Compaq/HPE "NonStop" servers. The NSK (NonStop Kernel) based servers have been doing 5 nines for a few decades now. At least a few years ago when I worked for HPE, the NonStop servers powered 14 of the world's stock exchanges, and ran 80% of the power generation in the US, including 100% of the nuclear power plants. Every component in the box is quad-redundant and hot-swappable, including CPUs, RAM, backplanes, everything. They even use special hard drives formatted to 520 bytes (IIRC) per sector instead of 512, because every sector is checksummed at the hardware level. You want CPUs? These are scalable to 4080 CPUs.

Edit: Interestingly, AOL bought quite a few of these machines back in the late 90's... not sure what AOL was doing with them, I only serviced them. Super reliable dial-up internet, lol?

Linux 5.15 Addressing Scalability Issue That Caused Huge IBM Servers 30+ Minutes To Boot
