Linux 5.15 Addressing Scalability Issue That Caused Huge IBM Servers To Take 30+ Minutes To Boot


  • #21
    Originally posted by BillT View Post
    Many years ago when I worked on Tandem gear, the OS was a proprietary system called Guardian. The programming was done using TAL = Tandem Application Language.
    Interesting. I found information at http://nonstoptools.com/, which looks like it was designed in 1994 and doesn't support SSL, haha, but it has tons of info in PDFs.



    • #22
      Originally posted by pipe13 View Post
      First show me another vendor with 5 nines availability.
      I've heard nine fives availability is better.



      • #23
        Originally posted by partcyborg View Post

        Lol 5 nines is ~5min of downtime per YEAR. Good luck getting that with a 30min boot time
        ~5 min unplanned downtime per year. Planned outage is not counted in availability.
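        For reference, the arithmetic behind those figures (my own back-of-envelope sketch in Python, assuming a 365.25-day year):
        Code:
        # Allowed downtime per year for a given number of "nines" of availability.
        MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960 minutes

        def downtime_minutes(nines: int) -> float:
            availability = 1 - 10 ** -nines       # e.g. 5 nines -> 0.99999
            return MINUTES_PER_YEAR * (1 - availability)

        for n in (3, 4, 5):
            print(f"{n} nines: ~{downtime_minutes(n):.1f} min of downtime per year")
        # 3 nines: ~526.0, 4 nines: ~52.6, 5 nines: ~5.3

        So a single 30-minute reboot would blow a five-nines budget several times over, unless it is booked as planned maintenance.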



        • #24
          I'm curious: What is the point of systems like this? Why not go for a cluster of cheaper servers? My (limited) understanding is that this is what companies like Google, Facebook, Amazon, etc. do.

          Unless you really need 64 TB of RAM in a single machine, why not split it across multiple machines and design around that? After all, the companies I mentioned above also manage extremely impressive uptimes.



          • #25
            Originally posted by Vorpal View Post
            I'm curious: What is the point of systems like this? Why not go for a cluster of cheaper servers?
            Depends on your problem. Sometimes you need low latency random access on 64 TB of data...

            And IMO distributed systems are often overrated/overused. A cluster is hard to develop for and hard to manage, and when things go wrong it's hard to find bottlenecks, hard to fix them, and hard to recover from failures. Also, when network latency kicks in or the cluster software isn't done 100% right, some problems might need a LOT of machines in a cluster to achieve satisfactory performance. And then, depending on the cluster size, it might end up more expensive than a single machine. Besides, if you factor in the development/maintenance cost for a cluster vs one machine (say, with another as a hot spare), you might end up with a total cost that is significantly higher.

            My advice: if you don't need massive horizontal scalability, and you don't need massive reliability, don't develop distributed systems.



            • #26
              Originally posted by indepe View Post
              I'd guess being able to boot a mainframe with "several hundred CPUs and 64TB of RAM" in under 5 minutes is quite an achievement, though. (Without knowing how long other OS's would take...)
              I am pretty sure DOS would boot in an instant. Wait, you didn't say you wanted to use more than 640K... ;-)
              Originally posted by pipe13
              First show me another vendor with 5 nines availability.
              Not too hard; these days the bigger problem is the software, which does not get close to that.



              • #27
                Originally posted by coder111 View Post

                Depends on your problem. Sometimes you need low latency random access on 64 TB of data...

                And IMO distributed systems are often overrated/overused. A cluster is hard to develop for and hard to manage, and when things go wrong it's hard to find bottlenecks, hard to fix them, and hard to recover from failures. Also, when network latency kicks in or the cluster software isn't done 100% right, some problems might need a LOT of machines in a cluster to achieve satisfactory performance. And then, depending on the cluster size, it might end up more expensive than a single machine. Besides, if you factor in the development/maintenance cost for a cluster vs one machine (say, with another as a hot spare), you might end up with a total cost that is significantly higher.

                My advice: if you don't need massive horizontal scalability, and you don't need massive reliability, don't develop distributed systems.
                A couple of counterpoints.
                1. There are other types of distributed systems, not just for data centres. I have worked on development of distributed systems in embedded applications. One example is the CAN bus in a modern car with many attached microcontrollers; another case I worked on was multiple communicating robots. Both of these have very different requirements than the data centre case, but are considered distributed systems. I suspect that you were considering a much narrower scope, but correct definitions and terminology are important.
                2. Even horizontal scaling is relatively easy if done well, in an environment built for it. Consider Erlang, which I have used in a horizontal scaling situation. Thanks to the functional message-passing programming paradigm and the OTP libraries, horizontal scaling comes almost for free (see the sketch after this post). Reliability (while not quite as easy) is also greatly simplified.
                That said, I never had to deal with purchasing the hardware in question, so I will have to take your word for it when it comes to the cost. And in the embedded setting there are really good reasons to work on a distributed system (e.g. you need multiple physical vehicles/devices, or you want to save on wiring within a vehicle).
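                A rough illustration of that message-passing style, sketched in Python with threads and queues (my own toy example; Erlang/OTP gives you this natively with lightweight processes, mailboxes, supervisors and transparent distribution, none of which this version has):
                Code:
                import threading, queue

                def worker(mailbox: queue.Queue, results: queue.Queue) -> None:
                    # Each "actor" owns its own state and only talks to the outside
                    # world via messages, so it does not care whether its peers run
                    # in the same process or on another node.
                    while True:
                        msg = mailbox.get()
                        if msg is None:          # shutdown message
                            break
                        results.put(msg * msg)   # stand-in for real work

                results = queue.Queue()
                mailboxes = [queue.Queue() for _ in range(4)]
                workers = [threading.Thread(target=worker, args=(mb, results)) for mb in mailboxes]
                for t in workers:
                    t.start()

                for i in range(20):                      # scatter work round-robin
                    mailboxes[i % len(mailboxes)].put(i)
                for mb in mailboxes:                     # ask every actor to stop
                    mb.put(None)
                for t in workers:
                    t.join()

                print(sorted(results.get() for _ in range(20)))

                Scaling this out then becomes a question of where the mailboxes live rather than of rewriting the workers, which is roughly what OTP's abstractions buy you.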



                • #28
                  Originally posted by MadeUpName View Post
                  IBM isn't the only company in the world that makes big-ass servers. Until you show me another vendor with similar problems I blame IBM.
                  A fix got accepted into the kernel, which means the maintainers considered there actually was something wrong with it, so I guess they already accept that Linux is not perfect; no other OS is either. So you're wrong.



                  • #29
                    Originally posted by sinepgib View Post

                    A fix got accepted into the kernel, which means the maintainers considered there actually was something wrong with it, so I guess they already accept that Linux is not perfect; no other OS is either. So you're wrong.
                    Software of all flavours is full of kludges put in place to get around vendor F'ups. Look at everything from firmware to BIOSes to X.



                    • #30
                      What a fun conversation. I haven't seen mainframe dick slapping in years.

