Linux vs Solaris - scalability, etc


  • #41
    Who cares what your clarifying post says; you're clearly not someone who should be taken seriously.



    • #42
      Originally posted by kebabbert View Post
      And, as we know, XFS is not safe.
      XFS is about as safe as other mainstream Linux filesystems, with the exception of btrfs. Thus your point is irrelevant, since the topic is scalability.

      Yes, ZFS/BTRFS are (or will be, once they are deemed production-quality) awesome if you want to provide reliable storage using cheap SATA drives. High end storage arrays often provide scrubbing and checksumming at the array level, so you can certainly get that kind of safety without baking it into the filesystem. Anyway, that's a totally different discussion so I won't go further into that here.
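
      Just to make the checksumming idea concrete, here is a toy sketch of block-level checksums plus a "scrub" pass. This is only an illustration of the concept: the block size, file names and side-car checksum file are invented for the example, and real filesystems keep checksums in their own metadata.

# Toy sketch of per-block checksumming plus a "scrub" pass, in the spirit of
# what ZFS/btrfs do. Block size, file names and layout are invented for the
# example; real filesystems store checksums in their own metadata.
import hashlib
import os

BLOCK_SIZE = 128 * 1024  # hypothetical block size


def write_blocks(data_path, sum_path, blocks):
    """Write each block and record its SHA-256 alongside it."""
    with open(data_path, "wb") as d, open(sum_path, "w") as s:
        for block in blocks:
            d.write(block)
            s.write(hashlib.sha256(block).hexdigest() + "\n")


def scrub(data_path, sum_path):
    """Re-read every block and compare against the stored checksum."""
    bad = 0
    with open(data_path, "rb") as d, open(sum_path) as s:
        for i, expected in enumerate(s):
            block = d.read(BLOCK_SIZE)
            if hashlib.sha256(block).hexdigest() != expected.strip():
                print("checksum mismatch in block", i)
                bad += 1
    return bad


if __name__ == "__main__":
    write_blocks("data.bin", "data.sums",
                 (os.urandom(BLOCK_SIZE) for _ in range(16)))
    print("corrupted blocks:", scrub("data.bin", "data.sums"))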

      The point is, Linux kernel devs do not have access to bigger SMP servers.
      That point is also completely wrong, because those developers that are working on some aspect of scalability certainly do have access to big machines (and, like I've said many times before, big machines are nowadays invariably NUMA, not SMP!!). Your obsession with some webcam driver developer not caring about scalability is irrelevant and ridiculous. Not everybody needs to care about scalability.

      Yes you are correct that Windows is a common server OS. But I was talking about big servers.
      Then you should have said so in your claim.

      I am saying that Linux developers don't have access to big SMP servers, because there are no such servers on the market. I am not saying that Linux developers choose not to work on scalability - the reason is that they can't, not that they don't want to.
      Which, as I said above, is wrong. HW is available (except for SGI, you probably can't get a support contract for running Linux on it, but for kernel development that is of course irrelevant). For instance, SGI developers working on aspects of scalability that affect their customers certainly have access to their own HW (duh..). The recent VFS scalability work was tested on some big (at least 128-way) POWER7 machine. And so on.

      Ted Tso agrees with me on this. As we saw, he said most Linux devs did not have access to 32 cores, which is exotic hardware.
      Most != all, which is the entire point. Generally, only a small set of core developers that are developing the core data structures and algorithms need to worry about scalability (and they certainly do care!). J. Random Developer who is developing a driver for a USB-connected coffee brewer has no reason to give a shit about scalability.

      Do you think that it is strange of me to say that companies should earn gazillions more money by switching to a more lucrative market?
      I think such statements show your naivete more than anything else.

      You claim that Altix HPC servers can do SMP very easily
      I have never claimed that. I have repeatedly said that, like all other current large shared memory machines, Altix machines are CC-NUMA.

      , why doesn't SGI slap on stock Linux and earn a few billions?
      Because SGI is a company which is focused on other markets, and changing that focus would be a very disruptive and risky undertaking. Similar to why Ford is not switching to cocaine production.

      The high-end, high-margin enterprise market is not vulnerable. The reason Sun went down the drain is that Sun lost that market! If Sun still had those big contracts and earned billions of USD, do you really think that Sun would have gone down the drain?
      I think Sun went down largely because Linux/Windows/x86 destroyed their workstation and most of their server market, and the remaining server market share wasn't enough to sustain them; as they couldn't invest enough into SPARC R&D their market share in the high end also dropped, and thus they were finished. OTOH, HP (via Intel) and IBM are not so vulnerable as they do a lot of chip development and fabbing for other markets.

      But still, RedHat has not gone for the big bucks SMP market. Why?
      Because they get better ROI in other markets.

      Well, if you believe company boards would not want a piece of a highly lucrative billion-dollar market, if you believe that companies don't want to become the next IBM or Oracle or Apple or Google or... then maybe it is you who should go back to school?
      So if it's so easy, why don't we have a zillion googles around then?

      To answer that, in case it isn't bloody obvious, market entry into an existing market is VERY difficult. Wrt entering the high end DB server market, Linux technical capabilities, or lack thereof, are an extremely small piece of the puzzle.

      Maybe it is you who misunderstands me. I am claiming that if Windows could, Windows would love to eat into the high end too, and earn billions of dollars. Same with Linux. But this does not happen. Why not? Because "they don't want to"? Or is it because they cannot, for technical reasons?
      But it does happen, all the time! The server space where Linux and Windows are not yet playing (be it due to lack of technical capabilities or any number of other factors) is squeezed smaller and smaller all the time. I see no reason why this development would have suddenly stopped.

      I am trying to explain that if companies are afraid of vendor lock-in, they can migrate from an Oracle database on Solaris to an Oracle database on Linux. The same goes for IBM DB2: you can migrate from DB2 on AIX to DB2 on Linux. It is quite easy to do, if you stay with the same database. Thus, there is no reason to be afraid of OS vendor lock-in. There is database vendor lock-in.

      But in this discussion that is not relevant, because we are discussing OSes, not databases. And you can freely switch between OSes, as long as you stay with the same database.
      You cannot get a support contract for running Linux/DB2 on some monster POWER box, same for Oracle/SPARC. So effectively a Linux DB server from a tier-1 vendor is limited to x86. So whatever Linux scalability for some specific DB workload is, a customer cannot practically run it on the biggest HW his tier-1 vendor of choice offers.

      So which explanation do you think is reasonable? If Linux can do big SMP easily (as you claim), why don't Linux companies do it? What is your explanation?
      See my answer below in my previous message.

      This could be a possible explanation. But, for instance, RedHat is not small. RedHat has thousands of employees, and much of the important software is open source. RedHat has the capacity to do this, but they don't. Why?
      I don't think RedHat has the capacity for this. RH is a software company; they don't do HW. And I don't think RH developing their own HW would be a success either; RH's success has come from riding the commoditization wave, that is, replacing proprietary Unix servers with RHEL and COTS x86.

      Look, if Linux can do everything that mature Unix can do, why doesn't Linux do it?
      I don't think it can do everything. I think it can do most things better and cheaper, which is why it has destroyed most of the markets for traditional Unix systems.

      And it's eating away at the remaining niches.



      • #43
        Originally posted by cldfzn View Post
        Who cares what your clarifying post says; you're clearly not someone who should be taken seriously.
        As I said before, I write a lot of text trying to address five people at the same time. It is bound to happen that some things I write get fuzzy and need clarification. I am not one of those who can write crystal-clear text with no ambiguities. I believe that if you, too, wrote pages of text, some of your phrasings might be unclear?



        • #44
          Originally posted by kebabbert View Post
          Ok, this is interesting. You claim that Linux has bad I/O, just as someone here claimed earlier? Maybe that could be the reason?

          Here is a storage expert who also says that Linux has bad I/O and does not cut it:
          I am frequently asked by potential customers with high I/O requirements if they can use Linux instead of AIX or Solaris. No one ever asks me about

          For this, he was flamed. Of course.

          Here is the follow up article, answering to the Linux flame:
          My article three weeks ago on Linux file systems set off a firestorm unlike any other I've written in the decade I've been writing on storage and



          Here is another link: he tries to see if Linux can handle big filesystems, but no Linux vendor wants him to do this benchmark. Why? Do the vendors believe that Linux has bad I/O?
          We failed. Jeff and I tried and tried to get the hardware to run the file system tests that we had written about. Originally, Jeff had all of the hardware



          And that is the reason there are no big SMP Linux servers on the market? Because of bad I/O scaling? Does anyone have more info on this?
          Learn to read and actually understand what you read.

          I wrote about my experience with HP-UX !!! I do have similar experience with Linux systems, but I simply generalised that no OS scales well on IO limited workloads since the limits are EXTERNAL TO THE OS !!!!

          It seems that any argument with you is irrelevant, since you don't want to even understand it, let alone infer the correct conclusions.

          QED



          • #45
            Originally posted by haplo602 View Post
            ...but I simply generalised that no OS scales well on IO limited workloads since the limits are EXTERNAL TO THE OS !!!!

            It seems that any argument with you is irrelevant, since you don't want to even understand it, let alone infer the correct conclusions.

            QED
            I do want to understand you, but actually, I think you could be a bit clearer. How is a limit "internal" or "external" to the OS?

            Is I/O external to the OS? What do you mean? Well, the I/O consists of disks and controller cards, which are external components. The OS just uses the I/O, yes. Just as the OS uses the CPU and uses RAM.

            In other words, by similar reasoning, would you also claim that
            "no OS scales well on cpu limited workloads since the limits are EXTERNAL TO THE OS!!!" ?
            Or talk about RAM as an external component? I don't get it.




            Regarding the claim that all OSes have I/O problems: well, as I have shown with links from a large-scale storage expert, it seems that Linux has I/O problems. He says that he got flamed for this, but then he asked the Linux flamers:
            If you disagree, try it yourself. Go mkfs a 500 TB ext-3/4 or other Linux file system, fill it up with multiple streams of data, add/remove files for a few months with, say, 20 GB/sec of bandwidth from a single large SMP server and crash the system and fsck it and tell me how long it takes. Does the I/O performance stay consistent during that few months of adding and removing files? Does the file system perform well with 1 million files in a single directory and 100 million files in the file system?
            Linux users are most probably used to 20-30 TB filesystems and believe that Linux also behaves well at large scale. Which it apparently does not. Just because Linux is fast on a few CPUs and a few disks does not mean it scales well up to 100 CPUs and 500 disks.
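
            At a much smaller scale than the 500 TB challenge quoted above, the "many files in one directory" part is easy to try for yourself. A rough sketch, with a toy file count and without the multi-stream, crash-and-fsck parts of the real test:

# Rough, small-scale sketch of the "many files in a single directory" test
# quoted above. The file count is a toy number; the article talks about a
# million files per directory, multiple writer streams, a crash and an fsck.
import os
import tempfile
import time

N_FILES = 100_000  # toy number, adjust to taste


def timed(label, fn):
    t0 = time.time()
    result = fn()
    print(f"{label}: {time.time() - t0:.2f} s")
    return result


def create_files(directory):
    for i in range(N_FILES):
        with open(os.path.join(directory, f"f{i:07d}"), "w") as f:
            f.write("x")


with tempfile.TemporaryDirectory() as d:
    timed("create", lambda: create_files(d))
    timed("listdir", lambda: len(os.listdir(d)))
    timed("stat all", lambda: [os.stat(os.path.join(d, n)) for n in os.listdir(d)])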

            I don't know about HP-UX, but it seems that development of HP-UX is not really prioritised. I don't see much innovation from HP-UX lately. Earlier it was innovative. But today? Not much. I don't know how good HP-UX is at I/O.

            Regarding Solaris, it has had its network stack completely redesigned in Solaris 10 and S11. The reason the name "Slowlaris" was used earlier is that old Solaris 8 and 9 had a slow network stack. But the stack has been completely rewritten now, and is extremely fast in S11. In early benchmarks of S10 with an early version of the new network stack, Solaris had 20-30% higher performance than RedHat Linux - on the same hardware.

            Here is a stock exchange that says they got 36% better performance on the same hardware by switching from Linux to Solaris:

            Bill Morgan, CIO at Philadelphia Stock Exchange Inc., has a new electronic options-trading system that runs on Solaris 10, which he said has improved trading capacity by 36%. Like Greenwade, he credited performance improvements to the TCP/IP stack, as well as improved multithreading support and a new feature called DTrace (for Dynamic Tracing) that tunes application performance.
            Of course this is a few years old, and Linux has improved since then.

            But Solaris has also improved. The step from Solaris 10 to S11 is huge; the improvements are huge.
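
            As a side note on the DTrace feature mentioned in the quote above: DTrace is driven by small D scripts. Purely as an illustration (assuming a Solaris/illumos box with DTrace and root privileges, a placeholder ./workload binary, and wrapping it in Python only to keep all the examples here in one language), a classic one-liner that counts system calls per process while a workload runs:

# Illustration only: the kind of DTrace one-liner used to see where a workload
# spends its system calls. Assumes a Solaris/illumos machine with DTrace and
# root privileges; "./workload" is a placeholder for whatever you want to trace.
import subprocess

ONE_LINER = "syscall:::entry { @[execname] = count(); }"

# dtrace -n '<program>' -c '<command>' traces while the command runs and then
# prints the aggregated counts when it exits.
subprocess.run(["dtrace", "-n", ONE_LINER, "-c", "./workload"], check=True)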



            • #46
              Originally posted by jabl View Post
              XFS is about as safe as other mainstream Linux filesystems, with the exception of btrfs. Thus your point is irrelevant, since the topic is scalability
              Not only scalability. "Linux vs Solaris - scalability, etc"



              Yes, ZFS/BTRFS are (or will be, once they are deemed production-quality) awesome if you want to provide reliable storage using cheap SATA drives. High end storage arrays often provide scrubbing and checksumming at the array level, so you can certainly get that kind of safety without baking it into the filesystem. Anyway, that's a totally different discussion so I won't go further into that here.
              Well, CERN says that even some highend storage systems are not designed to catch data corruption. I am not convinced all highend storage systems are safe, if we are going to trust CERN.



              That point is also completely wrong, because those developers that are working on some aspect of scalability certainly do have access to big machines
              Well, according to Linux devs, for instance, Ted Tso, you are wrong. The Linux devs do not have access to big SMP systems. Maybe because they don't exist. HP sells an 8-CPU Superdome Itanium "Big Tux" server, but other than that, there are no big SMP Linux servers on the market.


              (and, like I've said many times before, big machines are nowadays invariably NUMA, not SMP!!).
              And, as I have said, the big Linux 4096-core HPC systems have terrible latency, and it might look like shared memory, but it isn't:

              "I tried running a nicely parallel shared memory workload (75% efficiency on 24 cores in a 4 socket opteron box) on a 64 core ScaleMP box with 8 2-socket boards linked by infiniband. Result: horrible. It might look like a shared memory, but access to off-board bits has huge latency."



              Then you should have said so in your claim.
              When I discuss scalability, I do not talk about 2-4 CPUs, which is what Linux users are used to. I talk about dozens of CPUs, 64 CPUs, or even 144 CPUs. I thought it was clear from the "scalability" context and "big SMP servers" that we were discussing big servers.




              Which, as I said above, is wrong. HW is available (except for SGI, you probably can't get a support contract for running Linux on it, but for kernel development that is of course irrelevant). For instance, SGI developers working on aspects of scalability that affect their customers certainly have access to their own HW (duh..). The recent VFS scalability work was tested on some big (at least 128-way) POWER7 machine. And so on.
              Ok, so you claim that there are big SMP Linux servers on the market. Fine. Please show me links then. I am not talking about a prototype like the Big Tux HP Superdome or an experimental POWER7; I am talking about SMP servers for sale, with 64 CPUs or so supported. Where are those links?



              Most != all, which is the entire point. Generally, only a small set of core developers that are developing the core data structures and algorithms need to worry about scalability (and they certainly do care!). J. Random Developer who is developing a driver for a USB-connected coffee brewer has no reason to give a shit about scalability.
              Agreed that not every developer is working on scalability.

              But which Linux devs have access to those big SMP servers that are for sale? Where are those SMP servers, by the way? Who sells them?



              Because SGI is a company which is focused on other markets, and changing that focus would be a very disruptive and risky undertaking. Similar to why Ford is not switching to cocaine production.
              Again. Your example can be improved.

              You and everyone else claim that Linux does SMP as easily as HPC. And the HPC Altix servers are, in fact, SMP servers. Fine. Does your claim hold water?

              If your claim is true, why doesn't SGI just sell them as SMP servers? SGI does not need to change anything: no recompilation of Linux, no hardware changes, nothing at all; just sell them as SMP servers.

              Regarding your Ford example of switching to drugs because drugs are much higher margin and more lucrative: well, Ford does not know how to make drugs. It would mean a big investment, building new factories, etc. But in the case of Linux, nothing needs to be changed at all, because according to you there already are big SMP servers on the market: SGI.

              A better Ford example would be: why doesn't Ford sell cars painted as the USA flag? Stars and stripes? It is easy for Ford to do. And suppose it also turns out that USA-flag cars are a billion-USD market, and the companies selling them earn gazillions of USD and have hundreds of thousands of employees. Why doesn't Ford sell USA-flag-painted cars? Apparently Ford knows how to make cars, and painting them in other colors is not difficult.

              SMP and HPC are the same thing, according to you, and whether you sell it as SMP or HPC does not matter; it is the same server.




              Other similar stuff
              My point is, if SMP and HPC are the same thing, then Linux companies would snatch the SMP market where the big bucks are, earn millions, and become the next IBM or Oracle or Apple or Google or... and become billionaires. But no, no Linux company does this. Why? One IBM server with 32 CPUs cost 35 million USD list price. Why doesn't SGI sell their 4096-core Altix server, with 50x the performance, for a fraction of the price?

              Is it because there is a big difference between SMP and HPC, so you cannot sell an HPC system and say it is SMP? HPC cannot do SMP work?
              Or is it because Linux has I/O problems doing SMP work?
              Or is it because Linux companies don't want to earn billions of USD, as that market is "not lucrative"?

              I have heard lots of different explanations. There is something called Occam's razor. What does it say?



              • #47
                Originally posted by kebabbert View Post
                I do want to understand you, but actually, I think you could be a bit clearer. How is a limit "internal" or "external" to the OS?

                Is I/O external to the OS? What do you mean? Well, the I/O consists of disks and controller cards, which are external components. The OS just uses the I/O, yes. Just as the OS uses the CPU and uses RAM.

                In other words, by similar reasoning, would you also claim that
                "no OS scales well on cpu limited workloads since the limits are EXTERNAL TO THE OS!!!" ?
                Or talk about RAM as an external component? I don't get it.
                It seems you have no clue about basic concepts.

                External means that the OS has no influence over the bottleneck. I.e. if the OS were 100% efficient at the given task/workload, it would still show up as slow, since the real bottleneck is not OS efficiency. DO YOU GET THE POINT???

                In the case of I/O operations, the only reasonable test for scaling is in-memory filesystems. Otherwise you are always dealing with external elements that you have to account for (controller, interconnect, disk backend, firmware bugs at the various stages of the chain, etc.). Yes, some OSes are more efficient than others, but the real differences will be a few percent at best. Otherwise the implementation is just bad.
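
                A crude way to see the point about in-memory filesystems: run the same write load once against tmpfs and once against a disk-backed path. The paths and sizes below are assumptions about a typical Linux box; on tmpfs the controller/interconnect/disk chain drops out, so what remains is mostly OS overhead:

# Crude sketch of "test I/O scaling against an in-memory filesystem".
# /dev/shm (tmpfs) and /tmp are assumptions about a typical Linux box; adjust
# to your setup. On tmpfs the controller/interconnect/disk chain is out of the
# picture, so the remaining cost is (mostly) OS path length and locking.
import os
import time

CHUNK = b"x" * (1 << 20)   # 1 MiB per write
TOTAL_MB = 512             # toy volume


def write_test(path):
    t0 = time.time()
    with open(path, "wb") as f:
        for _ in range(TOTAL_MB):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.time() - t0
    os.unlink(path)
    return TOTAL_MB / elapsed


for target in ("/dev/shm/io_test.bin", "/tmp/io_test.bin"):
    print(f"{target}: {write_test(target):.0f} MB/s")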

                Regarding HP-UX: I have no clue. Innovation is not a priority in enterprise; stability and performance are. Oh, and accountability. If you cannot track issues properly, you have a big problem. That's the one major point Linux lacks. Everything else is mostly there.

                Btw, your initial point was scaling in general, and now you are jumping between I/O/fs/network... what's the point of the discussion again?



                • #48
                  Originally posted by haplo602 View Post
                  It seems you have no clue about basic concepts.

                  External means that the OS has no influence over the bottleneck. I.e. if the OS were 100% efficient at the given task/workload, it would still show up as slow, since the real bottleneck is not OS efficiency. DO YOU GET THE POINT???
                  No, actually, I still don't understand.

                  Let us apply your claim to "CPUs" instead. I mean, the OS has no direct influence over the CPU. If the CPU needs to do work for three users, the OS does not choose to idle instead. The OS will use the CPU as well as it can. It does not matter if the CPU is slow or fast; the OS will use it as efficiently as possible. Same with I/O.

                  This gives: "If the OS were 100% efficient at the given CPU workload, it would still show up as slow, since the real bottleneck is not OS efficiency."

                  Could you try to explain this a bit more? I still don't get it.



                  • #49
                    Originally posted by kebabbert View Post
                    Well, CERN says that even some highend storage systems are not designed to catch data corruption.
                    I believe that, but in no way does that refute what I said, since I didn't claim that ALL highend storage systems have features to catch data corruption. Merely that they OFTEN have such features, which is factually true.

                    Well, according to Linux devs, for instance, Ted Tso, you are wrong.
                    No, you're just quote-mining stuff and taking statements well beyond their original context. You really think SGI developers don't have access to their own hardware, for instance?

                    And, as I have said, the big Linux 4096-core HPC systems have terrible latency
                    Of course the worst-case remote memory latency on a big CC-NUMA will be much higher than for local memory. That's the entire point. The alternative, after all, is not SMP but rather USMA (uniformly slow memory access). And yes, in order to run efficiently on a big CC-NUMA machine you need an OS kernel that has been designed with this in mind (such as Linux) as well as applications that are very careful wrt non-local memory accesses. The latter point rules out most applications, including RDBMSes, which is why you don't see 4096-way DB servers anywhere.

                    Of course, an OS that works well on machines with a high NUMA factor can work just fine on machines with a low NUMA factor or none at all, Linux being a nice example of this.

                    , and it might look like shared memory, but it isn't:

                    "I tried running a nicely parallel shared memory workload (75% efficiency on 24 cores in a 4 socket opteron box) on a 64 core ScaleMP box with 8 2-socket boards linked by infiniband. Result: horrible. It might look like a shared memory, but access to off-board bits has huge latency."
                    So the fact that a workload which gets only 75% efficiency on 24 cores/4S is slow on a system where cache coherency is provided by a virtualization layer rather than hw, connected with IB, is somehow NOT bleeding obvious? Sheesh (woot, we're going in circles!!)

                    Ok, so you claim that there are big SMP Linux servers on the market.
                    No, I'm not. Sorry if I didn't include my by now routine SMP != CC-NUMA point; it's difficult to see whether you've actually gotten that or not since you continue to talk about "big SMP".

                    Fine. Please show me links then. I am not talking about a prototype like the Big Tux HP Superdome or an experimental POWER7; I am talking about SMP servers for sale, with 64 CPUs or so supported. Where are those links?
                    Did you read the paragraph you're replying to? In case you didn't, here's an excerpt again: "HW is available (except for SGI, you probably can't get a support contract for running Linux on it, but for kernel development that is of course irrelevant)." Duh.

                    But which Linux devs have access to those big SMP servers that are for sale?
                    Well, again with the caveat that all big machines are CC-NUMA, not SMP, those developers who are working on scalability issues have access to big machines to benchmark their work. I've recently even given a couple of prominent examples: the SGI developers who have worked on scalability issues relevant to their customers, and the VFS scalability work.

                    Where are those SMP servers, by the way? Who sells them?
                    Well, again with the caveat that all big machines are CC-NUMA, not SMP, there's plenty of big machines for sale by IBM, HP, Fujitsu, SGI, and probably others as well.

                    You and everyone else
                    Obviously, I'm not going to give a blanket agreement to something "everyone else" says or may say in the future. If I feel a need to publicly agree with someone else, I'll explicitly say so. Don't assume I agree with something I haven't written.

                    claim that Linux does SMP as easily as HPC.
                    What the heck is this statement even supposed to mean? If one takes the usual definitions for SMP and HPC, it seems like an apples to oranges comparison.

                    And the HPC Altix servers are, in fact, SMP servers.
                    No, they are CC-NUMA machines. I've never claimed that they are "SMP servers".

                    Does your claim hold water?
                    My claims? Yes. Some strawman claim you came up with all by yourself? Obviously not, since the claims were designed by yourself to be false.

                    There is something called Occam's razor. What does it say?
                    It says that you're a proper frothing-at-the-mouth Solaris fanboy, desperately grasping at any straws you can find in order to make Solaris look good. And since you don't have a solid grasp of the issues you're talking about, much lulz ensues.



                    • #50
                      Originally posted by jabl View Post
                      Of course the worst-case remote memory latency on a big CC-NUMA will be much higher than for local memory. That's the entire point. The alternative, after all, is not SMP but rather USMA (uniformly slow memory access). And yes, in order to run efficiently on a big CC-NUMA machine you need an OS kernel that has been designed with this in mind (such as Linux) as well as applications that are very careful wrt non-local memory accesses. The latter point rules out most applications, including RDBMSes, which is why you don't see 4096-way DB servers anywhere.
                      ...
                      No, they [SGI Altix] are CC-NUMA machines. I've never claimed that they are "SMP servers".
                      Great. It seems that this is settled:

                      Those Itanium-CPU SGI Altix servers are not SMP systems. Instead, they are ccNUMA systems, i.e. basically a cluster. That is why we will never see them as 4096-core database servers. That is the reason they are not doing SMP work.

                      One can view NUMA as a very tightly coupled form of cluster computing
                      I have seen numbers showing >50x slower RAM latency on NUMA systems.

                      When you take a cluster where some memory cells are very far away from a node and pretend it is one single image via NUMA, you get very slow RAM latency in the worst case. In some cases RAM latency is good (if the memory cell happens to be close to the CPU) and in other cases it is very bad (if the memory cell is in another PC on the network). This can wreak havoc with performance.

                      On the other hand, the Oracle M9000 Solaris server with 64 CPUs has a worst-case latency of approximately 1.3x, which is very good. Thus, NUMA is not needed on these M9000 SMP servers. You don't need to mask memory cells that are far away via NUMA. Uniform-memory servers do not have the 50x worse latency that clusters have, though uniform-memory servers probably have some worst-case latency scenarios too.
                      BLOG UPDATE ( 19-JUN-2009): I need to point out that ML 759565.1 has been significantly revised. The message regarding test before enabling NUMA persists. Not that it matters much,  I concur with t…


                      He also claims that quite recently, in 2007, Linux had very bad NUMA API support. NUMA was hardly implemented at all in Linux:
                      This blog entry is part five in a series. Please visit here for links to the previous installments. Opteron-Based Servers are NUMA Systems Or are they? It depends on how you boot them. For instance…

                      "The NUMA API for Linux is very rudimentary compared to the boutique features in legacy NUMA systems like Sequent DYNIX/ptx and SGI IRIX, but it does support memory and process placement".

                      But SGI sold their Altix systems in 2003. How bad was the NUMA API back then? NUMA could not have worked well for the Itanium Altix back then. The Altix cluster would only have worked for some workloads that could use the beta-phase NUMA API.
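
                      For what it's worth, the "memory and process placement" the quote above refers to is exposed on Linux through numactl (and libnuma underneath). A minimal sketch, assuming a box with at least two NUMA nodes and a hypothetical memory-bound benchmark ./membench:

# Minimal sketch of NUMA process/memory placement via numactl, as referenced
# in the quote above. Assumes at least two NUMA nodes and a hypothetical
# memory-bound benchmark ./membench; the node numbers are just examples.
import subprocess
import time


def run_pinned(cpu_node, mem_node):
    t0 = time.time()
    subprocess.run(
        ["numactl", f"--cpunodebind={cpu_node}", f"--membind={mem_node}",
         "./membench"],
        check=True,
    )
    return time.time() - t0


local = run_pinned(0, 0)   # CPU and memory on the same node
remote = run_pinned(0, 1)  # CPU on node 0, memory forced onto node 1
print(f"local {local:.2f}s, remote {remote:.2f}s, ratio {remote / local:.2f}x")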


                      I do wonder what would happen if we compiled Linux on an Oracle M9000. How well would Linux perform? Apparently it is difficult to build these MPNI servers:

                      With Advanced Micro Devices not building any chipsets that go beyond four Opteron processor sockets in a single system image -- and no one else interested in doing chipsets, either -- there is an opportunity, it would seem, for someone to make big wonking Opteron boxes to compete against RISC and Itanium machines.
                      Many have tried. Newisys, Liquid Computing, Fabric7, 3Leaf Systems, and NUMAscale all took very serious runs at it, and thus far, four out of five of them have gone the way of all flesh. It is not a coincidence that these companies fail
                      There is an empty niche in which to earn some millions, but no one succeeds. It is difficult, not easy. It seems that the reason no one does this is that it is technically difficult. Just as I suspected.
