Linux vs Solaris - scalability, etc


  • #46
    Originally posted by kebabbert View Post
    Ok, this is interesting. You claim that Linux has bad I/O, just as someone here claimed earlier? Maybe that could be the reason?

    Here is a storage expert who also says that Linux has bad I/O and does not cut it:
    http://www.enterprisestorageforum.co...le.php/3745996
    For this, he was flamed. Of course.

    Here is the follow-up article, answering the Linux flames:
    http://www.enterprisestorageforum.co...le.php/3749926


    Here is another link where he tries to see if Linux can handle big filesystems, but no Linux vendor wants him to run this benchmark. Why? Do the vendors believe that Linux has bad I/O?
    http://www.enterprisestorageforum.co...uccessful.html


    And is that the reason there are no big SMP Linux servers on the market? Because of bad I/O scaling? Does anyone have more info on this?
    Learn to read and actually understand what you read.

    I wrote about my experience with HP-UX!!! I do have similar experience with Linux systems, but I simply generalised that no OS scales well on I/O-limited workloads since the limits are EXTERNAL TO THE OS!!!!

    It seems that any argument with you is irrelevant, since you don't even want to understand it, let alone infer the correct conclusions.

    QED



    • #47
      Originally posted by haplo602 View Post
      ...but I simply generalised that no OS scales well on I/O-limited workloads since the limits are EXTERNAL TO THE OS!!!!

      It seems that any argument with you is irrelevant, since you don't even want to understand it, let alone infer the correct conclusions.

      QED
      I do want to understand you, but actually, I think you could be a bit clearer. How is a limit "internal" or "external" to the OS?

      Is I/O external to the OS? What do you mean? Well, the I/O consists of disks and controller cards, which are external components. The OS just uses the I/O, yes. Just as the OS uses the CPU and uses RAM.

      In other words, by similar reasoning, would you also claim that
      "no OS scales well on CPU-limited workloads since the limits are EXTERNAL TO THE OS!!!"?
      Or talk about RAM as an external component? I don't get it.




      Regarding all OSes having I/O problems: well, as I have shown with links from a large-scale storage expert, it seems that Linux has I/O problems. He says that he got flamed for this, but then he asked the Linux flamers:
      If you disagree, try it yourself. Go mkfs a 500 TB ext-3/4 or other Linux file system, fill it up with multiple streams of data, add/remove files for a few months with, say, 20 GB/sec of bandwidth from a single large SMP server and crash the system and fsck it and tell me how long it takes. Does the I/O performance stay consistent during that few months of adding and removing files? Does the file system perform well with 1 million files in a single directory and 100 million files in the file system?
      The Linux users are most probably used to 20-30 TB filesystems and believe that Linux also behaves well at large scale. Which it apparently does not. Just because Linux is fast on a few CPUs and a few disks does not mean it scales well up to 100 CPUs and 500 disks.
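      If anyone actually wants to try the "1 million files in a single directory" part of that challenge on their own hardware, here is a minimal sketch (my own, not from the article); the scratch path, file count, and file size are assumptions, and the 500 TB / 20 GB/sec parts obviously need real storage behind them:
      Code:
      #!/usr/bin/env python3
      # Minimal sketch of the "many files in one directory" test.
      # SCRATCH, NUM_FILES and PAYLOAD are assumptions; adjust for your setup.
      import os
      import time

      SCRATCH = "/mnt/scratch/manyfiles"   # hypothetical test filesystem
      NUM_FILES = 1_000_000
      PAYLOAD = b"x" * 4096                # 4 KiB per file

      def timed(label, fn):
          t0 = time.monotonic()
          fn()
          dt = time.monotonic() - t0
          print(f"{label}: {dt:.1f} s ({NUM_FILES / dt:.0f} files/s)")

      def create():
          for i in range(NUM_FILES):
              with open(os.path.join(SCRATCH, f"f{i:07d}"), "wb") as f:
                  f.write(PAYLOAD)

      def scan():
          # readdir + stat over the whole directory, the operation that tends
          # to degrade as the directory grows
          for entry in os.scandir(SCRATCH):
              entry.stat()

      def delete():
          for name in os.listdir(SCRATCH):
              os.unlink(os.path.join(SCRATCH, name))

      if __name__ == "__main__":
          os.makedirs(SCRATCH, exist_ok=True)
          timed("create", create)
          timed("scan+stat", scan)
          timed("delete", delete)

      Running it on a fresh filesystem and again after months of add/remove churn is the interesting comparison the article asks for.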

      I don't know about HP-UX, but it seems that development of HP-UX is not really prioritised. I don't see much innovation from HP-UX lately. Earlier it was innovative. But today? Not much. I don't know how good HP-UX is at I/O.

      Regarding Solaris, it has had its network stack completely redesigned in Solaris 10 and 11. The reason the name "Slowlaris" was used earlier was because the old Solaris 8 and 9 had a slow network stack. But the stack has been completely rewritten now, and is extremely fast in S11. In early benchmarks of S10 with an early version of the new network stack, Solaris had 20-30% higher performance than Red Hat Linux - on the same hardware.

      Here is a stock exchange that says they got 36% better performance on the same hardware by switching from Linux to Solaris.
      http://www.computerworld.com/s/artic...taxonomyId=012
      Bill Morgan, CIO at Philadelphia Stock Exchange Inc., has a new electronic options-trading system that runs on Solaris 10, which he said has improved trading capacity by 36%. Like Greenwade, he credited performance improvements to the TCP/IP stack, as well as improved multithreading support and a new feature called DTrace (for Dynamic Tracing) that tunes application performance.
      Of course this is a few years old, and Linux has improved since then.

      But Solaris has also improved. The step from Solaris 10 to S11 is huge; the improvements are huge.



      • #48
        Originally posted by jabl View Post
        XFS is about as safe as other mainstream Linux filesystems, with the exception of btrfs. Thus your point is irrelevant, since the topic is scalability
        Not only scalability. "Linux vs Solaris - scalability, etc"



        Yes, ZFS/BTRFS are (or will be, once they are deemed production-quality) awesome if you want to provide reliable storage using cheap SATA drives. High end storage arrays often provide scrubbing and checksumming at the array level, so you can certainly get that kind of safety without baking it into the filesystem. Anyway, that's a totally different discussion so I won't go further into that here.
        Well, CERN says that even some high-end storage systems are not designed to catch data corruption. I am not convinced all high-end storage systems are safe, if we are to trust CERN.



        That point is also completely wrong, because those developers that are working on some aspect of scalability certainly do have access to big machines
        Well, according to Linux devs, for instance Ted Ts'o, you are wrong. The Linux devs do not have access to big SMP systems. Maybe because they don't exist. HP sells an 8-CPU Superdome Itanium Big Tux server, but other than that, there are no big SMP Linux servers on the market.


        (and, like I've said many times before, big machines are nowadays invariably NUMA, not SMP!!).
        And, as I have said, the big Linux 4,096-core HPC systems have terrible latency, and it might look like shared memory, but it isn't:

        "I tried running a nicely parallel shared memory workload (75% efficiency on 24 cores in a 4 socket opteron box) on a 64 core ScaleMP box with 8 2-socket boards linked by infiniband. Result: horrible. It might look like a shared memory, but access to off-board bits has huge latency."



        Then you should have said so in your claim.
        When I discuss scalability, I do not talk about 2-4 CPUs, which is what Linux users are used to. I talk about dozens of CPUs, 64 CPUs, or even 144 CPUs. I thought it was clear from the "scalability" context and "big SMP servers" that we were discussing big servers.




        Which, as I said above, is wrong. HW is available (except for SGI, you probably can't get a support contract for running Linux on it, but for kernel development that is of course irrelevant). For instance, SGI developers working on aspects of scalability that affect their customers certainly have access to their own HW (duh..). The recent VFS scalability work was tested on some big (at least 128-way) POWER7 machine. And so on.
        Ok, so you claim that there are big SMP Linux servers on the market. Fine. Please show me links then. I am not talking about a prototype like the Big Tux HP Superdome or an experimental POWER7; I am talking about SMP servers for sale, with 64 supported CPUs or so. Where are those links?



        Most != all, which is the entire point. Generally, only a small set of core developers that are developing the core data structures and algorithms need to worry about scalability (and they certainly do care!). J. Random Developer who is developing a driver for an USB connected coffee brewer has no reason to give a shit about scalability.
        Agreed that not every developer is doing scalability.

        But which Linux devs have access to those big SMP servers that are for sale? Where are those SMP servers, by the way? Who sells them?



        Because SGI is a company which is focused on other markets, and changing that focus would be a very disruptive and risky undertaking. Similar to why Ford is not switching to cocaine production.
        Again. Your example can be improved.

        You and everyone else, claim that Linux does SMP as easily as HPC. And the HPC Altix servers are in fact SMP servers. Fine. Does your claim hold water?

        If your claim is true, why doesn't SGI just sell them as SMP servers? SGI does not need to change anything: no recompilation of Linux, no hardware change, nothing. Just sell them as SMP servers.

        Regarding your Ford example of switching to drugs because drugs have much higher margins and are more lucrative: well, Ford does not know how to make drugs. It would mean a big investment, building new factories, etc. But in the case of Linux, nothing needs to be changed at all, because according to you there are already big SMP servers on the market: SGI's.

        A better Ford example would be: why doesn't Ford sell cars painted as the USA flag? Stripes and stars? It is easy for Ford to do. And suppose it also turns out that USA-flag cars are a billion-USD market, and those companies earn gazillions of USD with hundreds of thousands of employees. Why doesn't Ford sell USA-flag-painted cars? Apparently Ford knows how to make cars, and painting them in other colors is not difficult.

        SMP and HPC are the same thing, according to you, so whether you sell a server as SMP or HPC does not matter; it is the same server.




        Other similar stuff
        My point is, if SMP and HPC are the same thing, then Linux companies would snatch the SMP market where the big bucks are, earn millions, become the next IBM or Oracle or Apple or Google... and become billionaires. But no, no Linux company does this. Why? One IBM server with 32 CPUs cost 35 million USD at list price. Why doesn't SGI sell their 4,096-core Altix server, with 50x the performance, for a fraction of that price?

        Is it because there is a big difference between SMP and HPC, so you cannot sell an HPC system and call it SMP? HPC cannot do SMP work?
        Or is it because Linux has I/O problems doing SMP work?
        Or is it because Linux companies don't want to earn billions of USD, as that market is "not lucrative"?

        I have heard lots of different explanations. There is something called Occam's razor. What does it say?



        • #49
          Originally posted by kebabbert View Post
          I do want to understand you, but actually, I think you could be a bit clearer. How is a limit "internal" or "external" to the OS?

          Is I/O external to the OS? What do you mean? Well, the I/O consists of disks and controller cards, which are external components. The OS just uses the I/O, yes. Just as the OS uses the CPU and uses RAM.

          In other words, by similar reasoning, would you also claim that
          "no OS scales well on CPU-limited workloads since the limits are EXTERNAL TO THE OS!!!"?
          Or talk about RAM as an external component? I don't get it.
          It seems you have no clue about basic concepts.

          External means that the OS has no influence over the bottleneck. I.e. if the OS were 100% efficient at the given task/workload, it would still show up as slow, since the real bottleneck is not OS efficiency. DO YOU GET THE POINT???

          In the case of I/O operations, the only reasonable test for OS scaling is in-memory filesystems. Otherwise you are always dealing with external elements that you have to account for (controller, interconnect, disk backend, firmware bugs at the various stages of the chain, etc.). Yes, some OSes are more efficient than others, but the real differences will be a few percent at best. Otherwise the implementation is just bad.
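          To illustrate that point, here is a rough sketch (mine, not a standard benchmark) that pushes the same write stream at a tmpfs path and at a disk-backed path on Linux; the paths and sizes are assumptions. The tmpfs number is close to pure OS/filesystem overhead, while the disk number includes the whole external chain:
          Code:
          #!/usr/bin/env python3
          # Sketch: identical streaming write against tmpfs (in-memory) and a
          # disk-backed path. Paths and sizes are assumptions.
          import os
          import time

          TARGETS = {
              "tmpfs (OS only)": "/dev/shm/io_test.bin",
              "disk (OS + external chain)": "/var/tmp/io_test.bin",
          }
          BLOCK = b"\0" * (1 << 20)   # 1 MiB per write
          TOTAL_MB = 512

          for label, path in TARGETS.items():
              t0 = time.monotonic()
              with open(path, "wb") as f:
                  for _ in range(TOTAL_MB):
                      f.write(BLOCK)
                  f.flush()
                  os.fsync(f.fileno())    # push the data past the page cache
              dt = time.monotonic() - t0
              os.unlink(path)
              print(f"{label}: {TOTAL_MB / dt:.0f} MB/s")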

          Regarding HP-UX: I have no clue. Innovation is not a priority in enterprise; stability and performance are. Oh, and accountability. If you cannot track issues properly, you have a big problem. That's the one major point Linux lacks. Everything else is mostly there.

          Btw, your initial point was scaling in general, and now you are jumping between I/O, filesystems, and networking... what's the point of the discussion again?



          • #50
            Originally posted by haplo602 View Post
            It seems you have no clue about basic concepts.

            External means that the OS has no influence over the bottleneck. I.e. if the OS were 100% efficient at the given task/workload, it would still show up as slow, since the real bottleneck is not OS efficiency. DO YOU GET THE POINT???
            No, actually I still don't understand.

            Let us apply your claim to "CPUs" instead. I mean, the OS has no direct influence over the CPU. If the CPU needs to do work for three users, the OS does not choose to idle instead. The OS will use the CPU as well as it can. It does not matter if the CPU is slow or fast; the OS will use it as efficiently as possible. Same with I/O.

            This gives: "If the OS were 100% efficient at the given CPU workload it would still show up as slow since the real bottleneck is not OS efficiency"

            Could you try to explain this a bit more? I still don't get it.



            • #51
              Originally posted by kebabbert View Post
              Well, CERN says that even some high-end storage systems are not designed to catch data corruption.
              I believe that, but in no way does that refute what I said, since I didn't claim that ALL high-end storage systems have features to catch data corruption. Merely that they OFTEN have such features, which is factually true.

              Well, according to Linux devs, for instance Ted Ts'o, you are wrong.
              No, you're just quote-mining stuff and taking statements well beyond their original context. You really think SGI developers don't have access to their own hardware, for instance?

              And, as I have said, the big Linux 4,096-core HPC systems have terrible latency
              Of course the worst case remote memory latency on a big CC-NUMA will be much higher than for local memory. That's the entire point. The alternative, after all, is not SMP but rather USMA (uniformly slow memory access). And yes, in order to run efficiently on a big CC-NUMA machine you need an OS kernel that has been designed with this in mind (such as Linux) as well as applications that are very careful wrt non-local memory accesses. The latter point ruling out most applications including RDBMS'es which is why you don't see 4096-way DB servers anywhere.

              Of course, an OS that works well on machines with a high NUMA factor can work just fine on machines with a low NUMA factor or none at all, Linux being a nice example of this.

              , and it might look like shared memory, but it isn't:

              "I tried running a nicely parallel shared memory workload (75% efficiency on 24 cores in a 4 socket opteron box) on a 64 core ScaleMP box with 8 2-socket boards linked by infiniband. Result: horrible. It might look like a shared memory, but access to off-board bits has huge latency."
              So, that a workload which gets only 75% efficiency on 24 cores/4S is slow when running on a system with cache coherency provided by a virtualization layer rather than hardware, and connected with IB, is somehow NOT bleedingly obvious? Sheesh (woot, we're going in circles!!)

              Ok, so you claim that there are big SMP Linux servers on the market.
              No, I'm not. Sorry if I didn't include my by now routine SMP != CC-NUMA point; it's difficult to see whether you've actually gotten that or not since you continue to talk about "big SMP".

              Fine. Please show me links then. I am not talking about a prototype like the Big Tux HP Superdome or an experimental POWER7; I am talking about SMP servers for sale, with 64 supported CPUs or so. Where are those links?
              Did you read the paragraph you're answering to? In case you didn't, here is an excerpt again: "HW is available (except for SGI, you probably can't get a support contract for running Linux on it, but for kernel development that is of course irrelevant)." Duh.

              But which Linux devs have access to those big SMP servers that are for sale?
              Well, again with the caveat that all big machines are CC-NUMA, not SMP, those developers who are working on scalability issues have access to big machines to benchmark their work. I've recently even given a couple of prominent examples, the SGI developers who have worked on scalability issues relevant for their customers, and the VFS scalability work.

              Where are those SMP servers, by the way? Who sells them?
              Well, again with the caveat that all big machines are CC-NUMA, not SMP, there are plenty of big machines for sale by IBM, HP, Fujitsu, SGI, and probably others as well.

              You and everyone else
              Obviously, I'm not going to give a blanket agreement with something "everyone else" says or may say in the future. If I feel a need to publicly agree with someone else, I'll explicitly say so. Don't assume I agree with something I haven't written.

              , claim that Linux does SMP as easily as HPC.
              What the heck is this statement even supposed to mean? If one takes the usual definitions for SMP and HPC, it seems like an apples to oranges comparison.

              And the HPC Altix servers are in fact SMP servers.
              No, they are CC-NUMA machines. I've never claimed that they are "SMP servers".

              Does your claim hold water?
              My claims? Yes. Some strawman claim you came up with all by yourself? Obviously not, since the claims were designed by yourself to be false.

              There is something called Occam's razor. What does it say?
              It says that you're a proper frothing-at-the-mouth Solaris fanboy, desperately grasping at any straws you can find in order to make Solaris look good. And since you don't have a solid grasp of the issues you're talking about, much lulz ensues.



              • #52
                Originally posted by jabl View Post
                Of course the worst case remote memory latency on a big CC-NUMA will be much higher than for local memory. That's the entire point. The alternative, after all, is not SMP but rather USMA (uniformly slow memory access). And yes, in order to run efficiently on a big CC-NUMA machine you need an OS kernel that has been designed with this in mind (such as Linux) as well as applications that are very careful wrt non-local memory accesses. The latter point ruling out most applications including RDBMS'es which is why you don't see 4096-way DB servers anywhere.
                ...
                No, they [SGI Altix] are CC-NUMA machines. I've never claimed that they are "SMP servers".
                Great. It seems that we are settled:

                Those Itanium-CPU SGI Altix servers are not SMP systems. Instead, they are ccNUMA systems, i.e. basically a cluster. That is why we will never see them as 4,096-core database servers. That is the reason they are not doing SMP work.
                http://en.wikipedia.org/wiki/Non-Uni...ster_computing
                One can view NUMA as a very tightly coupled form of cluster computing
                I have seen numbers of >50x slower RAM latency with NUMA systems.

                When you take a cluster where some memory cells are very far away from a node, and pretend it is one single image via NUMA, you get very slow RAM latency in the worst case. In some cases the RAM latency is good (if the memory cell happens to be close to the CPU) and in other cases the RAM latency is very bad (if the memory cell is in another node on the network). This can wreak havoc with performance.
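                One concrete way to see that local/remote penalty on a Linux NUMA box is to run the same memory-bound job twice with numactl, once with its allocations forced onto the local node and once onto a remote node. A sketch; "./membench" is a placeholder for whatever memory-bound workload you have, and the node numbers assume a machine with at least two NUMA nodes:
                Code:
                #!/usr/bin/env python3
                # Sketch: time the same job with local vs. remote memory placement.
                import subprocess
                import time

                WORKLOAD = ["./membench"]   # placeholder memory-bound program
                CASES = {
                    "local memory":  ["numactl", "--cpunodebind=0", "--membind=0"],
                    "remote memory": ["numactl", "--cpunodebind=0", "--membind=1"],
                }

                for label, prefix in CASES.items():
                    t0 = time.monotonic()
                    subprocess.run(prefix + WORKLOAD, check=True)
                    print(f"{label}: {time.monotonic() - t0:.2f} s")

                The gap between the two runs grows with the NUMA factor, which is exactly why memory placement matters so much on the big machines discussed here.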

                On the other hand, the Oracle M9000 Solaris server with 64 CPUs has a latency factor of approximately 1.3x, which is very good. Thus, NUMA is not needed on these M9000 SMP servers. You don't need to mask memory cells far away via NUMA. Uniform-memory servers do not have the 50x worse latency that clusters have, though uniform-memory servers probably have some worst-case latency scenarios too.
                http://kevinclosson.wordpress.com/20...gives-part-ii/

                He also claims that quite recently, in 2007, Linux had very bad NUMA API support; NUMA was hardly implemented at all in Linux:
                http://kevinclosson.wordpress.com/20...ctl8-and-suma/
                "The NUMA API for Linux is very rudimentary compared to the boutique features in legacy NUMA systems like Sequent DYNIX/ptx and SGI IRIX, but it does support memory and process placement".

                But SGI sold their Altix systems in 2003. How bad was the NUMA API back then? NUMA could not have worked well for the Itanium Altix back then. The Altix cluster would only have worked for some workloads that could use the beta-phase NUMA API.


                I do wonder what would happen if we compiled Linux on an Oracle M9000. How well would Linux perform? Apparently it is difficult to build these MPNI servers:
                http://www.theregister.co.uk/2011/09..._amd_opterons/
                With Advanced Micro Devices not building any chipsets that go beyond four Opteron processor sockets in a single system image and no one else interested in doing chipsets, either there is an opportunity, it would seem, for someone to make big wonking Opteron boxes to compete against RISC and Itanium machines.
                Many have tried. Newisys, Liquid Computing, Fabric7, 3Leaf Systems, and NUMAscale all took very serious runs at it, and thus far, four out of five of them have gone the way of all flesh. It is not a coincidence that these companies fail
                There is an empty niche where one could earn some millions, but no one succeeds. It is difficult, not easy. It seems that the reason no one does this is that it is technically difficult. Just as I suspected.



                • #53
                  Originally posted by kebabbert View Post
                  Regarding Solaris, it has had its network stack completely redesigned in Solaris 10 and 11. The reason the name "Slowlaris" was used earlier was because the old Solaris 8 and 9 had a slow network stack. But the stack has been completely rewritten now, and is extremely fast in S11. In early benchmarks of S10 with an early version of the new network stack, Solaris had 20-30% higher performance than Red Hat Linux - on the same hardware.

                  Here is a stock exchange that says they got 36% better performance on the same hardware by switching from Linux to Solaris.
                  http://www.computerworld.com/s/artic...taxonomyId=012

                  Of course this is a few years old, and Linux has improved since then.

                  But Solaris has also improved. The step from Solaris 10 to S11 is huge; the improvements are huge.
                  Where did you get the "same hardware" quote from? I found only results stating that PHLX is making extensive use of Solaris on SPARC. Also this:

                  "VERITAS software has already helped The Philadelphia Stock Exchange increase storage utilization by more than 25 percent," said Bill Morgan, CIO, Philadelphia Stock Exchange. "The new features in VERITAS Storage Foundation 4.1 will help us drive further cost efficiencies and simplify the management of our tiered storage model. Furthermore, because VERITAS software supports Solaris 10, we have seen tremendous operating system performance gains that will allow us to better serve our customers."

                  http://linux.sys-con.com/node/48771

                  So where is your advantage from switching OSes on the same HW now?

                  "PHLX is scheduled to go live in early July 2005 with its options trading platform running on Solaris 10 OS. Adds Ward: "The Philadelphia Stock Exchange has been so impressed with Solaris 10 on SPARC processors that it has now embarked on a new proof-of-concept trial on the Sun FireV40z server for x64-based systems and Solaris 10."

                  http://www.finextra.com/news/fullsto...wsitemid=13860

                  Also, it looks like PHLX was a long-time Sun customer, at least since Solaris 7: http://www.allbusiness.com/economy-e...6914381-1.html

                  So when exactly did they make the switch?

                  Final nail in your coffin:

                  http://blogs.oracle.com/sunay/entry/solaris_vs_red_hat

                  "Bill Morgan, CIO at Philadelphia Stock Exchange Inc.", where he said that Solaris 10 improved his trading capacity by 36%. Now we are not talking about a micro benchmark here but a system level capacity. This was on a 12 way E4800 (SPARC platform). Basically, they loaded Solaris 10 on the same H/W and were able to do 36% more stock transactions per second.

                  I guess if somebody made a better OS for SPARC than Sun, it would be a great surprise.



                  • #54
                    Originally posted by kebabbert View Post
                    Initial post
                    OK I finally checked the facts on the initial post (lazy, I know).

                    1. The SAP benchmarks you quoted are BOTH on NUMA systems, NOT SMP. Any modern Opteron system is NUMA.
                    2. Are you really comparing Oracle 10g to MaxDB for the RDBMS backend???
                    3. The user counts are different.
                    4. The CPU utilisation is different (this really points to the DB/storage backend).

                    Also there are no details about the DB configuration and storage backend used in each case (does the DB run completely in memory???).

                    Anyway, so far everything you have pointed to is either your own misunderstanding or a marketing quote. The only real parts on your side are quotes from Linux developers who do acknowledge the shortcomings and are working on them.



                    • #55
                      Final nail in your coffin:
                      The only nail in the coffin he needed was to point out that the article, which is mostly an advertisement for Sun, is 6 years out of date. The world has moved on.

                      I guess if somebody made a better OS for SPARC than Sun, it would be a great surprise.
                      Actually, that is where the term 'Slowaris' came from. It's not from the network stack per se; it's that when you took old SPARC systems and replaced the OS with Linux, they ran much faster.



                      • #56
                        There is an empty niche where one could earn some millions, but no one succeeds. It is difficult, not easy. It seems that the reason no one does this is that it is technically difficult. Just as I suspected.
                        It's funny that you are bragging about this, but the reality is that IBM's POWER platform stomps the crap out of any equivalently priced SPARC system. Which is one of the major reasons why Sun was not able to keep up with other companies on the high end. Which is one of the major reasons they are now called 'Oracle'.

                        Solaris is not that bad for a lot of things and has good advantages over Linux in some things, but crowing about Solaris vs Linux on the 'high end' is missing the point. The real high end hasn't come from Sun for a while now. Sure, you can spend a lot on SPARC hardware, but that is really all you are accomplishing. People are willing to spend a lot because it's cheaper to drop a million dollars on some big database server than it is to migrate away from it to something that will offer much better performance at a fraction of the price.

                        The reason more companies don't get involved in these sorts of things is because that means competing with IBM. That is difficult. IBM is a monster and offers 'total solutions' where hardware is just one small factor in a lot of things. It's not even really that expensive anymore compared to what it used to cost to get stuff done.



                        • #57
                          Originally posted by kebabbert View Post
                          Great. It seems that we are settled:

                          Those Itanium-CPU SGI Altix servers are not SMP systems. Instead, they are ccNUMA systems,
                          Yes; I agree. As a minor point, the current x86-based Altix UV systems are also CC-NUMA; except for the change in processors and an upgraded interconnect they are basically the same as the old Itanium Altix systems.

                          i.e. basically a cluster.
                          Well, for some fairly stretched definition of a cluster, I suppose? If you so fervently want to argue this, could you start by providing a definition for a cluster?

                          That is why we will never see them as 4,096-core database servers.
                          A more relevant restriction, probably, is that you won't find RDBMS software capable of scaling to those core counts nor useful RDBMS workloads to take advantage of it. Doesn't really matter if the OS and HW scales to a million cores (hypothetically speaking), if the application software doesn't.

                          From that Wikipedia article: "The addition of virtual memory paging to a cluster architecture can allow the implementation of NUMA entirely in software where no NUMA hardware exists. However, the inter-node latency of software-based NUMA remains several orders of magnitude greater than that of hardware-based NUMA.". You might want to think about that next time you're about to post your ScaleMP quote as some kind of proof that Linux and/or Altix "doesn't scale" (or whatever your point with that quote is, I'm not really sure).

                          I have seen numbers of >50x slower RAM latency with NUMA systems.
                          FWIW, SGI claims that for the largest supported Altix UV configuration local latency is 75 ns, worst case remote latency 1 us, giving a NUMA factor of about 13.

                          When you take a cluster where some memory cells are very far away from a node, and pretend it is one single image via NUMA, you get very slow RAM latency in the worst case. In some cases the RAM latency is good (if the memory cell happens to be close to the CPU) and in other cases the RAM latency is very bad (if the memory cell is in another node on the network). This can wreak havoc with performance.
                          Thanks for showing that you understand the basics of NUMA.

                          The implication of this is that you need an OS as well as userspace software that takes NUMA into account. The bigger the NUMA factor is, the worse is the penalty for not getting this right.

                          On the other hand, the Oracle M9000 Solaris server with 64 CPUs has a latency factor of approximately 1.3x, which is very good. Thus, NUMA is not needed on these M9000 SMP servers.
                          Indeed, a factor of 1.3 is quite good. Suspiciously good in fact, considering that's about the factor you have on a modern 2-socket x86 system. So let's look into it: http://www.oracle.com/technetwork/ar...ons-163845.pdf

                          So on page 14 we have information about the memory latency. On a 64-socket M9000, you have 532 ns for accessing the memory furthest away from the core, which is, I suppose, reasonable for a system of that size. For comparison, the worst-case latency on a 256-socket Altix UV is about twice that (see above), which again is quite reasonable since there are bound to be a few more router hops with that many sockets (and a different network topology). But look at the local memory latency: 437 ns! Ouch. Simply, ouch. Again, for comparison, on the Altix UV local memory latency is 75 ns, which is a relatively small penalty compared to a simple 2S x86 machine without any directory overhead and, obviously, a pretty small snoop broadcast domain. So we see that the M9000 manages to have a relatively small NUMA factor not by having some super-awesome technology making remote memory access fast, but mostly by having terrible local memory latency. Not exactly something to brag about, eh.
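                          To spell out the arithmetic behind those figures (numbers as quoted above, labels mine):
                          Code:
                          # NUMA factor = worst-case remote latency / local latency (ns)
                          systems = {
                              "SGI Altix UV (256 sockets)": (75, 1000),
                              "Oracle/Sun M9000 (64 sockets)": (437, 532),
                          }
                          for name, (local, remote) in systems.items():
                              print(f"{name}: NUMA factor {remote / local:.1f}")
                          # Altix UV: ~13.3 (fast local, slow worst-case remote)
                          # M9000: ~1.2, largely because local latency is already high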

                          He also claims that quite recently, in 2007, Linux had very bad NUMA API support; NUMA was hardly implemented at all in Linux:
                          http://kevinclosson.wordpress.com/20...ctl8-and-suma/
                          "The NUMA API for Linux is very rudimentary compared to the boutique features in legacy NUMA systems like Sequent DYNIX/ptx and SGI IRIX, but it does support memory and process placement".

                          But SGI sold their Altix systems in 2003. How bad was the NUMA API back then? NUMA could not have worked well for the Itanium Altix back then. The Altix cluster would only have worked for some workloads that could use the beta-phase NUMA API.
                          Apparently it worked well enough for SGI to phase out IRIX in favor of Linux.

                          There is an empty niche where one could earn some millions, but no one succeeds. It is difficult, not easy. It seems that the reason no one does this is that it is technically difficult. Just as I suspected.
                          If Linux's lack of technical capability is the only thing standing in the way, as you claim, why don't we have a big bunch of companies using OpenSolaris (or illumos or whatever it's called this week) and competing successfully with the likes of IBM, HP, and Oracle in the high-end RDBMS server market? Don't they want to be billionaires, to use your own argument?



                          • #58
                            Originally posted by drag View Post
                            Actually, that is where the term 'Slowaris' came from. It's not from the network stack per se; it's that when you took old SPARC systems and replaced the OS with Linux, they ran much faster.
                            Not necessarily just Linux: I remember when Solaris first came out it was considered a significant downgrade from SunOS, both in performance and stability.

