Linux vs Solaris - scalability, etc

  • #51
    Originally posted by kebabbert View Post
    Regarding Solaris, it has had its network stack completely redesigned in Solaris 10 and S11. The reason the name "Slowlaris" was used earlier was that the old Solaris 8 and 9 had a slow network stack. But the stack has been completely rewritten now, and it is extremely fast in S11. In early benchmarks, S10 with an early version of the new network stack had 20-30% higher performance than Red Hat Linux - on the same hardware.

    Here is a stock exchange that says they got 36% better performance on the same hardware, by switching from Linux to Solaris.


    Of course this is a few years old, and Linux has improved since then.

    But, Solaris has also improved. The step from Solaris 10 to S11 is huge, the improvements are huge.
    Where did you get the "same hardware" quote from? I found only results stating that PHLX is making extensive use of Solaris on SPARC. Also this:

    "VERITAS software has already helped The Philadelphia Stock Exchange increase storage utilization by more than 25 percent," said Bill Morgan, CIO, Philadelphia Stock Exchange. "The new features in VERITAS Storage Foundation 4.1 will help us drive further cost efficiencies and simplify the management of our tiered storage model. Furthermore, because VERITAS software supports Solaris 10, we have seen tremendous operating system performance gains that will allow us to better serve our customers."

    http://linux.sys-con.com/node/48771

    So where is your advantage from switching OSes on the same HW now?

    "PHLX is scheduled to go live in early July 2005 with its options trading platform running on Solaris 10 OS. Adds Ward: "The Philadelphia Stock Exchange has been so impressed with Solaris 10 on SPARC processors that it has now embarked on a new proof-of-concept trial on the Sun FireV40z server for x64-based systems and Solaris 10."

    http://www.finextra.com/news/fullsto...wsitemid=13860

    Also it looks like PHLX was a long time Sun customer at least since Solaris 7: http://www.allbusiness.com/economy-e...6914381-1.html

    So when exactly did they make the switch?

    Final nail in your coffin:

    http://blogs.oracle.com/sunay/entry/solaris_vs_red_hat

    "Bill Morgan, CIO at Philadelphia Stock Exchange Inc.", where he said that Solaris 10 improved his trading capacity by 36%. Now we are not talking about a micro benchmark here but a system level capacity. This was on a 12 way E4800 (SPARC platform). Basically, they loaded Solaris 10 on the same H/W and were able to do 36% more stock transactions per second.

    I guess if somebody made a better OS for SPARC than Sun, it would be a great surprise.



    • #52
      Originally posted by kebabbert View Post
      Initial post
      OK I finally checked the facts on the initial post (lazy, I know).

      1. The SAP benchmarks you quoted are BOTH on NUMA systems, NOT SMP. Any modern Opteron system is NUMA.
      2. Are you really comparing Oracle 10g to MaxDB for the RDBMS backend???
      3. The user counts are different.
      4. The CPU utilisation is different (this really points to the DB/storage backend).

      Also there are no details about the DB configuration and storage backend used in each case (does the DB run completely in memory???).

      Anyway, so far everything you have pointed to has been either a misunderstanding or a marketing quote. The only solid points on your side are the quotes from Linux developers who do acknowledge the shortcomings and are working on them.



      • #53
        Final nail in your coffin:
        The only nail in the coffin he needed was to point out that the article, which is mostly an advertisement for Sun, is 6 years out of date. The world has moved on.

        I guess if somebody made a better OS for Sparcs than SUN, it would be a great surprise.
        Actually, that is where the term 'Slowaris' came from. It's not from the network stack per se; it's that when you took old SPARC systems and replaced the OS with Linux, they ran much faster.



        • #54
          There is an empty niche to earn some millions, but no one succeeds. It is difficult, not easy. It seems that the reason no one does this is that it is technically difficult. Just as I suspected.
          It's funny that you are bragging about this, because the reality is that IBM's POWER platform stomps the crap out of any equivalently priced SPARC system. That is one of the major reasons Sun was not able to keep up with other companies on the high end, and one of the major reasons they are now called 'Oracle'.

          Solaris is not that bad for a lot of things and has genuine advantages over Linux in some areas, but crowing about Solaris vs Linux on the 'high end' misses the point. The real high end hasn't come from Sun for a while now. Sure, you can spend a lot on SPARC hardware, but that is really all you are accomplishing. People are willing to spend a lot because it's cheaper to drop a million dollars on some big database server than it is to migrate away from it to something that offers much better performance at a fraction of the price.

          The reason more companies don't get involved in these sorts of things is that it means competing with IBM. That is difficult. IBM is a monster and offers 'total solutions' where hardware is just one small factor among many. It's not even really that expensive anymore compared to what it used to cost to get stuff done.



          • #55
            Originally posted by kebabbert View Post
            Great. It seems that we are settled:

            Those Itanium cpu SGI Altix servers are not SMP systems. Instead, they are ccNUMA systems,
            Yes; I agree. As a minor point, the current x86-based Altix UV systems are also ccNUMA; except for the change in processors and an upgraded interconnect, they are basically the same as the old Itanium Altix systems.

            i.e. basically a cluster.
            Well, for some fairly stretched definition of a cluster, I suppose? If you so fervently want to argue this, could you start by providing a definition for a cluster?

            That is why we will never see them as 4096 core Database servers.
            A more relevant restriction, probably, is that you won't find RDBMS software capable of scaling to those core counts, nor useful RDBMS workloads to take advantage of it. It doesn't really matter if the OS and HW scale to a million cores (hypothetically speaking) if the application software doesn't.
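
            As a quick illustration of why the application is the binding constraint, consider Amdahl's law: speedup on n cores is 1 / ((1 - p) + p/n), where p is the parallelizable fraction. A minimal sketch in C (the 95%/99% figures are made-up examples for illustration, not measurements of any RDBMS):

              #include <stdio.h>

              int main(void)
              {
                  const double p[] = { 0.95, 0.99 };     /* hypothetical parallel fractions */
                  const int cores[] = { 64, 1024, 4096 };

                  /* Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n) */
                  for (int i = 0; i < 2; i++)
                      for (int j = 0; j < 3; j++)
                          printf("p=%.2f, %4d cores -> speedup %6.1fx\n",
                                 p[i], cores[j],
                                 1.0 / ((1.0 - p[i]) + p[i] / cores[j]));
                  return 0;
              }

            Even at p = 0.99 the speedup tops out at about 98x on 4096 cores; the remaining serial 1% caps everything, and most of those cores are just heating the room.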

            From that Wikipedia article: "The addition of virtual memory paging to a cluster architecture can allow the implementation of NUMA entirely in software where no NUMA hardware exists. However, the inter-node latency of software-based NUMA remains several orders of magnitude greater than that of hardware-based NUMA." You might want to think about that the next time you're about to post your ScaleMP quote as some kind of proof that Linux and/or Altix "doesn't scale" (or whatever your point with that quote is; I'm not really sure).

            I have seen numbers of >50x slower RAM latency with NUMA systems.
            FWIW, SGI claims that for the largest supported Altix UV configuration, local latency is 75 ns and worst-case remote latency is 1 us, giving a NUMA factor of about 13.
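
            For anyone who wants to check numbers like these first-hand rather than take a vendor's word for it, the standard trick is a dependent pointer chase: each load has to wait for the previous one, so the raw memory latency can't be hidden by prefetching or out-of-order execution. A minimal sketch, assuming Linux with libnuma installed (build with gcc -O2 chase.c -lnuma); the node numbers, buffer size, and step count are arbitrary choices of mine, not anything SGI ships:

              /* chase.c - rough local vs. remote memory latency probe */
              #include <numa.h>
              #include <stdio.h>
              #include <stdlib.h>
              #include <time.h>

              enum { N = 1 << 24, STEPS = 10 * 1000 * 1000 };   /* 128 MB buffer */

              /* Run on a CPU of cpu_node while chasing pointers through memory
                 placed on mem_node, so every load pays that node's latency. */
              static double chase_ns(int cpu_node, int mem_node)
              {
                  numa_run_on_node(cpu_node);              /* pin to the CPU node */
                  size_t *next = numa_alloc_onnode(N * sizeof(size_t), mem_node);
                  if (!next) { fprintf(stderr, "allocation failed\n"); exit(1); }

                  /* Sattolo's algorithm: one random cycle over all N slots, so
                     every load depends on the result of the previous one. */
                  for (size_t i = 0; i < N; i++) next[i] = i;
                  for (size_t i = N - 1; i > 0; i--) {
                      size_t j = (size_t)rand() % i;
                      size_t t = next[i]; next[i] = next[j]; next[j] = t;
                  }

                  struct timespec t0, t1;
                  size_t pos = 0;
                  clock_gettime(CLOCK_MONOTONIC, &t0);
                  for (int s = 0; s < STEPS; s++)
                      pos = next[pos];
                  clock_gettime(CLOCK_MONOTONIC, &t1);

                  volatile size_t sink = pos; (void)sink;  /* keep the chase alive */
                  numa_free(next, N * sizeof(size_t));
                  return ((t1.tv_sec - t0.tv_sec) * 1e9
                          + (t1.tv_nsec - t0.tv_nsec)) / STEPS;
              }

              int main(void)
              {
                  if (numa_available() < 0) {
                      fprintf(stderr, "no NUMA support on this system\n");
                      return 1;
                  }
                  double local  = chase_ns(0, 0);                /* nearest memory */
                  double remote = chase_ns(0, numa_max_node());  /* furthest node  */
                  printf("local  ~%.0f ns/load\n", local);
                  printf("remote ~%.0f ns/load\n", remote);
                  printf("NUMA factor ~%.1fx\n", remote / local);
                  return 0;
              }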

            When you take a cluster, where some memory cells are very far away from a node, and present them as one single image via NUMA, you get very slow RAM latency in the worst case. In some cases, RAM latency is good (if the memory cell happens to be close to the cpu) and in other cases the RAM latency is very bad (if the memory cell is in another PC on the network). This can wreak havoc with performance.
            Thanks for showing that you understand the basics of NUMA.

            The implication of this is that you need an OS, as well as userspace software, that takes NUMA into account. The bigger the NUMA factor is, the worse the penalty for not getting this right.
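
            To put a rough number on that penalty, take the Altix UV figures above (75 ns local, ~1 us worst-case remote) and compute the average load latency as the fraction of remote accesses grows. A back-of-the-envelope sketch; the 75/1000 ns inputs are just the figures quoted above, everything else is arithmetic:

              #include <stdio.h>

              int main(void)
              {
                  const double local_ns = 75.0, remote_ns = 1000.0;

                  /* Average latency if a fraction f of all loads go remote. */
                  for (double f = 0.0; f <= 1.0; f += 0.25) {
                      double avg = (1.0 - f) * local_ns + f * remote_ns;
                      printf("remote fraction %.2f -> avg %6.1f ns (%4.1fx local)\n",
                             f, avg, avg / local_ns);
                  }
                  return 0;
              }

            Even a 50/50 split averages out to ~540 ns per load, about 7x local latency, which is why placement matters so much more on a big ccNUMA machine than on a 2-socket box with a factor of 1.3.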

            On the other hand, the Oracle M9000 Solaris server with 64 CPUs has a latency factor of approximately 1.3x, which is very good. Thus, NUMA is not needed on these M9000 SMP servers.
            Indeed, a factor of 1.3 is quite good. Suspiciously good, in fact, considering that's about the factor you have on a modern 2-socket x86 system. So let's look into it: http://www.oracle.com/technetwork/ar...ons-163845.pdf

            So on page 14 we have information about the memory latency. On a 64-socket M9000, you have 532 ns for accessing the memory furthest away from the core, which is, I suppose, reasonable for a system of that size. For comparison, the worst-case latency on a 256-socket Altix UV is about twice that (see above), which again is quite reasonable, since there are bound to be a few more router hops with that many sockets (and a different network topology).

            But look at the local memory latency: 437 ns! Ouch. Simply, ouch. Again, for comparison, on the Altix UV local memory latency is 75 ns, which is a relatively small penalty compared to a simple 2S x86 machine without any directory overhead and, obviously, a pretty small snoop broadcast domain. So we see that the M9000 manages to have a relatively small NUMA factor not by having some super-awesome technology that makes remote memory access fast, but mostly by having terrible local memory latency. Not exactly something to brag about, eh.
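
            Just to make that arithmetic explicit (the inputs are the latencies quoted above, in ns):

              #include <stdio.h>

              int main(void)
              {
                  struct { const char *sys; double local, remote; } m[] = {
                      { "M9000, 64 sockets",     437.0,  532.0 },  /* Oracle paper, p. 14 */
                      { "Altix UV, 256 sockets",  75.0, 1000.0 },  /* SGI figures above   */
                  };

                  for (int i = 0; i < 2; i++)
                      printf("%-22s local %4.0f  remote %4.0f  factor %4.1fx\n",
                             m[i].sys, m[i].local, m[i].remote,
                             m[i].remote / m[i].local);
                  return 0;
              }

            The M9000's 1.2x factor comes from the 437 ns denominator, not from fast remote access: its remote latency is about half the UV's, while its local latency is nearly six times worse.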

            He also claims that quite recently, in 2007, Linux had very bad NUMA API support. NUMA was hardly implemented at all in Linux:

            "The NUMA API for Linux is very rudimentary compared to the boutique features in legacy NUMA systems like Sequent DYNIX/ptx and SGI IRIX, but it does support memory and process placement".

            But SGI sold their Altix systems in 2003. How bad was the NUMA API back then? NUMA could not have worked well for the Itanium Altix back then. The Altix cluster would only have worked for some workloads that could use the beta-phase NUMA API.
            Apparently it worked well enough for SGI to phase out Irix in favor of Linux.

            There is an empty niche to earn some millions, but no one succeeds. It is difficult, not easy. It seems that the reason no one does this is that it is technically difficult. Just as I suspected.
            If Linux's lack of technical capability is the only thing that is lacking, as you claim, why don't we have a big bunch of companies using OpenSolaris (or Illumos or whatever it's called this week) and competing successfully with the likes of IBM, HP, and Oracle in the high-end RDBMS server market? Don't they want to be billionaires, to use your own argument?



            • #56
              Originally posted by drag View Post
              Actually, that is where the term 'Slowaris' came from. It's not from the network stack per se; it's that when you took old SPARC systems and replaced the OS with Linux, they ran much faster.
              Not necessarily just Linux: I remember when Solaris first came out, it was considered a significant downgrade from SunOS, both in performance and stability.

