
Thread: Linux vs Solaris - scalability, etc

  1. #51
    Join Date
    Nov 2011
    Posts
    24


    Quote Originally Posted by kebabbert View Post
    Well, CERN says that even some highend storage systems are not designed to catch data corruption.
    I believe that, but in no way does that refute what I said, since I didn't claim that ALL highend storage systems have features to catch data corruption. Merely that they OFTEN have such features, which is factually true.

    Well, according to Linux devs, for instance, Ted Tso, you are wrong.
    No, you're just quote-mining stuff and taking statements well beyond their original context. You really think SGI developers don't have access to their own hardware, for instance?

    And, as I have said, the big Linux 4,096-core HPC systems have terrible latency
    Of course the worst case remote memory latency on a big CC-NUMA machine will be much higher than for local memory. That's the entire point. The alternative, after all, is not SMP but rather USMA (uniformly slow memory access). And yes, in order to run efficiently on a big CC-NUMA machine you need an OS kernel that has been designed with this in mind (such as Linux), as well as applications that are very careful wrt non-local memory accesses. The latter point rules out most applications, including RDBMSes, which is why you don't see 4,096-way DB servers anywhere.

    Of course, an OS that works well on machines with a high NUMA factor can work just fine on machines with a low NUMA factor or none at all, Linux being a nice example of this.
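
    To make the "applications must be careful" point concrete, here is a minimal sketch of what NUMA-aware allocation looks like on Linux through libnuma (an illustration of mine, assuming the numactl library is available; not code from any system discussed here):

    Code:
    /* Minimal libnuma sketch: local vs. pinned-remote allocation.
     * Illustrative only. Build: gcc numa_alloc.c -lnuma */
    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this machine\n");
            return 1;
        }
        size_t sz = 64UL * 1024 * 1024;

        /* Allocate on the node this thread runs on: the fast path. */
        char *near = numa_alloc_local(sz);

        /* Pin an allocation to the highest-numbered node; on a big
         * CC-NUMA box that can be many router hops away. */
        char *far = numa_alloc_onnode(sz, numa_max_node());

        memset(near, 0, sz);  /* touching the pages actually commits them */
        memset(far, 0, sz);

        numa_free(near, sz);
        numa_free(far, sz);
        return 0;
    }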

    , and it might look like a shared memory, but it isn't:

    "I tried running a nicely parallel shared memory workload (75% efficiency on 24 cores in a 4 socket opteron box) on a 64 core ScaleMP box with 8 2-socket boards linked by infiniband. Result: horrible. It might look like a shared memory, but access to off-board bits has huge latency."
    So a workload that gets only 75% efficiency on 24 cores/4S (i.e. a speedup of just 18 on 24 cores) runs badly on a system where cache coherency is provided by a virtualization layer rather than hardware, with boards connected over IB, and that's somehow NOT bleedingly obvious? Sheesh (woot, we're going in circles!!)

    Ok, so you claim that there are big SMP Linux servers on the market.
    No, I'm not claiming that. Sorry if I didn't include my by now routine SMP != CC-NUMA point; it's difficult to see whether you've actually gotten that or not, since you continue to talk about "big SMP".

    Fine. Please show me links then. I am not talking about a prototype like the Big Tux HP Superdome or the experimental POWER7; I am talking about SMP servers for sale, with 64 supported cpus or so. Where are those links?
    Did you read the paragraph you're replying to? In case you didn't, here's an excerpt again: "HW is available (except for SGI, you probably can't get a support contract for running Linux on it, but for kernel development that is of course irrelevant)." Duh.

    But which Linux devs have access to those big SMP servers that are for sale?
    Well, again with the caveat that all big machines are CC-NUMA, not SMP, those developers who are working on scalability issues have access to big machines to benchmark their work. I've recently even given a couple of prominent examples, the SGI developers who have worked on scalability issues relevant for their customers, and the VFS scalability work.

    Where are those SMP servers, by the way? Who sells them?
    Well, again with the caveat that all big machines are CC-NUMA, not SMP, there's plenty of big machines for sale by IBM, HP, Fujitsu, SGI, and probably others as well.

    You and everyone else
    Obviously, I'm not going to give blanket agreement with something "everyone else" says or may say in the future. If I feel a need to publicly agree with someone else, I'll explicitly say so. Don't assume I agree with something I haven't written.

    , claim that Linux does SMP as easily as HPC.
    What the heck is this statement even supposed to mean? If one takes the usual definitions for SMP and HPC, it seems like an apples to oranges comparison.

    And the HPC Altix servers are in fact SMP servers.
    No, they are CC-NUMA machines. I've never claimed that they are "SMP servers".

    Does your claim hold water?
    My claims? Yes. Some strawman claim you came up with all by yourself? Obviously not, since those claims were designed by you to be false.

    There is something called Occam's razor. What does it say?
    It says that you're a proper frothing-at-the-mouth Solaris fanboy, desperately grasping at any straws you can find in order to make Solaris look good. And since you don't have a solid grasp of the issues you're talking about, much lulz ensues.

  2. #52
    Join Date
    Nov 2008
    Posts
    418


    Quote Originally Posted by jabl View Post
    Of course the worst case remote memory latency on a big CC-NUMA machine will be much higher than for local memory. That's the entire point. The alternative, after all, is not SMP but rather USMA (uniformly slow memory access). And yes, in order to run efficiently on a big CC-NUMA machine you need an OS kernel that has been designed with this in mind (such as Linux), as well as applications that are very careful wrt non-local memory accesses. The latter point rules out most applications, including RDBMSes, which is why you don't see 4,096-way DB servers anywhere.
    ...
    No, they [SGI Altix] are CC-NUMA machines. I've never claimed that they are "SMP servers".
    Great. It seems that this is settled:

    Those Itanium-based SGI Altix servers are not SMP systems. Instead, they are ccNUMA systems, i.e. basically a cluster. That is why we will never see them as 4,096-core database servers. That is the reason they are not doing SMP work.
    http://en.wikipedia.org/wiki/Non-Uni...ster_computing
    One can view NUMA as a very tightly coupled form of cluster computing
    I have seen numbers showing >50x slower RAM latency on NUMA systems.

    When you take a cluster, where some memory cells are very far away from a given node, and present them as one single image via NUMA, you get very slow RAM latency in the worst case. In some cases the RAM latency is good (if the memory cell happens to be close to the cpu) and in other cases the RAM latency is very bad (if the memory cell is in another PC on the network). This can wreak havoc with performance.

    On the other hand, the Oracle M9000 Solaris server with 64 cpus has a latency factor of approximately 1.3x, which is very good. Thus, NUMA is not needed on these M9000 SMP servers. You don't need to mask far-away memory cells via NUMA. Uniform memory servers do not have that 50x worse latency that clusters have, though uniform memory servers probably have some worst-case latency scenarios too.
    http://kevinclosson.wordpress.com/20...gives-part-ii/
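
    As an aside, latency figures like these are typically measured with a pointer-chasing microbenchmark: chase a randomized linked list so that every load depends on the previous one, defeating the prefetchers. A rough sketch of the idea (my own illustration, assuming Linux with libnuma; not the tool behind any of the quoted numbers):

    Code:
    /* Pointer-chase latency probe, illustrative only.
     * Build: gcc -O2 chase.c -lnuma */
    #include <numa.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define SLOTS (16UL * 1024 * 1024)  /* 128 MB of pointers, far bigger than cache */

    static double chase_ns(void **buf, long iters)
    {
        /* Sattolo's algorithm: shuffle into one big cycle so the chase
         * visits every slot in random order. */
        for (unsigned long i = 0; i < SLOTS; i++)
            buf[i] = &buf[i];
        for (unsigned long i = SLOTS - 1; i > 0; i--) {
            unsigned long j = rand() % i;
            void *t = buf[i]; buf[i] = buf[j]; buf[j] = t;
        }
        struct timespec t0, t1;
        void **p = (void **)buf[0];
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i++)
            p = (void **)*p;            /* serialized dependent loads */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (!p) puts("");               /* keep the chase from being optimized away */
        return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / iters;
    }

    int main(void)
    {
        if (numa_available() < 0)
            return 1;
        size_t bytes = SLOTS * sizeof(void *);
        void **near = numa_alloc_onnode(bytes, 0);
        void **far  = numa_alloc_onnode(bytes, numa_max_node());

        numa_run_on_node(0);            /* measure both buffers from node 0 */
        printf("node 0 (local):  ~%.0f ns/load\n", chase_ns(near, 20000000L));
        printf("node %d (remote): ~%.0f ns/load\n",
               numa_max_node(), chase_ns(far, 20000000L));

        numa_free(near, bytes);
        numa_free(far, bytes);
        return 0;
    }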

    He also claims that quite recently, in 2007, Linux had very poor NUMA API support; NUMA was hardly implemented at all in Linux:
    http://kevinclosson.wordpress.com/20...ctl8-and-suma/
    "The NUMA API for Linux is very rudimentary compared to the boutique features in legacy NUMA systems like Sequent DYNIX/ptx and SGI IRIX, but it does support memory and process placement".

    But SGI sold their Altix systems in 2003. How bad was the NUMA API back then? NUMA could not have worked well on the Itanium Altix back then. The Altix cluster would only have worked for those workloads that could use the beta-phase NUMA API.


    I do wonder what would happen if we compiled Linux on an Oracle M9000. How well would Linux perform? Apparently it is difficult to build these big single-system-image servers:
    http://www.theregister.co.uk/2011/09..._amd_opterons/
    With Advanced Micro Devices not building any chipsets that go beyond four Opteron processor sockets in a single system image and no one else interested in doing chipsets, either, there is an opportunity, it would seem, for someone to make big wonking Opteron boxes to compete against RISC and Itanium machines.
    Many have tried. Newisys, Liquid Computing, Fabric7, 3Leaf Systems, and NUMAscale all took very serious runs at it, and thus far, four out of five of them have gone the way of all flesh. It is not a coincidence that these companies fail
    There is an empty niche there to earn some millions, but no one succeeds. It is difficult, not easy. It seems that the reason no one does this is that it is technically difficult. Just as I suspected.

  3. #53
    Join Date
    Aug 2009
    Posts
    93


    Quote Originally Posted by kebabbert View Post
    Regarding Solaris, it has had its network stack completely redesigned in Solaris 10 and S11. The reason the name "Slowlaris" was used earlier was that the old Solaris 8 and 9 had a slow network stack. But the stack has been completely rewritten now, and is extremely fast in S11. In early benchmarks of S10 with an early version of the new network stack, Solaris had 20-30% higher performance than Red Hat Linux - on the same hardware.

    Here is a stock exchange that says they got 36% better performance, on the same hardware by switching from Linux to Solaris.
    http://www.computerworld.com/s/artic...taxonomyId=012

    Of course this is a few years old, and Linux has improved since then.

    But, Solaris has also improved. The step from Solaris 10 to S11 is huge, the improvements are huge.
    Where did you get the "same hardware" quote from? I found only results stating that PHLX is making extensive use of Solaris on SPARC. Also this:

    "VERITAS software has already helped The Philadelphia Stock Exchange increase storage utilization by more than 25 percent," said Bill Morgan, CIO, Philadelphia Stock Exchange. "The new features in VERITAS Storage Foundation 4.1 will help us drive further cost efficiencies and simplify the management of our tiered storage model. Furthermore, because VERITAS software supports Solaris 10, we have seen tremendous operating system performance gains that will allow us to better serve our customers."

    http://linux.sys-con.com/node/48771

    So where is your advantage from switching OSes on the same HW now?

    "PHLX is scheduled to go live in early July 2005 with its options trading platform running on Solaris 10 OS. Adds Ward: "The Philadelphia Stock Exchange has been so impressed with Solaris 10 on SPARC processors that it has now embarked on a new proof-of-concept trial on the Sun FireV40z server for x64-based systems and Solaris 10."

    http://www.finextra.com/news/fullsto...wsitemid=13860

    Also it looks like PHLX was a long time Sun customer at least since Solaris 7: http://www.allbusiness.com/economy-e...6914381-1.html

    So when exactly did they make the switch?

    Final nail in your coffin:

    http://blogs.oracle.com/sunay/entry/solaris_vs_red_hat

    "Bill Morgan, CIO at Philadelphia Stock Exchange Inc.", where he said that Solaris 10 improved his trading capacity by 36%. Now we are not talking about a micro benchmark here but a system level capacity. This was on a 12 way E4800 (SPARC platform). Basically, they loaded Solaris 10 on the same H/W and were able to do 36% more stock transactions per second.

    I guess if somebody made a better OS for SPARCs than Sun, it would be a great surprise.

  4. #54
    Join Date
    Aug 2009
    Posts
    93


    Quote Originally Posted by kebabbert View Post
    Initial post
    OK, I finally checked the facts on the initial post (lazy, I know).

    1. The SAP benchmarks you quoted are BOTH on NUMA systems, NOT SMP. Any modern Opteron system is NUMA.
    2. Are you really comparing Oracle 10g to MaxDB for the RDBMS backend???
    3. User count is different
    4. CPU utilisation is different (this really points to the DB/storage backend).

    Also there are no details about the DB configuration and storage backend used in each case (does the DB run completely in memory???).

    Anyway, so far all you have pointed to are either your own misunderstandings or marketing quotes. The only real parts on your side are quotes from Linux developers who do acknowledge the shortcomings and are working on them.

  5. #55
    Join Date
    Sep 2006
    Posts
    714


    Final nail in your coffin:
    The only nail in the coffin he needed was to point out that the article, which is mostly an advertisement for Sun, is 6 years out of date. The world has moved on.

    I guess if somebody made a better OS for SPARCs than Sun, it would be a great surprise.
    Actually that is where the term 'Slowaris' came from. It's not from the network stack per se; it's that when you took old SPARC systems and replaced the OS with Linux, they ran much faster.

  6. #56
    Join Date
    Sep 2006
    Posts
    714


    There is an empty niche there to earn some millions, but no one succeeds. It is difficult, not easy. It seems that the reason no one does this is that it is technically difficult. Just as I suspected.
    That's funny you are bragging about this, but the reality is that IBM's POWER platform stomps the crap out of any equivalently priced SPARC system. Which is one of the major reasons why Sun was not able to keep up with other companies on the high end. Which is one of the major reasons they are now called 'Oracle'.

    Solaris is not that bad for a lot of things and has good advantages over Linux in some areas, but crowing about Solaris vs Linux on the 'high end' is missing the point. The real high end hasn't come from Sun for a while now. Sure, you can spend a lot on SPARC hardware, but that is really all you are accomplishing. People are willing to spend a lot because it's cheaper to drop a million dollars on some big database server than it is to migrate away from it to something that will offer much better performance at a fraction of the price.

    The reason more companies don't get involved in these sorts of things is that it means competing with IBM. That is difficult. IBM is a monster and offers 'total solutions' where hardware is just one small factor in a lot of things. It's not even really that expensive anymore compared to what it used to cost to get stuff done.

  7. #57
    Join Date
    Nov 2011
    Posts
    24


    Quote Originally Posted by kebabbert View Post
    Great. It seems that this is settled:

    Those Itanium-based SGI Altix servers are not SMP systems. Instead, they are ccNUMA systems,
    Yes; I agree. As a minor point, the current x86-based Altix UV systems are also CC-NUMA; except for the change in processors and an upgraded interconnect they are basically the same as the old Itanium Altix systems.

    i.e. basically a cluster.
    Well, for some fairly stretched definition of a cluster, I suppose? If you so fervently want to argue this, could you start by providing a definition for a cluster?

    That is why we will never see them as 4,096-core database servers.
    A more relevant restriction, probably, is that you won't find RDBMS software capable of scaling to those core counts, nor useful RDBMS workloads to take advantage of it. It doesn't really matter if the OS and HW scale to a million cores (hypothetically speaking) if the application software doesn't.

    From that Wikipedia article: "The addition of virtual memory paging to a cluster architecture can allow the implementation of NUMA entirely in software where no NUMA hardware exists. However, the inter-node latency of software-based NUMA remains several orders of magnitude greater than that of hardware-based NUMA." You might want to think about that the next time you're about to post your ScaleMP quote as some kind of proof that Linux and/or Altix "doesn't scale" (or whatever your point with that quote is; I'm not really sure).

    I have seen numbers of >50x slower RAM latency with NUMA systems.
    FWIW, SGI claims that for the largest supported Altix UV configuration, local latency is 75 ns and worst-case remote latency is 1 us, giving a NUMA factor of about 13.

    When you take a cluster, where some memory cells are very far away from a given node, and present them as one single image via NUMA, you get very slow RAM latency in the worst case. In some cases the RAM latency is good (if the memory cell happens to be close to the cpu) and in other cases the RAM latency is very bad (if the memory cell is in another PC on the network). This can wreak havoc with performance.
    Thanks for showing that you understand the basics of NUMA.

    The implication of this is that you need an OS, as well as userspace software, that takes NUMA into account. The bigger the NUMA factor, the worse the penalty for not getting this right.
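
    A back-of-the-envelope model makes this concrete (all numbers are mine and purely illustrative):

    Code:
    /* Expected memory latency when a fraction r of accesses go remote.
     * All numbers invented for illustration. */
    #include <stdio.h>

    int main(void)
    {
        double local = 100.0;                   /* assumed local latency, ns */
        double factors[] = { 1.3, 13.0, 50.0 }; /* small, Altix-UV-like, huge */
        double r = 0.20;                        /* 20% of accesses are remote */

        for (int i = 0; i < 3; i++) {
            /* expected latency = (1-r)*local + r*factor*local */
            double avg = (1.0 - r) * local + r * factors[i] * local;
            printf("NUMA factor %5.1f -> average latency %6.1f ns\n",
                   factors[i], avg);
        }
        return 0;
    }

    With a factor of 1.3 a sloppy application barely notices (106 ns average); with a factor of 50, the same 20% of remote accesses balloons the average to over 1,000 ns.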

    On the other hand, the Oracle M9000 Solaris server with 64 cpus has a latency factor of approximately 1.3x, which is very good. Thus, NUMA is not needed on these M9000 SMP servers.
    Indeed, a factor of 1.3 is quite good. Suspiciously good, in fact, considering that's about the factor you have on a modern 2-socket x86 system. So let's look into it: http://www.oracle.com/technetwork/ar...ons-163845.pdf

    So on page 14 we have information about the memory latency. On a 64-socket M9000, you have 532 ns for accessing the memory furthest away from the core, which is, I suppose, reasonable for a system of that size. For comparison, the worst-case latency on a 256-socket Altix UV is about twice that (see above), which again is quite reasonable, since there are bound to be a few more router hops with that many sockets (and a different network topology). But look at the local memory latency: 437 ns! Ouch. Simply, ouch. Again, for comparison, on the Altix UV local memory latency is 75 ns, which is a relatively small penalty compared to a simple 2S x86 machine without any directory overhead and, obviously, a pretty small snoop broadcast domain. So we see that the M9000 manages to have a relatively small NUMA factor not by having some super-awesome technology making remote memory access fast, but mostly by having terrible local memory latency. Not exactly something to brag about, eh.
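
    Spelling out the arithmetic behind that comparison (using the numbers quoted above):

    Code:
    /* NUMA factors from the two sets of quoted latency figures. */
    #include <stdio.h>

    int main(void)
    {
        double m9000_local = 437.0, m9000_remote =  532.0; /* ns, 64-socket M9000 */
        double uv_local    =  75.0, uv_remote    = 1000.0; /* ns, 256-socket Altix UV */

        printf("M9000 NUMA factor:     %.2f\n", m9000_remote / m9000_local); /* ~1.22 */
        printf("Altix UV NUMA factor:  %.2f\n", uv_remote / uv_local);       /* ~13.3 */
        printf("local latency penalty: %.1fx\n", m9000_local / uv_local);    /* ~5.8x */
        return 0;
    }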

    He also claims that quite recently, in 2007, Linux had very poor NUMA API support; NUMA was hardly implemented at all in Linux:
    http://kevinclosson.wordpress.com/20...ctl8-and-suma/
    "The NUMA API for Linux is very rudimentary compared to the boutique features in legacy NUMA systems like Sequent DYNIX/ptx and SGI IRIX, but it does support memory and process placement".

    But SGI sold their Altix systems in 2003. How bad was the NUMA API back then? NUMA could not have worked well on the Itanium Altix back then. The Altix cluster would only have worked for those workloads that could use the beta-phase NUMA API.
    Apparently it worked well enough for SGI to phase out Irix in favor of Linux.

    There is an empty niche there to earn some millions, but no one succeeds. It is difficult, not easy. It seems that the reason no one does this is that it is technically difficult. Just as I suspected.
    If Linux's lack of technical capability is the only thing holding it back, as you claim, why don't we have a big bunch of companies using OpenSolaris (or illumos or whatever it's called this week) and competing successfully with the likes of IBM, HP, and Oracle in the high-end RDBMS server market? Don't they want to be billionaires, to use your own argument?

  8. #58
    Join Date
    Apr 2008
    Location
    Saskatchewan, Canada
    Posts
    462


    Quote Originally Posted by drag View Post
    Actually that is where the term 'Slowaris' came from. It's not from the network stack per se; it's that when you took old SPARC systems and replaced the OS with Linux, they ran much faster.
    Not necessarily just Linux: I remember when Solaris first came out, it was considered a significant downgrade from SunOS, in both performance and stability.
