AMD Shanghai Opteron: Linux vs. OpenSolaris Benchmarks

  • #81
    Actually, to me this issue about scalability is not important at all. I have a quad core, and for that Linux will do great. Linux will do great on any machine up to (and probably beyond) 8 CPUs, which is more than I will ever lay my hands on. So this talk about a kernel scaling to Sun's upcoming 2048-thread machine is nothing I will ever notice. Suffice it to say, both OSes will do great in terms of scalability on any normal machine we will ever come across. And by the time 8-CPU machines are common, the Linux kernel developers will have moved the boundary to 32-64 CPUs, which we as normal users won't touch for a very long time. And when we do, the boundary will have moved to 128-256 CPUs, and so on. Therefore this is not really important. Other factors than scalability should decide when choosing an OS instead, I think.




    Regarding whether a single (Solaris) binary should be considered scalable, or whether modifying the (Linux) code to suit different machines counts as scalable: if you need to recompile and modify, then to me it is not scalable, because I want to learn only one technology and one behaviour. If the different Linux kernels differ much, then I have to learn about each of them. "Oh, now I am on a cluster, then I must remember these things... Oh, now I am on a desktop PC, then I must remember these things..." and so on. Not only do you have to learn the different kernel versions, but within the same kernel version the kernels can behave differently and have different weak points and strong points. This messes with my poor head. With Solaris, I only learn one kernel, and it works for whatever I want to do. I want to learn one technique very well and become a guru on it. It is easier for my poor head and memory. This is why it is important to me personally to have only one kernel with only one behaviour, no matter what you do.

    I remember: I learned the C64, then the Amiga, MS-DOS, Win3.11, Win95, WinNT, WinXP. I am tired of all these technologies I learn only for them to die a couple of years later. I must relearn each time. If I had instead poured all those countless hours of learning each OS into one and only one, like Unix, which has existed for 40 years, I would have become an expert. Now I stick with Unix and don't have to relearn from scratch all the time. Of course I have to update my knowledge, but that is less work than relearning from scratch. Going from Windows to FreeBSD -> relearn from scratch. Going from Unix to Linux -> no need to relearn from scratch, only update pieces of your knowledge.



    • #82
      Originally posted by kebabbert View Post
      Other factors than scalability should decide when choosing an OS instead, I think.
      Heh, yeah, good call.

      Regarding whether a single (Solaris) binary should be considered scalable, or whether modifying the (Linux) code to suit different machines counts as scalable: if you need to recompile and modify, then to me it is not scalable, because I want to learn only one technology and one behaviour. [...] This is why it is important to me personally to have only one kernel with only one behaviour, no matter what you do.
      Ok, so you're talking about learning to tune Linux for whatever workload. From the way you describe it, you should be unhappy with Solaris, since it can switch process schedulers at run time, so you have to learn how both of them behave and how they interact with any other tunables and selectable algorithm variants, right?
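      (From memory, and worth double-checking against the man pages: the Solaris scheduler switching you'd have to learn looks something like this.)
      Code:
      # list the scheduling classes the running kernel knows about
      priocntl -l

      # set the Fair Share Scheduler as the default class (read at boot)
      dispadmin -d FSS

      # move everything currently in the timesharing class into FSS
      priocntl -s -c FSS -i class TS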

      None of Linux's compile-time config options will require learning new behaviour, since they don't change behaviour (except SLUB vs. SLAB). The harder thing to learn is all the run-time tunable variables and switches (e.g. vm.swappiness, which many people disagree about good settings for). If you like, think of the config options as just more tunables, ones that happen to be compile-time instead of run-time. Back in the day, a lot more tunables were compile-time. That made sense when CPU instruction throughput was far more of a bottleneck relative to memory, and when caches were smaller (i.e. before the P6).
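      For example, the mechanics of a run-time tunable look like this (the value here is made up, just to show the idea):
      Code:
      # read the current value
      cat /proc/sys/vm/swappiness

      # change it on the running kernel
      sysctl -w vm.swappiness=10

      # make it stick across reboots
      echo "vm.swappiness = 10" >> /etc/sysctl.conf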

      I just recently customized a kernel for my desktop, so I went and looked over the menu options in make xconfig (for amd64) and there aren't any alternate algorithms you can choose for big iron. What you'd actually do:

      0. Do some tuning on the stock kernel from your distro, unless it is totally unsuitable for your machine and whatever you learn from it would be useless. (e.g. Ubuntu's amd64 configs don't enable NUMA, surprisingly. I guess they don't expect to see any use on dual-socket K8? Weird. I'm going to file a bug report if it hasn't already come up. -server should have NUMA support.)

      1. choose the actual CPU type (core2, Opteron, or P4 instead of generic x86-64). I think this mostly affects what gcc options are used.
      2. enable NUMA memory allocation/scheduler awareness. Enable support for anything your machine has that isn't enabled by default.
      3. Turn off any accounting, debugging, and statistics gathering that you don't want to keep around for tuning or to leave available in production (a sketch of the resulting .config fragment follows this list).
      4. disable device drivers you don't need. This isn't a big deal, since the default config has almost everything built as modules. Memory consumption from dead code (core framework support for e.g. webcams) doesn't matter. It won't get any action, so it won't pollute your I-cache, which is what does matter. Maybe disabling some parts of kernel infrastructure will improve code locality by taking out dead code between code that will see some use.

      5. compile and boot the kernel and start tweaking /proc/sys tunables.

      6. Turn off all the stats-gathering stuff you don't need anymore, and build a production kernel. Although it's probably pretty easy to keep the stats gathering out of critical sections, so compiling it in shouldn't hurt scalability, just add a tiny slowdown that doesn't depend on the CPU count, and a bigger slowdown when stats gathering is actually enabled via its /proc/sys controls. (Most expensive stats gathering has to be enabled at runtime as well as compiled in.)
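      To make steps 1-3 concrete, the resulting .config fragment looks roughly like this (option names from a 2.6-era amd64 tree; treat it as a sketch and check your own kernel):
      Code:
      # step 1: real CPU type instead of generic x86-64
      CONFIG_MK8=y
      # CONFIG_GENERIC_CPU is not set

      # step 2: NUMA-aware allocator and scheduler
      CONFIG_NUMA=y
      CONFIG_K8_NUMA=y

      # step 3: drop accounting/debug/stats you don't want in production
      # CONFIG_TASKSTATS is not set
      # CONFIG_SCHEDSTATS is not set
      # CONFIG_DEBUG_KERNEL is not set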

      BTW, I lied. There is one compile-time algorithm selection: the SLUB vs. SLAB kernel memory allocator. SLAB is the old one, and it's kept around because SLUB isn't yet known to work well in all environments; so if your workload hits a corner case in SLUB, you can fall back to SLAB. It's not that only one of them is intended for big iron, though. The only other selectable algorithm of importance is the I/O scheduler. You can set the default at compile time, but it's always selectable at runtime, and each scheduler has runtime-tunable integer variables. (There are also a few selectable algorithms that are unimportant here, e.g. the choice of CPU frequency governor: ondemand vs. conservative vs. userspace.)
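      For instance, picking the I/O scheduler at run time is just a sysfs write (sda is only an example device):
      Code:
      # the active scheduler is shown in brackets
      cat /sys/block/sda/queue/scheduler

      # switch this disk to the deadline scheduler
      echo deadline > /sys/block/sda/queue/scheduler

      # per-scheduler tunables live right next to it
      ls /sys/block/sda/queue/iosched/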



      • #83
        Ok, that Linux config sounds reasonable. Not too many complex tunings.

        I should rephrase my issue a bit. If I sit at a Linux cluster, that kernel will behave differently compared to a standard Linux kernel. It will have different functionality compiled in or out. It is like using the v2.4 kernel vs. v2.6: there will be differences. Personally, I don't like having to remember special cases. I want one big general theory, one big general theorem that reduces to all the special cases. I want to learn only one theorem, rather than several similar-looking theorems. One big theory that covers all the special cases is mathematically more elegant than having to develop a theory (code) for each special case.

        It is like:
        -Oh, you want to go through the forest (cluster)? Then you must use vehicle X, with its own driver's license. Oh, you want to travel on roads (desktop PC)? Then you must use vehicle Y, with its own driver's license. Oh, you want to travel the narrow streets of a city (embedded)? Then you must use vehicle Z, and so on. One driver's license for each different task.

        With Solaris, it is like: use this one vehicle, which handles every case, with this one driver's license. Period. It is like a GUT, a grand unified theory, in physics. You don't have to recompile for different tasks. Solaris handles all tasks well.

        Or it is like travelling and having to carry a different currency for each country I visit, when I would prefer to bring only USD. Or like Microsoft selling 42 different versions of Windows (home, premium, pro, server, ultimate, etc.) while Apple sells only one version of OS X. Which Windows version should I choose and learn? All these special cases mess up my head.

        Regarding the different Solaris schedulers you can choose between, I don't see them as a problem, because they are not something I am forced to use. I simply ignore them and use the default scheduler. If I go up to the enterprise level, then I have to choose a scheduler and do other fine tuning. Tuning in Solaris amounts to changing arguments in a config file. No recompilation.
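        For example, a typical /etc/system entry looks something like this (the parameter and value are just an illustration):
        Code:
        * kernel tunables read at boot - no recompile needed
        * cap the ZFS ARC at 4 GB
        set zfs:zfs_arc_max=0x100000000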

        I believe it is easier to learn the behaviour of a few schedulers than of different kernels? "This is the server scheduler" vs. "this is the desktop scheduler"; one optimizes for I/O, the other for responsiveness. Kind of.



        • #84
          I think this needs to be reposted again amidst the noise to emphasize that whatever this benchmark is doing, it's NOT comparing operating systems -- it's comparing compilers.

          Originally posted by flice View Post
          OK, so I took the time and benchmarked several of the tests as they appear in the article. The benchmarks were done inside VirtualBox VMs running in Ubuntu 8.10. One VM was OpenSolaris x64 2008.11, the other was Ubuntu x64 8.10. All the OpenSolaris programs were compiled with flags taken from here.

          So here's what I have:
          Code:
                                     SOLARIS         LINUX
          
          lame                       23.17           23.02
          oggenc                     23.51           19.11
          GraphicsMagic - HWB        74              50
          GraphicsMagic - LAT        17              10
          GraphicsMagic - resize     47              33
          GraphicsMagic - sharpen    20              10
          As you see, all the results are much better for OpenSolaris than in the Phoronix article. Strikingly, the trend in the GraphicsMagic tests is reversed compared to the article: Solaris not only doesn't suck, it wins with results 1.5x to 2.0x better. One thing to note is that VirtualBox only allows one CPU core per VM, so the GraphicsMagic results might be different with all cores available.

          Overall, the article results seem to be completely detached from reality.

          Originally posted by flice View Post
          Yes, I was using the Sun Studio compiler. Here are the results from gcc 3.4:

          Code:
          lame                       23.75
          oggenc                     29.10
          GraphicsMagic - HWB        40
          GraphicsMagic - LAT        9
          GraphicsMagic - resize     30
          GraphicsMagic - sharpen    14
          BTW, my machine is Intel c2d, not AMD.



          • #85
            Originally posted by kebabbert View Post
            With Solaris, it is like: use this one vehicle, which handles every case, with this one driver's license. Period. It is like a GUT, a grand unified theory, in physics. You don't have to recompile for different tasks. Solaris handles all tasks well.
            Yes, I'm not sure people realize this stark difference in philosophy between Linux and Solaris regarding tuning. In general, Solaris tries to reduce the need for tuning whenever possible, and when you talk to the engineers, they always suggest that tuning (kernel variables, etc.) be used as a last resort, or to help with debugging/analysis. I'm not sure you can say which approach is *always* better, because there are pros and cons to both.



            • #86
              Originally posted by trasz View Post
              Actually, the better way would be the other way around. The nice thing Sun could use from Linux is device drivers - and pretty much nothing more. In particular, the "core" in OpenSolaris is years ahead of Linux, which still uses an anachronistic synchronisation model based on spinlocks (which nobody uses anymore, except probably Windows - other operating systems are based on fully functional mutexes and interrupt threads), a VFS that works backwards (file-based instead of vnode-based), etc.
              The famous troll is back. Linux does use mutexes. You want me to believe that Linux drivers are the most important thing in HPC and other areas? Do FreeBSD or OpenBSD use mutexes? I have seen your trolling on various portals for years :> Why the hell did OpenSolaris hang for about 10 seconds when I clicked the Firefox icon (in Sun's own VirtualBox, where other systems work like a charm)? It seems mutexes aren't helpful in that case. Can you give me some proof?

              2006:

              [link: article on Linux and the top 500 supercomputers]


              @npcomplete

              Of course. It's the same as the Phoronix one...

              EDIT:

              It seems the *BSDs are using mutexes too.
              Last edited by kraftman; 13 February 2009, 10:18 AM.



              • #87
                Yeah, the results seem more accurate now; it's just that a kernel can't make as huge a difference as the results Phoronix posted suggest.

                BTW, can you try adding "-m64" as well ("-fast -m64")? Plain "-fast" builds 32-bit binaries, while on Ubuntu x64 you get 64-bit binaries by default, so this should further improve the scores you got.

                The 7-zip, gzip, and gnupg scores should also change drastically, and if they had compiled GraphicsMagic LAT with actual OpenMP support (they did so for Linux, but not for OpenSolaris: -xopenmp on Sun Studio) they would be in for a surprise.
                But I guess those numbers wouldn't be as interesting for "[Phoronix] Linux Hardware Reviews, Benchmarking, & Gaming"; it's better to compare the performance of one processor vs. eight and then say OpenSolaris was "slaughtered".
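                To spell out the kind of build lines I mean (the file names are made up; the flags are the ones discussed above, so treat this as a sketch):
                Code:
                # Sun Studio on OpenSolaris: optimized, 64-bit, OpenMP enabled
                cc -fast -m64 -xopenmp -o bench bench.c

                # roughly comparable gcc invocation on Linux
                gcc -O3 -m64 -fopenmp -o bench bench.c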

                Originally posted by flice View Post
                OK, so I took the time and benchmarked several of the tests as they appear in the article, inside VirtualBox VMs running on Ubuntu 8.10. [...] As you see, all the results are much better for OpenSolaris than in the Phoronix article. [...] Overall, the article results seem to be completely detached from reality.
                Last edited by etacarinae; 13 February 2009, 10:59 AM.



                • #88
                  Originally posted by etacarinae View Post
                  Yeah, the results seem more accurate now, it's just a kernel can't make such a huge difference as in the results Phoronix posted.
                  Still not, if different compilers and flags were used. Overall, those results seem to be completely detached from reality. :P I can run the tests myself and we'll see.
                  Last edited by kraftman; 14 February 2009, 05:44 AM.



                  • #89
                    Originally posted by kebabbert View Post
                    Ok, that Linux config sounds reasonable. Not too many complex tunings.

                    I should rephrase my issue a bit. If I sit at a Linux cluster, that kernel will behave differently compared to a standard Linux kernel. It will have different functionality compiled in or out. It is like using the v2.4 kernel vs. v2.6: there will be differences. Personally, I don't like having to remember special cases.
                    The kinds of things you can leave out of the Linux kernel are the kinds of things that only matter to root. e.g. maybe you leave out support for running as a paravirtualized guest under Xen, or process accounting, or kernel latency-measuring code. Then yes, that specialized bit of functionality won't be available. So don't leave out functionality you actually plan to use! It's not hard. Linux doesn't make it easy to leave out individual system calls willy-nilly or anything like that. The stuff I'm talking about leaving out provides obscure things buried deep in /proc and /sys, which some specialized user-space programs know how to talk to.

                    So no, it won't behave differently. If you just want to use the machine for something other than profiling/tuning the kernel, you will most likely never notice the difference between kernel builds. On a well-tuned kernel, maybe you can create more threads before it starts to slow down too much, but that's all. It won't be qualitatively different.
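                    And if you're ever unsure whether a given feature was compiled into the kernel you're running, it's trivial to check (exact paths depend on the distro):
                    Code:
                    # most distros ship the build config next to the kernel image
                    grep CONFIG_NUMA /boot/config-$(uname -r)

                    # or, if the kernel was built with CONFIG_IKCONFIG_PROC
                    zcat /proc/config.gz | grep CONFIG_TASKSTATS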

                    It is like:
                    -Oh, you want to go through the forest (cluster)? Then you must use vehicle X, with its own driver's license. Oh, you want to travel on roads (desktop PC)? Then you must use vehicle Y, with its own driver's license.
                    You're definitely going to notice the difference in user-space between a cluster and a desktop, though. A cluster will have something like grid engine to submit jobs to, so they get executed on whatever cluster node has a free slot. (Why are we talking about clusters, again? This doesn't apply to single-system-image big iron.)

                    BTW, many people feel lost in the forest/wilderness when they first start looking at the Sun GridEngine docs, so your analogy is apt. It's not that complicated; if only they'd explain how the pieces fit together instead of giving you recipes you can't adapt to your own stuff unless you understand how it all works. It took me months to get a handle on gridengine, but then I was able to explain the key points to the users of my cluster in a few minutes.


                    Tuning in Solaris amounts to changing arguments in a config file. No recompilation.
                    Same for Linux: you tune the tunables by editing /etc/sysctl.conf, or by writing to /proc/sys directly. But you can also build a kernel optimized for your specific hardware. Now that I've looked at the available compile-time config options, though, that isn't what I'd call tuning; it's just specializing/optimizing.

                    I believe it is easier to learn the behaviour of a few schedulers than of different kernels?
                    And that's where you're mistaken. It's still Linux. The whole point of Linux is to be Linux whether it's built for a small system or for big iron. Embedded Linux is popular in part because it's the same Unixy kernel everyone knows from desktop dev experience; people don't have to learn a whole new system. (Almost) all the syscalls work exactly the same no matter how you build Linux, so you can use the same user-space across the whole range of systems Linux supports (subject to memory constraints, of course).

                    Maybe you should clarify what it is you'd have to learn that's kernel-dependent and not part of the user-space tools. I mean, you could leave out /proc support in the kernel, but then you couldn't do much with the system. So you don't have to worry that e.g. /proc/PID/whatever won't be there on a different kernel.



                    • #90
                      What do you think about several theorems that say almost the same thing, versus one big theorem that covers all cases? Which do you prefer: several theorems used depending on the situation, or one theorem that is always used?

