AMD Shanghai Opteron: Linux vs. OpenSolaris Benchmarks


  • kraftman
    replied
    Originally posted by kebabbert View Post
    Solaris choking on desktop computers, are you referring to when you tried Solaris in VirtualBox and it paused for 10 secs? You know, in my humble opinion, it is wrong to draw the conclusion you do: that Solaris is slow. I have never had Solaris pause for 10 secs, and I've run Solaris for 10 years or so. I've also never seen Solaris crash. Never seen it happen. The pausing could have happened because of VirtualBox; you never thought of that, did you? You know, VirtualBox is not the most stable product. Especially with Solaris involved, VB doesn't work too well. On my laptop, OpenSolaris as a guest in VB takes 3-4 minutes to move the mouse pointer one inch. I should not say "Solaris is dog slow". If I did, it would be wrong.


    And your links, the "Linux definition of scalability": I don't agree with them. So you agree that C64 is scalable, right? I can run it on anything from wristwatches to supercomputers; I just have to reprogram the whole kernel for each new machine. If you ask me, that is not scalability. C64 is not scalable.
    Any system I tried worked perfectly in vbox, but there's another reason why I said so :> It seems that some of you are trying to move the "battlefield" away from Solaris. You can run C64 on supercomputers under an emulator, not natively (and C64 isn't comparable to any modern OS). The point is that Linux runs natively. Btw., from what I see there are at least two definitions of scalability, and probably both are correct.

    P.S. GNU/Solaris (OpenSolaris) is quite interesting, but I don't understand why Sun, the OpenSolaris makers, don't compile it with the recommended flags. They should release an optimized x86_64 version, in my opinion. Phoronix wouldn't be cheating so much then



  • kebabbert
    replied
    Originally posted by kraftman View Post
    In what? This is scalability:




    Solaris/Open Solaris isn't scalable and it chokes even on desktop computers (anyone tried to run it on a watch? xd). Maybe that is the reason I think that they call it slowlaris.
    Solaris choking on desktop computers, are you referring to when you tried Solaris in VirtualBox and it paused for 10 secs? You know, in my humble opinion, it is wrong to draw the conclusion you do: that Solaris is slow. I have never had Solaris pause for 10 secs, and I've run Solaris for 10 years or so. I've also never seen Solaris crash. Never seen it happen. The pausing could have happened because of VirtualBox; you never thought of that, did you? You know, VirtualBox is not the most stable product. Especially with Solaris involved, VB doesn't work too well. On my laptop, OpenSolaris as a guest in VB takes 3-4 minutes to move the mouse pointer one inch. I should not say "Solaris is dog slow". If I did, it would be wrong.


    And your links, the "Linux definition of scalability": I don't agree with them. So you agree that C64 is scalable, right? I can run it on anything from wristwatches to supercomputers; I just have to reprogram the whole kernel for each new machine. If you ask me, that is not scalability. C64 is not scalable.



  • kraftman
    replied
    Originally posted by kebabbert View Post
    Maybe that is the reason I think that one Solaris kernel is preferred over 42 different Linux kernels, depending on the task you are trying to solve.
    In what? This is scalability:




    Solaris/Open Solaris isn't scalable and it chokes even on desktop computers (anyone tried to run it on a watch? xd). Maybe that is the reason I think that they call it slowlaris.



  • kebabbert
    replied
    Originally posted by llama View Post
    I would say model, not theorem. I understand how things work in terms of a mental model that's like a simple simulator that I predict their behaviour with. I think that's what you're trying to get at. I don't quite know what "solve" is an analogy for when it comes to understanding operating systems, though.

    Different size systems have different bottlenecks, and sometimes you only have to worry about a few of them. I'm trying to think back to things I've done that required understanding kernel behaviour. In the cases I'm thinking of, it's always been some small aspect of the kernel that mattered for what I was doing. So I'd say I have different models of the different parts of the system. If I needed to, I could think about how those pieces fit together to understand how Linux as a whole works (when running on hardware I know enough about, which for me these days is just AMD64 PCs. I have a vague idea of how highmem on ia32 works, but it sounded horrible so I put that on my list of good reasons to use amd64 and wish ia32 would curl up and die.)

    On smaller machines, you have to know e.g. whether grep -r on a big source tree starts to make your desktop swap out, so the programs you have open are slow while they page back in again. (The answer is yes if vm.swappiness is set to 60 (a common default), so set it to more like 20 or 30 if you run e.g. bittorrent on a desktop.) http://www.sabi.co.uk/blog/ has some good comments about Linux's VM and GNU/Linux desktops being written by well-funded devs with their big fancy machines, and how GNU/Linux has serious weaknesses on memory-constrained systems. If you have plenty of RAM (relative to what you're doing), you don't have to care about lots of VM behaviour.

    Another aspect of Linux that I remember figuring out was when I wanted to run cycle-accurate benchmarks of a routine I was optimizing. (http://gmplib.org/list-archives/gmp-...ch/000789.html) I ran my benchmark loop at realtime priority, so it would have 100% of a CPU core. When Linux scheduled it on the same CPU that handled keyboard interrupts, it froze the system until it was done. I found out that Linux (on my system, check your own /proc/interrupts) handles all interrupts on CPU0, and you can give a process all of a CPU without breaking your system by using taskset 2 chrt, to put it on the other CPU core. So for this I had to think only about how Linux's scheduler and interrupt handling worked (on amd64 core2duo). I didn't have to think about the network stack, the VM, the VFS, or much else.

    Another time I was curious how Linux decides whether to send a TCP ACK by itself, or let a data packet ACK receipt of packets coming the other way on a TCP stream with data flowing in both directions. I never dug in enough to find out why it decides to sometimes send empty ACK packets, but not always. This behaviour isn't (AFAIK) connected to the scheduler, VM, or much outside the network stack.

    So a lot of the things that are coming to mind that I've wanted to know about Linux have been possible to understand in isolation. Which sounds to me like your "multiple theories". But if by situation you mean workload and machine type, not what part of the kernel you're trying to understand, then I think I tend to understand things in enough detail that those things would be parameters in my mental model, so it's really the same model over all conditions for whatever part of the kernel I'm trying to grok.

    I operate by delving into the details. I love details. (and, since I have ADHD, I have a hard time not getting wrapped up in details, as everyone can probably tell from my posts!) At the level of detail I like to understand, a complete theory of how Linux behaves on a whole range of hardware would be more than I could keep in my head.

    This is maybe why I never saw eye to eye with you on your wish for a complete theory that you could just remember, one that would tell you everything about how Linux worked. I didn't really say anything before, because I couldn't think of a polite way to say that it didn't make any sense to me.
    Details are important. Yes, that is true. I have a double Master's degree: one in Math and another in Computer Science (algorithm theory). All this math has taught me that if you have several theorems that behave almost the same, then you can abstract them into one theorem. If you cannot, then that theory is inferior and needs to be altered into something more general. Maybe that is the reason I think that one Solaris kernel is preferred over 42 different Linux kernels, depending on the task you are trying to solve. You know, different tools for different tasks is NOT scalability. You can never state that the Linux kernel is scalable when you need to use different versions. The Solaris install DVD is the same, no matter which machine. THAT is scalability. It is not something we have to agree or disagree on. It is a fact. Solaris is scalable; Linux is not. Otherwise, I could equivalently say "C64 is scalable"; I just have to modify it. It is simply stupid to say so; it is nothing to agree upon or not.

    But certainly you haven't studied much math, so you don't understand what I am talking about or why I emphasize that all the time. "But I couldn't think of a polite way of saying that." If you want to get sticky, we can.
    Last edited by kebabbert; 17 February 2009, 07:24 AM.



  • kraftman
    replied
    Originally posted by trasz View Post
    Yeah, nice to meet you. :->
    Nice to meet you too

    What I was talking about was synchronisation in the kernel. What you're talking about above is synchronisation between userland threads. Two completely unrelated things. The fact that Linux uses spinlocks is one of the reasons that its performance drops noticeably under high load on many CPUs. Other operating systems use fully functional mutexes, along with interrupt threads.
    I think they converted spinlocks to mutexes even in this area:




    Is what you said based on some articles, or did you spend a while searching LKML? :> Aren't you talking about a problem with a crappy malloc library?

    This article mentions pthreads mutexes (it's from 2000) and RTLinux:



    Mutexes are important for RT, aren't they?
    Last edited by kraftman; 16 February 2009, 06:50 PM.



  • trasz
    replied
    Originally posted by kraftman View Post
    Famous troll is back.
    Yeah, nice to meet you. :->

    Originally posted by kraftman View Post
    Linux is using mutexes. You want me to believe that Linux drivers are the most important things in HPC and other areas? Do FreeBSD or OpenBSD use mutexes? I saw your trolling for years on some portals :> Why the hell did OpenSolaris hang for about 10 seconds when I clicked on the Firefox icon (in Sun's vbox; other systems work like a charm in it)? It seems in this case mutexes aren't helpful. Can you give me some proof?

    2006:


    What I was talking about was synchronisation in the kernel. What you're talking about above is synchronisation between userland threads. Two completely unrelated things. The fact that Linux uses spinlocks is one of the reasons that its performance drops noticeably under high load on many CPUs. Other operating systems use fully functional mutexes, along with interrupt threads.



  • Peter_Cordes
    replied
    Originally posted by kebabbert View Post
    What do you think about several theorems that say almost the same thing, or one big theorem that solves all cases? Which do you prefer? Several theorems that are used depending on the situation, or one theorem that is always used?
    I would say model, not theorem. I understand how things work in terms of a mental model that's like a simple simulator that I predict their behaviour with. I think that's what you're trying to get at. I don't quite know what "solve" is an analogy for when it comes to understanding operating systems, though.

    Different size systems have different bottlenecks, and sometimes you only have to worry about a few of them. I'm trying to think back to things I've done that required understanding kernel behaviour. In the cases I'm thinking of, it's always been some small aspect of the kernel that mattered for what I was doing. So I'd say I have different models of the different parts of the system. If I needed to, I could think about how those pieces fit together to understand how Linux as a whole works (when running on hardware I know enough about, which for me these days is just AMD64 PCs. I have a vague idea of how highmem on ia32 works, but it sounded horrible so I put that on my list of good reasons to use amd64 and wish ia32 would curl up and die.)

    On smaller machines, you have to know e.g. whether grep -r on a big source tree starts to make your desktop swap out, so the programs you have open are slow while they page back in again. (The answer is yes if vm.swappiness is set to 60 (a common default), so set it to more like 20 or 30 if you run e.g. bittorrent on a desktop.) http://www.sabi.co.uk/blog/ has some good comments about Linux's VM and GNU/Linux desktops being written by well-funded devs with their big fancy machines, and how GNU/Linux has serious weaknesses on memory-constrained systems. If you have plenty of RAM (relative to what you're doing), you don't have to care about lots of VM behaviour.
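    The swappiness tuning described above can be sketched in a few commands. This is a minimal sketch for Linux only; 20 is just the illustrative value from the post, and the write commands are shown as comments because they need root:

```shell
# Read the current value (no root needed):
cat /proc/sys/vm/swappiness

# To lower it for the running system (needs root), one would run:
#   sysctl -w vm.swappiness=20
# and to persist it across reboots, add this line to /etc/sysctl.conf:
#   vm.swappiness = 20
```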

    Another aspect of Linux that I remember figuring out was when I wanted to run cycle-accurate benchmarks of a routine I was optimizing. (http://gmplib.org/list-archives/gmp-...ch/000789.html) I ran my benchmark loop at realtime priority, so it would have 100% of a CPU core. When Linux scheduled it on the same CPU that handled keyboard interrupts, it froze the system until it was done. I found out that Linux (on my system, check your own /proc/interrupts) handles all interrupts on CPU0, and you can give a process all of a CPU without breaking your system by using taskset 2 chrt, to put it on the other CPU core. So for this I had to think only about how Linux's scheduler and interrupt handling worked (on amd64 core2duo). I didn't have to think about the network stack, the VM, the VFS, or much else.
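    The interrupt-affinity observation above is easy to reproduce; `./bench` below is a hypothetical benchmark binary, and the affinity mask `2` means CPU1 on a two-core machine:

```shell
# Show how many interrupts each CPU has serviced, per interrupt line;
# on many systems almost everything lands in the CPU0 column:
cat /proc/interrupts

# To run a hypothetical benchmark on CPU1 at realtime FIFO priority,
# leaving CPU0 free to service interrupts, one would use:
#   taskset 2 chrt -f 99 ./bench
```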

    Another time I was curious how Linux decides whether to send a TCP ACK by itself, or let a data packet ACK receipt of packets coming the other way on a TCP stream with data flowing in both directions. I never dug in enough to find out why it decides to sometimes send empty ACK packets, but not always. This behaviour isn't (AFAIK) connected to the scheduler, VM, or much outside the network stack.

    So a lot of the things that are coming to mind that I've wanted to know about Linux have been possible to understand in isolation. Which sounds to me like your "multiple theories". But if by situation you mean workload and machine type, not what part of the kernel you're trying to understand, then I think I tend to understand things in enough detail that those things would be parameters in my mental model, so it's really the same model over all conditions for whatever part of the kernel I'm trying to grok.

    I operate by delving into the details. I love details. (and, since I have ADHD, I have a hard time not getting wrapped up in details, as everyone can probably tell from my posts!) At the level of detail I like to understand, a complete theory of how Linux behaves on a whole range of hardware would be more than I could keep in my head.

    This is maybe why I never saw eye to eye with you on your wish for a complete theory that you could just remember, one that would tell you everything about how Linux worked. I didn't really say anything before, because I couldn't think of a polite way to say that it didn't make any sense to me.



  • kebabbert
    replied
    What do you think about several theorems that say almost the same thing, or one big theorem that solves all cases? Which do you prefer? Several theorems that are used depending on the situation, or one theorem that is always used?



  • Peter_Cordes
    replied
    Originally posted by kebabbert View Post
    Ok, that Linux config sounds reasonable. Not too many complex tunings.

    I should rephrase my issue a bit. If I sit at a Linux cluster, that kernel will behave differently compared to a standard Linux kernel. It will have different functionality compiled in/out. It is like using a v2.4 kernel or v2.6 - there will be differences. Personally, I don't like the need to remember special cases.
    The kinds of things you can leave out of the Linux kernel are the kinds of things that only matter to root. E.g. maybe you leave out support for being a paravirtualized guest under Xen, or process accounting, or kernel latency measuring code. Then yes, that specialized bit of functionality won't be available. So don't leave out functionality you actually plan to use! It's not hard. Linux doesn't make it easy to leave out various system calls willy-nilly or anything. The stuff I'm talking about leaving out provides obscure stuff buried deep in /proc and /sys, which some specialized user-space programs know how to talk to.

    So no, it won't behave differently. If you want to just use the machine, for something other than profiling/tuning the kernel, you will most likely never notice the difference caused by different kernels. On a well tuned kernel, maybe you can create more threads before it starts to slow down too much, but that's all. It won't be qualitatively different.
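    Whether a given feature was compiled into the kernel you are running can usually be checked without rebuilding anything; `CONFIG_BSD_PROCESS_ACCT` (process accounting) is just an example option here:

```shell
# Distro kernels usually install their build config next to the kernel
# image; some kernels instead expose it at /proc/config.gz:
CFG=/boot/config-$(uname -r)
if [ -r "$CFG" ]; then
    grep CONFIG_BSD_PROCESS_ACCT "$CFG" || echo "option not in $CFG"
elif [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | grep CONFIG_BSD_PROCESS_ACCT || echo "option not set"
else
    echo "no kernel config found"
fi
```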

    It is like:
    -Oh, you want to go through the forest (cluster)? Then you must use this vehicle X with it's driver license. Oh, you want to travel on roads (desktopPC)? Then you must use this vehicle Y with it's driver license.
    You're definitely going to notice the difference in user-space between a cluster and a desktop, though. A cluster will have something like grid engine to submit jobs to, so they will be executed on whatever cluster node has a free slot. (why are we talking about clusters, again? This doesn't apply to single-system-image big iron.)

    BTW, many people feel lost in the forest/wilderness when they first start looking at Sun GridEngine docs. So your analogy is apt. It's not that complicated if they'd only explain how the pieces fit together, instead of trying to give you recipes that you can't adapt for your own stuff if you don't understand how it all works. It took me months to get a handle on gridengine, but then I was able to explain the key points to the users of my cluster in a few minutes.
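    For what it's worth, a minimal gridengine submission looks something like the sketch below; `./my_program` and the ten-minute limit are illustrative, and `qsub`/`qstat` are shown as comments since they only work where SGE is installed:

```shell
# Write a minimal SGE job script; '#$' lines are qsub directives:
cat > job.sh <<'EOF'
#!/bin/sh
#$ -cwd              # run in the directory the job was submitted from
#$ -l h_rt=0:10:0    # hard wall-clock limit of 10 minutes
./my_program
EOF

# One would then submit it and watch the queue with:
#   qsub job.sh
#   qstat
```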


    Tuning in Solaris amounts to change different arguments in a config file. No recompilation.
    Same for Linux: you tune the tunables by editing /etc/sysctl.conf. But you can also build a kernel optimized for your specific hardware. Now that I've looked at the available compile-time config options, it's not what I'd call tuning; it's just specializing/optimizing.

    I believe it is easier to learn a few schedulers' behaviour than different kernels?
    And that's where you're mistaken. It's still Linux. The whole point of Linux is to be Linux even when built for a small system or big iron. Embedded Linux is popular in part because it's the same Unixy kernel everyone knows from desktop dev experience. They don't have to learn a whole new system. (Almost) all the syscalls work exactly the same no matter how you build Linux; you can use the same user-space across the whole range of systems that Linux supports (well, subject to memory constraints, of course).

    Maybe you should clarify what you're talking about having to learn that's kernel-dependent, and not part of the user-space tools. I mean, you could leave out /proc support in the kernel, but then you couldn't do much with it. So you don't have to worry that e.g. /proc/PID/whatever won't be there on a different kernel.



  • kraftman
    replied
    Originally posted by etacarinae View Post
    Yeah, the results seem more accurate now; it's just that a kernel can't make such a huge difference as in the results Phoronix posted.
    Still, not if different compilers and flags were used. Overall, those results seem to be completely detached from reality. :P I can do the tests myself and we'll see.
    Last edited by kraftman; 14 February 2009, 05:44 AM.

