Linux 2.6.38 Kernel Multi-Core Scaling

mtippett replied

07 March 2011, 05:14 AM
Originally posted by V!NCENT

Now if that's not the case then Linux really sucks balls at scaling...

Interestingly, it looks like the CPU topology allows Linux to scale better with HyperThreading. PC-BSD (FreeBSD) and Illumos (OpenSolaris) both consistently had decreases going from 6 cores to 6 cores + HyperThreading.

But a broad statement about scaling needs context to mean anything. Which workloads are you talking about scaling?
mtippett replied

07 March 2011, 05:12 AM
Originally posted by V!NCENT

Errrr.... http://en.wikipedia.org/wiki/Giant_lock

Kernel lock: the kernel locks out all threads except one, so only one thread runs at a time. Removing this kernel lock means that threads still need to lock, but thread management is no longer completely serialized. Now if you have a single core, then no matter what you hack together, only one process runs at a time anyway.

David is right: the kernel has been progressively removing macro-locks over the last number of years. A few years ago, I know that SGI was looking at the BKL being taken over during ioctls on their multi-pipe GPU systems.

As David has said, different subsystems now have a broad collection of finer-grained locks around the kernel calls being made to those subsystems. Removing the BKL will only affect some types of workloads. The workloads that may be affected would absolutely need to be multi-threaded (which further reduces the likelihood of seeing a benefit).
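To make the difference between one giant lock and finer-grained subsystem locks concrete, here is a minimal user-space sketch in Python. It is only an analogy, not kernel code: the class names, the `lock_for` helper, and the "fs"/"net" subsystem names are all made up for illustration. It holds the lock for one subsystem and then checks, without blocking, whether a caller into a second subsystem could get in at the same time.

```python
import threading


class GiantLockKernel:
    """Toy model: every subsystem call is guarded by the same lock."""

    def __init__(self):
        self._lock = threading.Lock()

    def lock_for(self, subsystem):
        # The subsystem name is ignored: there is only one lock.
        return self._lock


class FineGrainedKernel:
    """Toy model: each subsystem has its own lock."""

    def __init__(self, subsystems):
        self._locks = {name: threading.Lock() for name in subsystems}

    def lock_for(self, subsystem):
        return self._locks[subsystem]


def can_enter_concurrently(kernel, first, second):
    """Hold the lock for `first`, then try the lock for `second` without blocking."""
    with kernel.lock_for(first):
        wanted = kernel.lock_for(second)
        got = wanted.acquire(blocking=False)
        if got:
            wanted.release()
        return got


subsystems = ["fs", "net"]  # hypothetical subsystem names
giant = can_enter_concurrently(GiantLockKernel(), "fs", "net")
fine = can_enter_concurrently(FineGrainedKernel(subsystems), "fs", "net")
same = can_enter_concurrently(FineGrainedKernel(subsystems), "fs", "fs")

print(giant)  # False: the single lock is already held
print(fine)   # True: different subsystems do not contend
print(same)   # False: callers into the same subsystem still serialize
```

The last case is why the benefit depends on the workload: finer-grained locking only helps when concurrent callers actually land in different subsystems.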

As Michael shows in the benchmark results in this article, the impact on the CPU-centric benchmarks is virtually nothing between these kernels.
V!NCENT replied

07 March 2011, 04:51 AM
Originally posted by airlied

The BKL hasn't mattered in a long time; removing it was almost purely symbolic unless you were using one of the last few holdouts. So of course it would have no effect on a benchmark. Not sure where the people who can't be bothered to do any proper research and think they know stuff got the idea that BKL removal would affect any benchmarks.

Errrr.... http://en.wikipedia.org/wiki/Giant_lock

Kernel lock: the kernel locks out all threads except one, so only one thread runs at a time. Removing this kernel lock means that threads still need to lock, but thread management is no longer completely serialized. Now if you have a single core, then no matter what you hack together, only one process runs at a time anyway.

Now onto multiple cores; multiple threads at once.

Seems like a very simple conclusion to me?

Now if that's not the case then Linux really sucks balls at scaling...
HokTar replied

06 March 2011, 06:57 PM
Originally posted by mtippett

I'm expecting that it will come. Although I doubt that the scalability testing will be done by the vendors, having results from those systems is fully expected.

Actually, I meant to ask them for the hardware so you could run the tests, but my phrasing was unclear. Sorry about that.
BenderRodriguez replied

06 March 2011, 05:15 PM
Originally posted by kraftman

Weren't those patches aiming at responsiveness?

If I understand that patch right, responsiveness is improved through process grouping. CFS would normally allocate CPU time evenly per task: with, for example, 9 make instances and 1 vlc, vlc would get about 10% of the CPU while the 9 make instances would get 90%. With the new patch, the 9 make instances are allocated CPU time as a group, so together they would get 50% of the CPU and vlc would also get 50% (if it needs that much, of course). For that to work you need cgroups enabled in the kernel. The patch isn't supposed to deliver more performance but to spread CPU resources evenly between groups and keep a demanding job from starving other processes.
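The arithmetic in that example can be checked with a small Python sketch. It is a simplified model, not the scheduler's actual algorithm: it assumes every task is always runnable and equally weighted, and the function names and the "build"/"desktop" group labels are invented for illustration.

```python
from fractions import Fraction


def flat_shares(tasks):
    """Per-task fairness: every runnable task gets an equal slice."""
    share = Fraction(1, len(tasks))
    return {name: share for name, _group in tasks}


def grouped_shares(tasks):
    """Per-group fairness: groups split the CPU evenly, then tasks split their group's slice."""
    groups = {}
    for name, group in tasks:
        groups.setdefault(group, []).append(name)
    group_share = Fraction(1, len(groups))
    return {
        name: group_share / len(members)
        for members in groups.values()
        for name in members
    }


# 9 make instances in one (hypothetical) group, vlc in another.
tasks = [(f"make-{i}", "build") for i in range(9)] + [("vlc", "desktop")]

flat = flat_shares(tasks)
grouped = grouped_shares(tasks)
make_total = sum(v for k, v in grouped.items() if k.startswith("make-"))

print(float(flat["vlc"]))     # 0.1
print(float(grouped["vlc"]))  # 0.5
print(float(make_total))      # 0.5
```

In the grouped case each make instance ends up with 1/18 of the CPU, which is why the build slows down a little while the player stays smooth.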
kebabbert replied

06 March 2011, 04:30 PM
Originally posted by devius

Shouldn't this be tested on something with a really big number of cores/processors to be able to see any differences? Something like 48 cores or more? 6 cores isn't all that much, even if they have HT.

Yes, you are right. But 48 cores is a bit low too. You cannot really talk about true scalability on as few as 48 cores. You need more cores. Scalable means it scales from a few cores up to several hundred.
airlied replied

06 March 2011, 03:58 PM
The BKL hasn't mattered in a long time; removing it was almost purely symbolic unless you were using one of the last few holdouts. So of course it would have no effect on a benchmark. Not sure where the people who can't be bothered to do any proper research and think they know stuff got the idea that BKL removal would affect any benchmarks.

Here's a link for anyone who really is as useless at research as anyone here.

Removing the big kernel lock. A big deal? (Andi Kleen's blog)

http://halobates.de/blog/p/56

As for the 200-line wonder patch, it also has nothing to do with scalability, unless one of the tests is to watch a video while compiling a kernel in a terminal, which is the only case the patch does anything for.
kraftman replied

06 March 2011, 03:11 PM
Originally posted by jakubo

comments on the graphs are rare these days on phoronix.com
nothing to tell why the big kernel lock patch and the "patch that does wonders" - as proclaimed - hardly make a change?

Weren't those patches aiming at responsiveness?
kraftman replied

06 March 2011, 03:08 PM
Originally posted by devius

Shouldn't this be tested on something with a really big number of cores/processors to be able to see any differences? Something like 48 cores or more? 6 cores isn't all that much, even if they have HT.

The improvements should be visible even on 4 cores. Thanks for the test, Michael. I wonder why there's no difference? Maybe some funny stuff is disabled or something?
Smorg replied

06 March 2011, 03:04 PM
IDK, the way Intel likes to never ever lower the prices on their higher-end consumer-grade chips (i.e. Gulftown), and with the relatively low cost of entry-level dual-socket boards, it might be a very logical upgrade path to grab yourself a second low-end i7 chip and go for a NUMA Xeon setup. I'll take a 16-thread NUMA configuration over a $1000 12-thread single-socket Gulftown.