Would this help single threaded apps as well? If you do a malloc in a single thread it was still doing locking no? There would be no contention on any lock, but locking and unlocking is not free even without contention. I suppose it depends on the type of lock being used?
Glibc's Per-Thread Cache Is Helping Out Some Benchmarks
Originally posted by paulpach View Post
Would this help single threaded apps as well? If you do a malloc in a single thread it was still doing locking no? There would be no contention on any lock, but locking and unlocking is not free even without contention. I suppose it depends on the type of lock being used?
Also, locking itself is not that expensive on modern-ish CPUs. It's the cache bounces from using shared data structures that tend to really kill you.
(Not saying locking is free, it's not. But it's also not insanely expensive unless there is massive contention.)
Originally posted by caligula View Post
When you compare a Ryzen system with 64 GB of RAM with a typical ARMv7 dev board with 256 to 512 MB of RAM, how would you describe the level of threading on both systems?
If you want to make your point, then you need to compare the embedded system to another system which is even more underpowered for its workload and which has even more threads. Such as a Power8 server, or even worse - a z-Series server running SAP.
Originally posted by linuxgeex View Post
The mistake you're making is assuming that the Ryzen will be purchased to run exclusively heavily threaded apps. Except that the Ryzen is a consumer CPU and will most likely spend 95% of the day running 1 thread or less, and its owner will be pleased because the fan will be silent.
If you want to make your point, then you need to compare the embedded system to another system which is even more underpowered for its workload and which has even more threads. Such as a Power8 server, or even worse - a z-Series server running SAP.
You asked:
Originally posted by caligula View Post
When you compare a Ryzen system with 64 GB of RAM with a typical ARMv7 dev board with 256 to 512 MB of RAM, how would you describe the level of threading on both systems?
A massively multithreaded app which creates 1000 threads that then sit there doing nothing is not heavily multithreaded. Good examples are Java multithreaded apps which don't leverage pthreads, such as LimeWire: it ran a thread for each connection, but the Java interpreter itself only ran one CPU thread. Or Google Chrome, which can have hundreds of threads if you have 50 tabs open, but only 4 of those threads will be using any significant amount of CPU at any time. That's not "heavy" threading. And that's a good thing: it would be irresponsible of Chrome to burn your CPU for background tabs, and your laptop battery would be toast for no reason.
Originally posted by caligula
That's pure BS. Normal desktop systems run hundreds of threads. You can't even launch Firefox with just one thread.
To be clear: heavily multithreaded doesn't mean a large number of threads. It means a large number of threads which are always busy. A heavily multithreaded app will benefit from dozens of cores. There's nothing "heavy" about idling.

If the workload is so light that a uni-processor system can complete all tasks and still idle before the end of the majority of its time-slices, and the scheduler never has to skip a task to provide fair scheduling, then you don't even need an SMP system, and multithreaded apps will not improve your performance at all (they may even degrade both your performance and power efficiency). In a performance-constrained environment like embedded/mobile, the same amount of work will need to be done on more threads in parallel in order for it to be completed in the same amount of time vs a high-performance desktop.
That is the question you asked. That is the honest answer. No "BS".
Last edited by linuxgeex; 09 August 2017, 08:33 AM.
Originally posted by linuxgeex View Post
In a performance-constrained environment like embedded/mobile, the same amount of work will need to be done on more threads in parallel in order for it to be completed in the same amount of time vs a high-performance desktop.
Last edited by caligula; 13 August 2017, 11:23 AM.
Originally posted by caligula View Post
Originally posted by linuxgeex View Post
In a performance-constrained environment like embedded/mobile, the same amount of work will need to be done on more threads in parallel in order for it to be completed in the same amount of time vs a high-performance desktop.
I find it hard to believe this is the case. Yes, you'll need ways to deal with the lack of processing power, but I don't think many embedded platforms really need this much power.
Not sure of the inner workings, but the newest Cisco Nexus switches run a Linux kernel, and are switching terabits per second across dozens or hundreds of ports. I imagine this requires some significant parallel processing.
Originally posted by torsionbar28 View Post
1080p video editing on phones and tablets is a thing, and even 4k video editing on the newest iPad is a thing.
Originally posted by torsionbar28 View Post
Not sure of the inner workings, but the newest Cisco Nexus switches run a Linux kernel, and are switching terabits per second across dozens or hundreds of ports. I imagine this requires some significant parallel processing.