Would this help single threaded apps as well? If you do a malloc in a single thread it was still doing locking no? There would be no contention on any lock, but locking and unlocking is not free even without contention. I suppose it depends on the type of lock being used?
Glibc's Per-Thread Cache Is Helping Out Some Benchmarks
Originally posted by paulpach View Post
Would this help single threaded apps as well? If you do a malloc in a single thread it was still doing locking no? There would be no contention on any lock, but locking and unlocking is not free even without contention. I suppose it depends on the type of lock being used?
Also, locking itself is not that expensive on modern-ish CPUs. It's the cache bounces from using shared data structures that tend to really kill you.
(Not saying locking is free, it's not. But it's also not insanely expensive unless there is massive contention.)
Originally posted by caligula View Post
When you compare a Ryzen system with 64 GB of RAM with a typical ARMv7 dev board with 256 to 512 MB of RAM, how would you describe the level of threading on both systems?
If you want to make your point, then you need to compare the embedded system to another system which is even more underpowered for its workload and which has even more threads. Such as a Power8 server, or even worse - a z-Series server running SAP.
Originally posted by linuxgeex View Post
The mistake you're making is assuming that the Ryzen will be purchased to run exclusively heavily threaded apps. Except that the Ryzen is a consumer CPU and will most likely spend 95% of the day running 1 thread or less, and its owner will be pleased because the fan will be silent.
If you want to make your point, then you need to compare the embedded system to another system which is even more underpowered for its workload and which has even more threads. Such as a Power8 server, or even worse - a z-Series server running SAP.
You asked:
Originally posted by caligula View Post
When you compare a Ryzen system with 64 GB of RAM with a typical ARMv7 dev board with 256 to 512 MB of RAM, how would you describe the level of threading on both systems?
A massively multithreaded app which creates 1000 threads that then sit there doing nothing is not heavily multithreaded. Good examples are Java multithreaded apps which don't leverage pthreads, such as LimeWire: it ran a thread for each connection, but the Java interpreter itself only ran one CPU thread. Or Google Chrome, which can have hundreds of threads if you have 50 tabs open, but only 4 of those threads will be using any significant amount of CPU at any time. That's not "heavy" threading. And that's a good thing: it would be irresponsible of Chrome to burn your CPU for background tabs, and your laptop battery would be toast for no reason.
Originally posted by caligula
That's pure BS. Normal desktop systems run hundreds of threads. You can't even launch Firefox with just one thread.
To be clear: heavily multithreaded doesn't mean a large number of threads. It means a large number of threads which are always busy. A heavily multithreaded app will benefit from dozens of cores. There's nothing "heavy" about idling.

If the workload is so light that a uni-processor system can complete all tasks and still idle before the end of the majority of its time-slices, and the scheduler never has to skip a task to provide fair scheduling, then you don't even need an SMP system, and multithreaded apps will not improve your performance at all (they may even degrade both your performance and power efficiency). In a performance-constrained environment like embedded/mobile, the same amount of work will need to be done on more threads in parallel in order for it to be completed in the same amount of time vs a high-performance desktop.
That is the question you asked. That is the honest answer. No "BS".
Last edited by linuxgeex; 09 August 2017, 08:33 AM.
Originally posted by linuxgeex View Post
In a performance-constrained environment like embedded/mobile, the same amount of work will need to be done on more threads in parallel in order for it to be completed in the same amount of time vs a high-performance desktop.
Last edited by caligula; 13 August 2017, 11:23 AM.
Originally posted by caligula View Post
Originally posted by linuxgeex View Post
In a performance-constrained environment like embedded/mobile, the same amount of work will need to be done on more threads in parallel in order for it to be completed in the same amount of time vs a high-performance desktop.
I find it hard to believe this is the case. Yes, you'll need ways to deal with the lack of processing power, but I don't think many embedded platforms really need this much power.
Not sure of the inner workings, but the newest Cisco Nexus switches run a Linux kernel, and are switching terabits per second across dozens or hundreds of ports. I imagine this requires some significant parallel processing.
Originally posted by torsionbar28 View Post
1080p video editing on phones and tablets is a thing, and even 4k video editing on the newest iPad is a thing.
Originally posted by torsionbar28 View Post
Not sure of the inner workings, but the newest Cisco Nexus switches run a Linux kernel, and are switching terabits per second across dozens or hundreds of ports. I imagine this requires some significant parallel processing.