What irks me is that X399 motherboards for Threadripper are so expensive. You're looking at ~$340 minimum for an X399 motherboard versus only ~$215 for an X299 Intel board. That makes Intel's $599 i7-7820X a better deal than the $549 1900X, provided you can make do with 28 PCIe lanes instead of 60. Add the fact that the i7-7820X performs better on average, has additional AVX instructions, and uses less power, and it seems the 1900X should be priced a bit more competitively.
AMD Rolls Out The Threadripper 1900X: 16 Thread, 4.0GHz Boost
-
Originally posted by schmidtbag:
Actually, in most cases, it isn't simple at all.
photo applications
coarse grained parallelism: perform a single operation on a set of images: parallel convert -resize 50% ::: *.jpg
fine grained parallelism: perform the operation on individual pixels, blocks of pixels, or for each layer.
audio
coarse grained parallelism: perform a single operation on a set of audio clips: parallel flac ::: *.wav
fine grained parallelism: perform the operation on individual samples or blocks of samples
video
coarse grained parallelism: perform a single operation on a set of video clips: parallel encoder ::: *.files
fine grained parallelism: perform the operation on individual frames or blocks of frames: ffmpeg -threads
compilation
coarse grained parallelism: make -j16
fine grained parallelism: some modern compilers are threaded; hint: parsing is one of the most time-consuming tasks and guess what, parsing is 100% embarrassingly parallel. Totally independent. Parsing is a totally pure function String -> AST.
disk i/o
coarse grained parallelism: one thread per device / operation
fine grained parallelism: linux encryption support, multithreaded scrub etc.
statistics, signal processing, ...
guess what, Python, R, Julia, and the other cool kids already use Fortran-optimized multithreaded vector code
web
coarse grained parallelism: one thread per tab/browser window
fine grained parallelism: web workers, threaded & vectorized backend libraries
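As a concrete sketch of the fine-grained image case above (a hypothetical helper, not from any real photo application): every pixel is independent, so one OpenMP pragma splits the loop across cores. Compile with -fopenmp; without it the pragma is ignored and the loop runs serially with identical results.

```cpp
#include <cstdint>
#include <vector>

// Halve the brightness of every pixel of a flat 8-bit image buffer.
// Each iteration touches only its own element, so the loop can be
// divided across cores with no synchronization at all.
void halve_brightness(std::vector<std::uint8_t>& pixels) {
    #pragma omp parallel for
    for (long i = 0; i < (long)pixels.size(); ++i)
        pixels[i] /= 2;
}
```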
Not just anything can easily be made multi-threaded, let alone with a performance improvement.
Games, for example, don't use more threads because doing so adds to latency (and therefore I/O lag). It's also a challenge because many objects in a game need direct access to each other's variables. Usually, each thread's memory access is isolated, but if you want each thread to share memory access, you either lose a lot of performance or you dramatically increase RAM usage (or both), neither of which is desirable.
Code:
g++ game.cpp -o game -O3 -fopenmp
./game > game.csv
R --vanilla < game.r
xdg-open game.png
Multi-threading is only practical when each thread can comfortably work independently and doesn't depend on the synchronicity of others.
GCC is only a single-threaded task, because multi-threading it would likely have diminishing returns or be waaay too complicated to set up. But by compiling multiple files in separate threads at the same time, you get to maximize CPU usage without slowing anything down, because [usually] none of the tasks depend on each other to complete.
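The "compile independent files at the same time" pattern above can be sketched with processes, which is what make -j actually does (a minimal POSIX sketch; compile() here is a hypothetical stand-in for exec'ing a real compiler):

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <string>
#include <vector>

// Fork one child per independent "compile job" and reap them all,
// the way make -jN drives real compiler processes. The child body is
// a placeholder: it just exits 0 for a non-empty "file name".
bool compile_all(const std::vector<std::string>& files) {
    std::vector<pid_t> kids;
    for (const auto& f : files) {
        pid_t pid = fork();
        if (pid == 0)                // child: stand-in for exec'ing cc
            _exit(f.empty() ? 1 : 0);
        kids.push_back(pid);
    }
    bool ok = true;                  // parent: collect exit statuses
    for (pid_t pid : kids) {
        int status = 0;
        waitpid(pid, &status, 0);
        ok = ok && WIFEXITED(status) && WEXITSTATUS(status) == 0;
    }
    return ok;
}
```

Because the jobs share nothing, no locking is needed; the only synchronization point is the final wait.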
-
There is another factor about those 10-year-old articles: there are a lot more techniques for exploiting multi-core performance than simply trying to thread serial algorithms into oblivion:
1.) Inner loop parallelization: OpenMP and OpenACC are great (and very easy) here and should be used when your algorithm has a dataset big enough that each run inside the loop can be isolated/instanced; for example, this works wonders on (sparse) matrix mathematics, since both tools can also generate vectorization in the background.
2.) Outer loop independent execution: this is great when your inner loop is heavily tied together and not easy to parallelize without heavy synchronization, but your datasets are not tied to each other before the loop operates on them. This should be used when you have read-ahead blocks; that way you can run each block independently of the others while keeping the inner loop serial.
3.) Concurrency: this is the simplest one: simply have threads (or (side) async triggers) run each process that is independent of other processes. For example, why stall the pipeline waiting for an unrelated matrix multiplication to insert data into a DB? Why stall an unrelated read operation while waiting for an unrelated write operation to finish? Etc. <-- very heavily used in enterprise business software.
4.) (On demand) (instanced) virtualized multi-server concurrency model: why have one huge binary trying to do everything when you can segment the job into isolated (on demand) specialized instances?
5.) Async coding: a cheaper (fewer cycles of penalty) version of 3, but it doesn't guarantee the work will run in another thread/core, just that it will not block or wait for another operation (it's up to the OS), and it can be harder to synchronize if you don't know what you are doing.
6.) Async thread pools: great for triggering large operations from a single thread with certain guarantees that they will not block or wait for the main (spawning) thread; can be very hard if you don't know what you are doing.
7.) Event programming: instead of having a myriad of busy loops waiting and wasting cycles for any sort of input/event, tell the OS to inform you when that input/event actually happens and spawn the correct code to process it. Can be hard for beginners, though; things like libev etc. can help a lot.
8.) NUMA/parallel-aware allocations: sometimes how you allocate memory/cache can make a huge difference depending on the techniques and hardware available, even for very serial operations.
9.) Batch processing: sometimes it is hard to parallelize certain operations simply because they are too small to see any real advantage for the effort, but in many cases you don't need those results in real time. So if you can accumulate enough small operations in a buffer, parallelization/concurrency can have a huge impact on the final result, or at least improve usage in another department (for example, for most databases it is more efficient to insert 1000 rows in one operation than to execute 1000 single-row inserts).
10.) Adaptive multi-level algorithms: sometimes it is impossible to find one perfectly optimized solution for a problem. Your algorithm may kill it on a quad core but run like trash on 8 cores, or worse, show exponentially diminishing returns; or your code kills it handling up to 1000 operations but at 10000 stops scaling and runs like trash. So instead of finding a miraculous way to be great at both, simply switch between two different algorithms based on the load/hardware at runtime. For example, video uses an algorithm named IDCT, but depending on how you intend to process it there are three variants: serial integer (most used by players, because you simply won't need more), MxM integer, and MxM float (the latter two are massively parallel and used mostly by post-processing tools, normally based on CUDA/OpenCL).
11.) Redesign your algorithms: this one is obvious, of course: if you have a serial algorithm, think a bit and find a more parallel approach if possible.
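As one concrete sketch of technique 1 above (inner loop parallelization with OpenMP), here is a hypothetical dense matrix-vector product: rows are independent, so the outer loop is split across cores while the pragmas also hint vectorization of the inner reduction. Compile with -fopenmp; without it the pragmas are ignored and the result is identical, just serial.

```cpp
#include <vector>

// y = M * x for a row-major rows x cols matrix. Each output row is
// computed independently, so the row loop parallelizes with no
// synchronization; the inner dot product is a simd reduction.
std::vector<double> matvec(const std::vector<double>& m,
                           const std::vector<double>& x,
                           int rows, int cols) {
    std::vector<double> y(rows, 0.0);
    #pragma omp parallel for
    for (int r = 0; r < rows; ++r) {
        double acc = 0.0;
        #pragma omp simd reduction(+:acc)
        for (int c = 0; c < cols; ++c)
            acc += m[r * cols + c] * x[c];
        y[r] = acc;
    }
    return y;
}
```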
And there are many more; obviously you can even mix them depending on your needs, since there is no magic "parallelization" formula or technique or compiler or tool. Sure, at first it seems very complex, but most decent developers these days already use some of these techniques/tools to some extent, some higher-level languages can use them implicitly, and the tools have become very decent in these areas, so it is not 1% anymore.
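For instance, technique 3 (concurrency) above can be sketched in a few lines of standard C++ (the workloads are hypothetical placeholders, not anyone's real code): two unrelated jobs run at the same time instead of stalling on each other.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Overlap two jobs with no data dependency: a reduction standing in
// for an unrelated matrix step, and a dummy "DB insert". Neither
// waits for the other; we only join when both results are needed.
int overlapped_work() {
    std::vector<int> data(1000, 2);
    auto sum = std::async(std::launch::async, [&data] {
        return std::accumulate(data.begin(), data.end(), 0);
    });
    auto rows = std::async(std::launch::async, [] {
        return 42;            // stand-in for an unrelated DB insert
    });
    return sum.get() + rows.get();
}
```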
Today the problem is more about return on investment than actual difficulty at the programming level. For example, most game studios don't go parallel beyond the bare minimum not because it is hard but because there has not been enough market pressure to justify the investment. Just think how many people were gaming on 4+ core machines only 6 months ago (the Core i3/i5 are way more popular among gamers than the i7, regardless of what any delusional Intel fanboy may claim) and you will realize it made no economic sense for them to go beyond maxing out a regular quad core: the people beyond that threshold are too few to cover the expense of that work. Now that both AMD and Intel are pushing more cores in all market segments, the situation will start to change.
Disclaimer: I didn't pick these names from Wikipedia and I'm not a native English speaker either, so the names of the things I wrote here are translations of how I remember them or learned them through the years in my native language. If you don't like that, post the correct Wikipedia terms below or whatever, but the actual functionality is there.
-
Originally posted by Espionage724:
What use is single-threaded performance nowadays? Aside from some (older?) games, I would assume most things are multi-threaded to some extent nowadays.
Multi-threaded or not, emulating anything recent will take all the single-threaded performance you can throw at it. (That's why I worry that I'll have to have two PCs next time I upgrade: one from the last generation of Opterons without a PSP, and a quarantined Intel machine for gaming.)
-
Originally posted by caligula:
Please define 'most cases'. How about iterating over all the applications you have, adding a cross in the checkbox if it scales to multiple cores? I'll start with a few examples:
photo applications
coarse grained parallelism: perform a single operation on a set of images: parallel convert -resize 50% ::: *.jpg
fine grained parallelism: perform the operation on individual pixels, blocks of pixels, or for each layer.
audio
coarse grained parallelism: perform a single operation on a set of audio clips: parallel flac ::: *.wav
fine grained parallelism: perform the operation on individual samples or blocks of samples
video
coarse grained parallelism: perform a single operation on a set of video clips: parallel encoder ::: *.files
fine grained parallelism: perform the operation on individual frames or blocks of frames: ffmpeg -threads
compilation
coarse grained parallelism: make -j16
fine grained parallelism: some modern compilers are threaded; hint: parsing is one of the most time-consuming tasks and guess what, parsing is 100% embarrassingly parallel. Totally independent. Parsing is a totally pure function String -> AST.
disk i/o
coarse grained parallelism: one thread per device / operation
fine grained parallelism: linux encryption support, multithreaded scrub etc.
statistics, signal processing, ...
guess what, Python, R, Julia, and the other cool kids already use fortran optimized multithreaded vector code
web
coarse grained parallelism: one thread per tab/browser window
fine grained parallelism: web workers, threaded & vectorized backend libraries
Funny thing is, most tasks I do scale just fine. See the examples I listed above.
Bullshit, here's an example. Try this (code here https://pastebin.com/Rx6nAUnF)
Code:
g++ game.cpp -o game -O3 -fopenmp
./game > game.csv
R --vanilla < game.r
xdg-open game.png
It depends. You have costs C_Seq (cost of the sequential computation of a task), C_Par (cost of the parallelized tasks), and C_Syn (cost of synchronization). The parallel version is faster iff C_Par + C_Syn < C_Seq. You're almost claiming that this is never the case. Synchronization is pretty cheap with 1 to 8 threads. It just is. A modern CPU can do
Totally wrong. Modern compilers are multithreaded. Even most GCC projects use multiple threads via cmake/make/meson/pick your build system. It would always be faster to have the threading inside GCC.
In practice, if an application uses at least 2 threads it is "multi-threaded", but is it efficient? Maybe it is on a single/dual core, but probably not on a 64-core CPU.
Can an application be efficient on a 64-core CPU without threads? Yes, absolutely. Non-threaded and serial are not synonymous.
Threads are one tool in a whole arsenal of tools that can be used to make your code exhibit "efficient scaling", be it parallel or serial (yes, there is such a thing as serial scaling).
As another note, there is no such thing as purely parallel/threaded code, just as there is no such thing as purely serial code; all code is a mix of both to some degree, even if you didn't do it explicitly in your preferred language. The question is whether the current ratio of serial to parallel operations is enough to be efficient on the target system.
Disclaimer: this is not an answer to you personally but to everyone involved in the discussion; I just picked this post since it was the closest.
-
Originally posted by schmidtbag:
Actually, in most cases, it isn't simple at all. Not just anything can easily be made multi-threaded, let alone with a performance improvement. Games, for example, don't use more threads because doing so adds to latency (and therefore I/O lag). It's also a challenge because many objects in a game need direct access to each other's variables. Usually, each thread's memory access is isolated, making it so objects can't read the variables of others. But if you want each thread to share memory access, you either lose a lot of performance or you dramatically increase RAM usage (or both), neither of which is desirable.
Multi-threading is only practical when each thread can comfortably work independently and doesn't depend on the synchronicity of others. Take software compiling, for example: in reality, GCC is only a single-threaded task, because multi-threading it would likely have diminishing returns or be waaay too complicated to set up. But by compiling multiple files in separate threads at the same time, you get to maximize CPU usage without slowing anything down, because [usually] none of the tasks depend on each other to complete.
In regards to overhead of creating threads, that's not really relevant to my point. My point was that it's incredibly easy to write software that uses multiple threads. Yet there are ways of circumventing the overhead of threads, and that all comes down to the sort of architecture you've chosen to write your software with. Sure, a program that's written with a single thread in mind may be very difficult to parallelize, but I can guarantee you that there are ways to structure your software so that you can use a couple extra threads. Plenty of games have been released that have been able to take advantage of large amounts of cores, so it's not impossible. It's very doable.
To create software that takes full advantage of multiple threads, you need to take these points into consideration:
- Don't choose a programming language that requires a runtime (that includes Go); a runtime greatly minimizes the scenarios where you can use multiple threads.
- Instead of spawning threads ad hoc, use a thread pool. There are a number of ways to do that: my rayon example does, and it can also be done with futures.
- I/O tasks can easily be made async, which doesn't necessarily mean multiple threads, but it is another form of executing code simultaneously.
- Try to reach for atomics first, then rwlocks, then mutexes, and finally channels.
- On *nix systems, file descriptors can be a useful form of communication between parent and child. See the pipe & dup2 syscalls.
-
Originally posted by ssokolow:
I've got an example for you: Emulation.
Multi-threaded or not, emulating anything recent will take all the single-threaded performance you can throw at it. (That's why I worry that I'll have to have two PCs next time I upgrade: one from the last generation of Opterons without a PSP, and a quarantined Intel machine for gaming.)
-
Originally posted by caligula:
Please define 'most cases'. How about iterating over all the applications you have, adding a cross in the checkbox if it scales to multiple cores? I'll start with a few examples:
Funny thing is, most tasks I do scale just fine. See the examples I listed above.
Bullshit, here's an example. Try this (code here https://pastebin.com/Rx6nAUnF)
Code:
g++ game.cpp -o game -O3 -fopenmp
./game > game.csv
R --vanilla < game.r
xdg-open game.png
If you're so certain you're right, explain to me why no studios are doing things the way you claim. Explain to me why, even to this day, 4 cores is still the standard for playing modern games.
It depends. You have costs C_Seq (cost of the sequential computation of a task), C_Par (cost of the parallelized tasks), and C_Syn (cost of synchronization). The parallel version is faster iff C_Par + C_Syn < C_Seq. You're almost claiming that this is never the case. Synchronization is pretty cheap with 1 to 8 threads. It just is. A modern CPU can do
Totally wrong. Modern compilers are multithreaded. Even most GCC projects use multiple threads via cmake/make/meson/pick your build system. It would always be faster to have the threading inside GCC.
Last edited by schmidtbag; 31 August 2017, 01:15 PM.
-
Originally posted by mmstick:
You're preaching to the choir. I have a lot of experience writing multi-threaded and multi-process code using low-level threading primitives like atomics and rwlocks, in addition to mutexes and higher-level channel-based approaches, along with *nix forks, FD redirections, signal handling, and job control as creator of the Ion shell. It still stands that writing multi-threaded software is incredibly simple today -- especially if you are writing it in Rust, which lets you fiddle with the lifetimes and mutability of your references across thread boundaries and define thread-ability with the Send+Sync traits, all to ensure that your multi-threaded solution is safe at compile time. You would be surprised at how many opportunities to use threads have been completely overlooked.
In regards to overhead of creating threads, that's not really relevant to my point. My point was that it's incredibly easy to write software that uses multiple threads. Yet there are ways of circumventing the overhead of threads, and that all comes down to the sort of architecture you've chosen to write your software with. Sure, a program that's written with a single thread in mind may be very difficult to parallelize, but I can guarantee you that there are ways to structure your software so that you can use a couple extra threads. Plenty of games have been released that have been able to take advantage of large amounts of cores, so it's not impossible. It's very doable.
To create software that takes full advantage of multiple threads, you need to take these points into consideration:
- Don't choose a programming language that requires a runtime (that includes Go); a runtime greatly minimizes the scenarios where you can use multiple threads.
- Instead of spawning threads ad hoc, use a thread pool. There are a number of ways to do that: my rayon example does, and it can also be done with futures.
- I/O tasks can easily be made async, which doesn't necessarily mean multiple threads, but it is another form of executing code simultaneously.
- Try to reach for atomics first, then rwlocks, then mutexes, and finally channels.
- On *nix systems, file descriptors can be a useful form of communication between parent and child. See the pipe & dup2 syscalls.
Btw, on Linux you can use Kay Sievers' memory FDs (memfd_create) since 3.17. I've been testing them and they help tons when doing zero-copy, or even certain types of buffer arrangements where cleanup is hard, since all the actual paging, cleaning, etc. is handled kernel-side, and the seal operation is kind of great as well. Very, very handy. Not sure if Rust can use them, though, since I'm mostly a C/C++ guy and I hate the Rust syntax too much to bother (I'm too old and too used to C/C++, but I'm not saying it's bad or anything; it's just the syntax that makes my eyes bleed).
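The memfd flow described above looks roughly like this (Linux >= 3.17, glibc >= 2.27; a minimal sketch, not anyone's production buffer code): create an anonymous in-kernel file, write a buffer into it, then seal it so readers can use the fd without defensive copies.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Create a sealed, read-only memory FD holding a copy of `data`.
// After F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_WRITE, any further
// resize or write attempt on the fd fails, so consumers can trust
// the contents won't change under them. Returns -1 on error.
int sealed_buffer(const void* data, size_t len) {
    int fd = memfd_create("buf", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0) return -1;
    if (write(fd, data, len) != (ssize_t)len) { close(fd); return -1; }
    if (fcntl(fd, F_ADD_SEALS,
              F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_WRITE) < 0) {
        close(fd); return -1;
    }
    return fd;   // caller owns the (now immutable) fd
}
```

The fd can then be passed over a Unix socket or mapped read-only by another process without either side worrying about who cleans up the pages.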