Originally posted by schmidtbag
$ time ls
real 0m0,001s
user 0m0,000s
sys 0m0,001s
So we can just forget all the trivial tasks that are already fast enough.
Another category of tasks can be found in interactive applications. GUI tasks are real-time, asynchronous, and event-based. In GUIs it is important to minimize the time between a spontaneous event and the reaction to it. Sometimes this fast reaction time comes at the cost of overall throughput. A responsive GUI is possible even if the machine has only one processor core. Responsiveness is still a problem in some applications, though. For example, Claws Mail can't even shut down while it's fetching mail. Isn't that great.
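To make the shutdown point concrete, here is a minimal sketch of a fetch that can be interrupted. Everything in it is an assumption for illustration: `fetch_mail` is a hypothetical stand-in that sleeps instead of talking to a server, and it says nothing about how Claws Mail is actually written. The only idea is that the worker checks a stop flag between small steps, so a quit request gets a reaction in milliseconds instead of after the whole fetch.

```python
import threading
import time

def fetch_mail(stop: threading.Event, fetched: list) -> None:
    """Hypothetical stand-in for a slow mail fetch, done in small steps."""
    for message_id in range(1000):
        if stop.is_set():      # check for a shutdown request between steps
            return
        time.sleep(0.01)       # simulates waiting on the network
        fetched.append(message_id)

def run_and_shut_down(delay: float = 0.05) -> float:
    """Start a fetch, ask it to stop after `delay` seconds,
    and return how long the shutdown itself took."""
    stop = threading.Event()
    fetched = []
    worker = threading.Thread(target=fetch_mail, args=(stop, fetched))
    worker.start()
    time.sleep(delay)          # the "GUI" thread is free to handle events here
    started = time.monotonic()
    stop.set()                 # the user clicks "quit"
    worker.join()              # returns as soon as the worker sees the flag
    return time.monotonic() - started
```

This works on a single core too, since the worker spends most of its time waiting. The extra flag checks are the small throughput price paid for the faster reaction.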
So for example, when you compile a program, each core is building its own C file simultaneously; the files don't depend on each other. Meanwhile, when you are doing raytracing, it is important that each thread knows what the other threads are doing in order to remain accurate.
Each thread in a CPU can handle any task that is scheduled to it at any time, whereas to my understanding, a GPU must process a "single" parallelized task at a time across all of its cores per clock. What this means is that if you are running a parallelized task, it is very difficult (and sometimes impossible, in the case of Hyper-Threading) to synchronize each CPU thread, which results in wasted clock cycles.
The main difference is that GPUs are still more optimized for data-parallel work, whereas CPUs are good at running heavyweight tasks. CPUs work just fine when you have 2-32 threads and cores, but synchronization becomes more expensive as more threads need to participate. CPUs can emulate GPU workloads, but of course they don't perform as well, since GPU workloads don't need huge caches, branch prediction, or the other single-threaded optimizations that are useless to them.
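A rough sketch of where that synchronization cost comes from, using plain Python threads (my choice for illustration, not something from the thread). The function name `lockstep_sum` and its inputs are made up. Each worker does an independent partial sum, but then has to take a lock to update shared state and wait at a barrier before the next round, so every round runs at the pace of the slowest participant.

```python
import threading

def lockstep_sum(chunks, rounds=3):
    """Each worker adds its chunk's sum to a shared total once per round."""
    barrier = threading.Barrier(len(chunks))
    lock = threading.Lock()
    totals = [0] * rounds

    def worker(chunk):
        for r in range(rounds):
            partial = sum(chunk)   # independent, data-parallel part
            with lock:             # serialized update of shared state
                totals[r] += partial
            barrier.wait()         # nobody starts round r+1 until all finish round r

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

With three workers the waiting is negligible; the lock and the barrier are the parts that get more expensive as more threads have to pass through them.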
This is a great example of how to properly take advantage of the threads in a CPU. Some people may say "why can't lame, oggenc, or flac use all the available cores?" but I don't think they should, because in some cases it could actually slow down the output while adding a lot of unnecessary complexity.
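One way to keep the encoder itself single-threaded and still use every core is to run one job per file. The sketch below assumes that approach; `encode` is a hypothetical stand-in that only computes an output name, where a real script would launch the encoder as a child process at that point. The pool size and file names are made up as well.

```python
from concurrent.futures import ThreadPoolExecutor
import os

def encode(path: str) -> str:
    """Hypothetical stand-in for one single-threaded encoder run on one file."""
    base, _ = os.path.splitext(path)
    return base + ".flac"

def encode_all(paths, workers=None):
    """Run one independent encode job per file, several at a time."""
    # The jobs share no state, so none of them ever waits on another.
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        return list(pool.map(encode, paths))  # results come back in input order
```

Since the files don't depend on each other, this needs no synchronization inside the encoder at all, which is the same reason parallel compiles scale so easily.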