
Slowdown when computing in parallel on multicore CPU

  • Slowdown when computing in parallel on multicore CPU

    A couple of months ago I observed some unusual behaviour when I ran an integer arithmetic workload on a single core, then on two cores, then on three cores, and so on. I own an octa-core processor (AMD FX(tm)-8120, 3.1GHz). I had assumed that, as long as enough cores are idle, adding more work to the CPU would not adversely affect performance per task. Quite the contrary: performance drops by up to 60%. So far I have no convincing explanation for this behaviour.

    The same is true for an Intel i7 (i7-2640, 2.8GHz): performance deteriorates until all cores are in use, after which no further slowdown can be observed.

    The same also holds for floating-point calculations on both processors (AMD and Intel). The problem is completely artificial, and there seems to be no memory contention, since all the values involved can easily be kept within the CPU. See

    Thank you for any comments on this.
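For anyone who wants to repeat this kind of experiment without the original `intpoly` program (its source isn't included here), a minimal sketch in Python follows. It assumes a hypothetical stand-in workload, `busy_work`, a small integer loop; `timed_worker` and `run_concurrently` are likewise illustrative names. Each worker is a separate process, so Python's GIL plays no role, and each reports its own wall-clock and CPU time. If cores were fully independent, both numbers should stay roughly flat as the process count grows up to the core count. Interpreter overhead makes the absolute times meaningless compared with compiled code, so only the trend matters.

```python
import multiprocessing as mp
import os
import time

ITERATIONS = 1_000_000  # raise this for steadier timings


def busy_work(iterations: int) -> int:
    """Hypothetical CPU-bound integer workload (an LCG-style update loop).

    Its working set is a single small integer, so memory bandwidth
    should play no real role.
    """
    x = 1
    for i in range(iterations):
        x = (x * 1103515245 + 12345 + i) & 0xFFFFFFFF
    return x


def timed_worker(iterations: int, results) -> None:
    """Run the workload in this process and report (wall s, CPU s)."""
    wall0 = time.perf_counter()
    cpu0 = time.process_time()  # CPU time of this process only
    busy_work(iterations)
    results.put((time.perf_counter() - wall0, time.process_time() - cpu0))


def run_concurrently(n_procs: int, iterations: int) -> list[tuple[float, float]]:
    """Start n_procs worker processes at once and return each one's timings."""
    results = mp.Queue()
    procs = [mp.Process(target=timed_worker, args=(iterations, results))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    timings = [results.get() for _ in procs]  # drain before join() to avoid a deadlock
    for p in procs:
        p.join()
    return timings


if __name__ == "__main__":
    max_procs = min(os.cpu_count() or 1, 8)  # cap at 8, the octa-core FX-8120 above
    for n in range(1, max_procs + 1):
        timings = run_concurrently(n, ITERATIONS)
        mean_wall = sum(w for w, _ in timings) / n
        mean_cpu = sum(c for _, c in timings) / n
        print(f"{n} procs: mean wall {mean_wall:.2f} s, mean CPU {mean_cpu:.2f} s")
```

Comparing the mean CPU column across rows separates "the work itself got slower" from "the processes had to wait for a core".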

  • #2
    I think what you are seeing is the process hopping from one core to another while the other cores are idle. This causes huge slowdowns. It does not happen when all cores are busy, which is exactly what you described.
    IIRC, Linux 3.8 and 3.9-rcX include some optimizations to reduce this core hopping, but you can also intervene manually and use 'taskset' to set the process's CPU affinity.

    Last edited by droste; 18 March 2013, 06:17 PM.
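The 'taskset' suggestion can also be tried from inside a program. A minimal, Linux-only sketch, assuming Python's `os.sched_setaffinity` (a wrapper around the same sched_setaffinity system call that `taskset` uses) and a hypothetical busy loop as the workload: each worker pins itself to a single CPU, much like `taskset -c`, before timing its work. CPUs are picked round-robin from the set the parent process is allowed to use. `pinned_worker` and `run_pinned` are illustrative names.

```python
import multiprocessing as mp
import os
import time

# Linux only: os.sched_setaffinity / os.sched_getaffinity are not available elsewhere.


def pinned_worker(cpu: int, iterations: int, results) -> None:
    """Pin this process to one CPU (like `taskset -c CPU`), then time a busy loop."""
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the calling process"
    actual_cpu = next(iter(os.sched_getaffinity(0)))  # mask now holds exactly one CPU
    wall0, cpu0 = time.perf_counter(), time.process_time()
    x = 1
    for i in range(iterations):  # hypothetical stand-in for the real workload
        x = (x * 1103515245 + 12345 + i) & 0xFFFFFFFF
    results.put((actual_cpu, time.perf_counter() - wall0, time.process_time() - cpu0))


def run_pinned(n_procs: int, iterations: int) -> list[tuple[int, float, float]]:
    """Run n_procs pinned workers concurrently; return (cpu, wall s, CPU s) for each."""
    allowed = sorted(os.sched_getaffinity(0))  # CPUs this process may run on
    results = mp.Queue()
    procs = [
        mp.Process(target=pinned_worker,
                   args=(allowed[i % len(allowed)], iterations, results))
        for i in range(n_procs)
    ]
    for p in procs:
        p.start()
    timings = [results.get() for _ in procs]  # drain before join() to avoid a deadlock
    for p in procs:
        p.join()
    return sorted(timings)


if __name__ == "__main__":
    n = min(len(os.sched_getaffinity(0)), 8)
    for cpu, wall, cpu_s in run_pinned(n, 1_000_000):
        print(f"CPU {cpu}: wall {wall:.2f} s, CPU time {cpu_s:.2f} s")
```

If core hopping were the cause, the pinned runs should show noticeably smaller times than unpinned ones for the same number of processes.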


    • #3
      Thanks droste. That's an interesting aspect I hadn't thought of.

      I tried
      for i in `seq 1 6`; do echo 2 -1 0 -2 | time -f "%e %U %S" taskset -c $i ./intpoly -n0 & done
      but this made no difference to the execution time compared with running the same loop without taskset. So the basic problem, that the CPU time per process increases the more processes are running, remains.


      • #4
        I just tried it on my PC (i5 with 4 cores, no HT) on Linux 3.9-rc2:

        $ for i in `seq 0 3`; do echo 2 -1 0 -2 | time -f "%e %U %S" taskset -a -c `expr $i % 4` ./intpoly -n0 & done

        $ for i in `seq 0 3`; do echo 2 -1 0 -2 | time -f "%e %U %S" ./intpoly -n0 & done

        The results are basically the same for both. However, I always get (nearly) the same execution time per process no matter whether I start 1, 2, 3 or 4 processes, and with more processes than cores the execution time increases almost linearly with the number of additional processes: 1 process -> ~3 s, 4 processes -> ~3 s, 40 processes -> ~30 s. User time and sys time stay the same.

        So I can't reproduce your original results.
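droste's numbers fit a simple ideal model: if one run needs T_1 seconds of CPU time and n identical CPU-bound processes share c cores fairly, each process should finish after about T_1 * max(1, n/c) seconds of wall time, while its user and sys time stay at T_1. A minimal sketch of that arithmetic; the function name `expected_wall_time` is illustrative, and the model deliberately ignores clock-speed changes, hardware shared between cores and scheduler overhead.

```python
def expected_wall_time(n_procs: int, n_cores: int, t_single: float) -> float:
    """Ideal wall time per process when n_procs identical CPU-bound jobs share
    n_cores cores fairly and the cores do not slow each other down."""
    if n_procs < 1 or n_cores < 1:
        raise ValueError("need at least one process and one core")
    # Up to n_cores jobs each get a whole core; beyond that they time-share.
    return t_single * max(1.0, n_procs / n_cores)


if __name__ == "__main__":
    # droste's i5: 4 cores, about 3 s for a single run
    for n in (1, 4, 40):
        print(n, expected_wall_time(n, 4, 3.0))  # prints 1 3.0, 4 3.0, 40 30.0
```

This reproduces the ~3 s, ~3 s and ~30 s reported above. A CPU time per process that grows with the number of processes, as in the first post, is not something this model predicts.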