1 year ago

#389004

Kaiyakha

OpenMP thread spawn overhead

To learn more about thread spawn overhead in OpenMP, I experimented with an empty nested loop whose inner loop is parallelized, and compared it to the same nested loop without any parallelization.

std::time_t start = std::time(nullptr);
for (long long i = 0; i < (long long)1e8; i++)
    // only the inner loop is parallel, so a thread team is forked and joined on every outer iteration
    #pragma omp parallel for
    for (long long j = 0; j < (long long)1e2; j++) {}
std::cout << std::time(nullptr) - start << std::endl;

In this particular case it took 79 seconds to run through the loop, and 14 seconds to run through the same loop without the inner loop being parallelized (I simply commented out the directive). The larger the inner parallel loop and the smaller the outer loop, the less time the code takes to execute. With the outer loop at 1e7 iterations and the inner one at 1e3, the parallelized and the sequential versions run in comparable time; with the outer loop at 1e6 iterations and the inner one at 1e4, the parallel version finishes in 4 seconds. The execution time of the sequential version stayed the same no matter what the ratio between the iteration counts was.

I haven't managed to find anything describing thread creation overhead in detail. Still, this experiment shows that the overhead can be really significant.

What are the ways to avoid or minimise this overhead, apart from moving the parallel region to the outer loop (a sketch of what I mean is below)? Does OpenMP itself provide any workarounds, or are there workarounds from other sources?
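
By moving the parallel region to the outer loop I mean something like the sketch below (the exact effect will depend on the OpenMP runtime and the compiler): the thread team is created once for the whole nest, and only the work-sharing for construct is applied to the inner loop, so there is no fork/join on every outer iteration.

#include <ctime>
#include <iostream>

int main() {
    std::time_t start = std::time(nullptr);
    #pragma omp parallel                  // the thread team is created only once
    for (long long i = 0; i < (long long)1e8; i++)
        #pragma omp for                   // inner iterations are shared among the existing threads
        for (long long j = 0; j < (long long)1e2; j++) {}
    std::cout << std::time(nullptr) - start << std::endl;
}

Even in this form, each pass over the inner loop still ends with the implicit barrier of the for construct, so some per-iteration synchronisation cost remains.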

performance

multiprocessing

openmp

spawn

overhead

0 Answers
