1 year ago
#389004
Kaiyakha
OpenMP thread spawn overhead
To learn more about thread spawn overhead in OpenMP, I experimented with an empty nested loop whose inner loop is parallelized, and compared it to the same nested loop without the parallelization.
#include <ctime>
#include <iostream>

std::time_t start = std::time(nullptr);
for (long long i = 0; i < (long long)1e8; i++) {
    // a new parallel region is entered on every outer iteration
    #pragma omp parallel for
    for (long long j = 0; j < (long long)1e2; j++) {}
}
std::cout << std::time(nullptr) - start << std::endl;
In this particular case, the loop took 79 seconds to run, compared to 14 seconds for the same loop without the inner loop being parallelized (I just commented out the directive). The bigger the inner parallel loop and the smaller the outer loop, the less time the code takes to execute. Thus, with the outer loop at 1e7 iterations and the inner one at 1e3, the parallelized and sequential versions take comparable time. With the outer loop at 1e6 iterations and the inner one at 1e4, the parallel version runs in 4 seconds. For the sequential version, the execution time always stayed the same, no matter the ratio between the iteration counts.
I haven't managed to find anything describing the overhead of starting a parallel region (thread creation) in detail. Still, this experiment shows that the overhead can be really significant.
What are the means to avoid or minimise such overhead, apart from moving the parallel section to the outer loop? Are there workarounds provided by OpenMP itself, or perhaps by other sources?
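For clarity, by moving the parallel section to the outer loop I mean a restructuring along these lines, which forks the thread team once for the whole nest rather than on every outer iteration:

#include <ctime>
#include <iostream>

int main() {
    std::time_t start = std::time(nullptr);
    // The team is forked once; each thread runs its chunk of outer iterations
    // and executes the inner loop sequentially.
    #pragma omp parallel for
    for (long long i = 0; i < (long long)1e8; i++)
        for (long long j = 0; j < (long long)1e2; j++) {}
    std::cout << std::time(nullptr) - start << std::endl;
    return 0;
}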
performance
multiprocessing
openmp
spawn
overhead
0 Answers