Mix and match GCC and Intel compilers: link OpenMP correctly

I have a scientific C++ application that is parallelized with OpenMP and typically compiled with GCC/8.2.0. It further depends on gsl and fftw, the latter also using OpenMP. Through a C API, the application accesses a Fortran library that is itself OpenMP-parallelized and can use either Intel's MKL or openblas as backend; compiling that library with the Intel/19.1.0 toolchain is preferred. I have successfully compiled, linked, and tested everything with GCC/8.2.0 and openblas (as baseline). However, test studies on minimal examples suggest that MKL with Intel would be faster, and speed is important for my use case.

icc --version gives me: icc (ICC) 19.1.0.166 20191121; the operating system is CentOS 7. Bear in mind that I am on a cluster and have limited control over what I can install. Software is centrally managed with spack, and environments are loaded by specifying a compiler layer (only one at a time).

I have considered different approaches for getting the Intel/MKL library into my code:

  1. Compile both the C++ and the Fortran code with the Intel toolchain. While that is probably the tidiest solution, the compiler throws "internal error: 20000_7001" for a particular file with an OMP include. I could not find documentation for that error code and have not gotten feedback from Intel either (https://community.intel.com/t5/Intel-C-Compiler/Compilation-error-internal-error-20000-7001/m-p/1365208#M39785). I allocated > 80 GB of memory for compilation, as I have seen the compiler break down before when resources were limited. Maybe someone here has seen that error code?

  2. Compile the C++ and Fortran code with GCC/8.2.0 but link dynamically against Intel-compiled MKL as backend for the Fortran library. I managed to do that from the GCC/8.2.0 layer by extending LIBRARY_PATH and LD_LIBRARY_PATH to where MKL lives on the cluster (a sketch of that setup follows this list). It seems only GNU OMP is linked, and MKL was found. Analysis shows that CPU load is quite low (although higher than with the GCC/8.2.0 + openblas binary), and execution time improves by ~30%. However, in at least one case I got this runtime error when running the binary with 20 cores: libgomp: Thread creation failed: Resource temporarily unavailable.

  3. Stick with GCC/8.2.0 for my C++ code and link dynamically against the precompiled Fortran library, which was itself compiled with Intel/MKL and uses Intel OMP. This approach turned out to be tricky. As with approach (2), I loaded the GCC environment and manually extended LD_LIBRARY_PATH. A minimal example that is not itself OMP-parallelized worked beautifully out of the box. However, even though I managed to compile and link my full C++ program as well, I got an immediate runtime error as soon as the first OMP call inside the Fortran library was reached.
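
For reference, this is roughly how I prepare the environment for approaches (2) and (3). The module name is a placeholder for however the GCC/8.2.0 layer is loaded on your system; the Intel paths are the ones that show up in the ldd output further down, and INTEL_ROOT is just a shorthand I introduce here:

module load gcc/8.2.0   # placeholder for loading the GCC compiler layer
INTEL_ROOT=/cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux
# make MKL and the Intel OpenMP runtime visible to the linker and the loader
export LIBRARY_PATH=$INTEL_ROOT/mkl/lib/intel64:$INTEL_ROOT/compiler/lib/intel64:$LIBRARY_PATH
export LD_LIBRARY_PATH=$INTEL_ROOT/mkl/lib/intel64:$INTEL_ROOT/compiler/lib/intel64:$LD_LIBRARY_PATH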

Here is the ldd output for the binary from this first attempt at approach (3):

linux-vdso.so.1 => (0x00007fff2d7bb000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ab227c25000)
libgsl.so.25 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgsl.so.25 (0x00002ab227e41000)
libgslcblas.so.0 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgslcblas.so.0 (0x00002ab228337000)
libfftw3.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3.so.3 (0x00002ab228595000)
libz.so.1 => /lib64/libz.so.1 (0x00002ab228a36000)
libfftw3_omp.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3_omp.so.3 (0x00002ab228c4c000)
libxtb.so.6 => /cluster/project/igc/iridiumcc/intel-19.1.0/xtb/build/libxtb.so.6 (0x00002ab228e53000)
libstdc++.so.6 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libstdc++.so.6 (0x00002ab22a16d000)
libm.so.6 => /lib64/libm.so.6 (0x00002ab22a4f1000)
libgomp.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgomp.so.1 (0x00002ab22a7f3000)
libgcc_s.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgcc_s.so.1 (0x00002ab22aa21000)
libc.so.6 => /lib64/libc.so.6 (0x00002ab22ac39000)
/lib64/ld-linux-x86-64.so.2 (0x00002ab227a01000)
libmkl_intel_lp64.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_lp64.so (0x00002ab22b007000)
libmkl_intel_thread.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_thread.so (0x00002ab22bb73000)
libmkl_core.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_core.so (0x00002ab22e0df000)
libifcore.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libifcore.so.5 (0x00002ab2323ff000)
libimf.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libimf.so (0x00002ab232763000)
libsvml.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libsvml.so (0x00002ab232d01000)
libirng.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libirng.so (0x00002ab234688000)
libiomp5.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libiomp5.so (0x00002ab2349f2000)
libintlc.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libintlc.so.5 (0x00002ab234de2000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002ab235059000)

I did some research and found interesting discussions here and in Intel's documentation regarding crashes when two different OMP implementations are mixed:

Telling GCC to *not* link libgomp so that it links libiomp5 instead:
https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/optimization-and-programming-guide/openmp-support/openmp-library-support/using-the-openmp-libraries.html
http://www.nacad.ufrj.br/online/intel/Documentation/en_US/compiler_c/main_cls/optaps/common/optaps_par_compat_libs_using.htm

I followed the guidelines for the Intel OpenMP compatibility libraries. My C++ code was compiled from the GCC environment with the -fopenmp flag as always. At the linking stage (g++), I used the same linker command as usual but replaced -fopenmp with -L/cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64 -liomp5 -lpthread. The resulting binary runs like a charm and is roughly twice as fast as my original build (GCC/openblas).
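
To make that concrete, the compile and link steps look roughly like the sketch below. The file names, the output name, and the exact list of -l libraries are placeholders for what my build system actually passes; the only deliberate change is that -fopenmp at link time is replaced by the explicit Intel OpenMP runtime:

# compile as usual from the GCC environment
g++ -O2 -fopenmp -c main.cpp -o main.o

# link: same command as before, but -fopenmp replaced by -liomp5 from the Intel layer
g++ main.o -o my_app \
    -lgsl -lgslcblas -lfftw3 -lfftw3_omp -lxtb \
    -L/cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64 \
    -liomp5 -lpthread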

Here is the ldd output for this new binary:

linux-vdso.so.1 =>  (0x00007ffd7eb9a000)
libiomp5.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libiomp5.so (0x00002b4fb08da000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b4fb0cca000)
libgsl.so.25 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgsl.so.25 (0x00002b4fb0ee6000)
libgslcblas.so.0 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgslcblas.so.0 (0x00002b4fb13dc000)
libfftw3.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3.so.3 (0x00002b4fb163a000)
libz.so.1 => /lib64/libz.so.1 (0x00002b4fb1adb000)
libfftw3_omp.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3_omp.so.3 (0x00002b4fb1cf1000)
libxtb.so.6 => /cluster/project/igc/iridiumcc/intel-19.1.0/xtb/build/libxtb.so.6 (0x00002b4fb1ef8000)
libstdc++.so.6 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libstdc++.so.6 (0x00002b4fb3212000)
libm.so.6 => /lib64/libm.so.6 (0x00002b4fb3596000)
libgcc_s.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgcc_s.so.1 (0x00002b4fb3898000)
libc.so.6 => /lib64/libc.so.6 (0x00002b4fb3ab0000)
/lib64/ld-linux-x86-64.so.2 (0x00002b4fb06b6000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b4fb3e7e000)
libgomp.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgomp.so.1 (0x00002b4fb4082000)
libmkl_intel_lp64.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_lp64.so (0x00002b4fb42b0000)
libmkl_intel_thread.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_thread.so (0x00002b4fb4e1c000)
libmkl_core.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_core.so (0x00002b4fb7388000)
libifcore.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libifcore.so.5 (0x00002b4fbb6a8000)
libimf.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libimf.so (0x00002b4fbba0c000)
libsvml.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libsvml.so (0x00002b4fbbfaa000)
libirng.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libirng.so (0x00002b4fbd931000)
libintlc.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libintlc.so.5 (0x00002b4fbdc9b000)

Unlike with approach (2), the binary is linked against both libiomp5 and libgomp. I suspect the reference to libgomp comes from linking against libfftw3_omp, which was compiled with GCC/8.2.0. I find it quite puzzling that ldd reports the exact same libraries as for my first attempt at approach (3); only the order seems to have changed (libiomp5 before libgomp).
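
A quick way to see which of the two runtimes is actually mapped at run time (ldd only reports what the loader would resolve, not what gets exercised) is to look at the memory map of the running process; my_app is a placeholder for the real binary name:

# while the program is running, list the OpenMP runtimes mapped into the process
grep -E 'libiomp5|libgomp' /proc/$(pgrep -n my_app)/maps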

While I am quite happy to have gotten a working binary in the end, I have some questions I could not resolve by myself:

  • do you interpret Intel's documentation and the previous SO post like I do and agree that the Intel OpenMP compatibility libraries are applicable in my case and that I have used the correct workflow? Or do you think approach (3) is a recipe for disaster in the future?

  • do any of you have more experience with Intel's C++ compiler and have you seen the error code described in approach (1)? (see update below)

  • do you think it's worth investigating whether I can completely get rid of libgomp by, for example, manually linking to Intel compiled libfftw3_omp that only depends on libiomp5? (see update below)

  • do you have an explanation why thread creation fails in some cases using approach (2)?

Thank you very much in advance!

// Update: In the meantime I managed to tweak approach (3): instead of linking against GCC/8.2.0-compiled gsl and fftw, I now use Intel/19.1.0-compiled gsl and fftw. The resulting binary is similar in speed to what I had before, but it links only against libiomp5.so, which seems like the cleaner solution to me.

// Update: Manually excluding compiler optimizations for the files that throw internal errors, done from CMakeLists.txt (see CMake: how to disable optimization of a single *.cpp file?), gave me a working binary, albeit with linker warnings.
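
For the record, that per-file override effectively amounts to compiling only the offending translation unit without optimization while the rest of the project keeps its usual flags; the file name below is a placeholder:

# compile only the file that triggers the internal error with -O0
icpc -qopenmp -O0 -c src/offending_file.cpp -o offending_file.o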

Tags: c++, gcc, openmp, intel-mkl, icc
