1 year ago
#383795
damaBeugXam
Cross compiling FFTW for ARM Neon
I am trying to compile FFTW3 to run on ARM NEON (more precisely, on a Cortex-A53). The build environment is x86_64-pokysdk-linux and the host environment is aarch64-poky-linux. I am using the aarch64-poky-linux-gcc compiler. I first used the following command:
./configure --prefix=/build_env/neon/neon_install_8 --host=aarch64-poky-linux --enable-shared --enable-single --enable-neon --with-sysroot=/opt/poky/2.5.3/sysroots/aarch64-poky-linux "CC=/opt/poky/2.5.3/sysroots/x86_64-pokysdk-linux/usr/bin/aarch64-poky-linux/aarch64-poky-linux-gcc -march=armv8-a+simd -mcpu=cortex-a53 -mfloat-abi=softfp -mfpu=neon"
The compiler did not support the -mfloat-abi=softfp and -mfpu=neon options. It also did not let me define the path to the sysroot this way.
I then used the following command:
./configure --prefix=/build_env/neon/neon_install_8 --host=aarch64-poky-linux --enable-shared --enable-single --enable-neon "CC=/opt/poky/2.5.3/sysroots/x86_64-pokysdk-linux/usr/bin/aarch64-poky-linux/aarch64-poky-linux-gcc" "CFLAGS=--sysroot=/opt/poky/2.5.3/sysroots/aarch64-poky-linux -mcpu=cortex-a53 -march=armv8-a+simd"
This command succeeded with this config log and this config.h. I then ran make followed by make install. Next, I copied the shared library file to my host environment and replaced the fftw_ prefix with fftwf_ throughout my code base. The final step was to recompile the program. I ran a test and compared the times of both versions using <sys/resource.h>. I also called fftw[f]_forget_wisdom() in both versions so that the comparison would be fair. However, I am not getting a speedup. I expected that using a SIMD architecture (NEON in our case) would accelerate the FFTW library.
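A timing comparison like the one described could be set up roughly as follows. This is a minimal sketch, not the code from the actual program. It assumes the measurement uses getrusage(RUSAGE_SELF, ...) from <sys/resource.h>. The names user_cpu_seconds, average_cpu_seconds, and count_call are hypothetical, and count_call is only a stand-in workload so the example compiles without FFTW.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds as a double. */
double timeval_to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* User-mode CPU time consumed so far by this process, in seconds.
 * Returns a negative value if getrusage() fails. */
double user_cpu_seconds(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1.0;
    return timeval_to_seconds(usage.ru_utime);
}

/* Hypothetical helper: call work(ctx) `iterations` times and return the
 * average user CPU time per call, in seconds (negative on error).
 * Assumption: in the real benchmark, work() would only execute an
 * already created FFTW plan, so planning time is not measured. */
double average_cpu_seconds(void (*work)(void *), void *ctx, int iterations)
{
    if (iterations <= 0)
        return -1.0;

    double start = user_cpu_seconds();
    for (int i = 0; i < iterations; ++i)
        work(ctx);
    double end = user_cpu_seconds();

    if (start < 0.0 || end < 0.0)
        return -1.0;
    return (end - start) / iterations;
}

/* Stand-in workload (not an FFT): increments the counter that ctx points to. */
void count_call(void *ctx)
{
    int *counter = ctx;
    ++*counter;
}
```

With FFTW linked in, the stand-in would be replaced by a small wrapper that calls fftw_execute() or fftwf_execute() on a plan passed through ctx, and the same iteration count would be used for the double-precision and single-precision builds. Averaging over many iterations matters here because getrusage() has limited resolution, so a single short transform may report a time of zero.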
I would really appreciate it if anyone could point out what I am doing wrong, so that I can try a fix and see whether I get the performance boost I am looking for.
gcc
arm64
fftw