1 year ago
#370206
Bhaskar Sarma
performance difference between TFLite python and C++ API
System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian 10 Linux buster
TensorFlow installed from (source or binary): 2.8.0
TensorFlow version (use command below): 2.8.0
Python version: 3.7.3
Bazel version (if compiling from source): 4.2.1
GCC/Compiler version (if compiling from source): 8.3.0
CUDA/cuDNN version: NA
GPU model and memory: NA
TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
v2.8.0-rc1-32-g3f878cff5b6 2.8.0
Describe the current behavior
Python performance of tflite is much better than C++.
When number of threads is set to -1, Not getting best performance in C++.
Manual setting the number of threads to max is giving improvement in C++ API performance and still it is very lower than python.
As per this github tensorflow issue(#46272) It is mentioned,when number of threads in c++ are set to -1, all threads will be used, But its not happening and there is performance difference.
Performance is not modified proportionately based on the threads. Suppose when threads are set to 2, we are not getting 2x performance than threads as 1
Describe the expected behavior :
Match the performance of python with C++.
Give an API or directly automate the setting the threads without manual change.
What is the backend used for python and C++ ? Are they same ?
Can we expect the performance proportionately based on the threads. Suppose when threads are set to 2, can we expect 2x performance than 1 thread ?
Standalone code to reproduce the issue :
Python TFlite--
python3 tfliteversionprofile_latest_singleiteration.py
2.8.0
Time elapsed during the process:%d ms 99.971158
python3 tfliteversionprofile_latest_singleiteration_multicores.py
.2.8.0
Time elapsed during the process:%d ms 85.076159
There is clear change in performance when number of threads are set to max.
My cpu has 6 cores and 2 threads per core, so set to 12.
C++ TFlite--
But when c++ API is used there is huge impact in performance , Using example label image.
when -1 is set as number of threads.
bazel-4.2.1 build -c opt //tensorflow/lite/examples/label_image:label_image
when -1 (number of threads) is set in label_image.h present in tensorflow/lite/examples/label_image,
bazel-bin/tensorflow/lite/examples/label_image/label_image --tflite_model detect.tflite --labels labelmap.txt --image tensorflow/lite/examples/label_image/testdata/grace_hopper.bmp
INFO: Loaded model detect.tflite
INFO: resolved reporter
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: invoked
INFO: average time: 323.143 ms
INFO: 0.00389769: 3 car
INFO: 0.0038741: 2 bicycle
when 12 is set as number of threads.
bazel-4.2.1 build -c opt //tensorflow/lite/examples/label_image:label_image
when 12 (number of threads) is set in label_image.h present in tensorflow/lite/examples/label_image,
bazel-bin/tensorflow/lite/examples/label_image/label_image --tflite_model detect.tflite --labels labelmap.txt --image tensorflow/lite/examples/label_image/testdata/grace_hopper.bmp
INFO: Loaded model detect.tflite
INFO: resolved reporter
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: invoked
INFO: average time: 141.746 ms
INFO: 0.00389769: 3 car
INFO: 0.0038741: 2 bicycle
Files are attached in this link files
python
c++
multithreading
tensorflow
tensorflow-lite
0 Answers
Your Answer