1 year ago

#370206

Bhaskar Sarma

Performance difference between the TFLite Python and C++ APIs

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian 10 Linux buster

TensorFlow installed from (source or binary): 2.8.0

TensorFlow version (use command below): 2.8.0

Python version: 3.7.3

Bazel version (if compiling from source): 4.2.1

GCC/Compiler version (if compiling from source): 8.3.0

CUDA/cuDNN version: NA

GPU model and memory: NA

TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

v2.8.0-rc1-32-g3f878cff5b6 2.8.0

Describe the current behavior

The TFLite Python API performs much better than the C++ API.

When the number of threads is set to -1, the C++ API does not give the best performance.

Manually setting the number of threads to the maximum improves C++ API performance, but it is still much slower than Python.

TensorFlow GitHub issue #46272 states that when the number of threads in C++ is set to -1, all threads will be used. That does not appear to happen, and there is a performance difference.

Performance does not scale proportionally with the number of threads. For example, with 2 threads we do not get 2x the performance of 1 thread.

Describe the expected behavior

C++ performance should match Python performance.

Provide an API for this, or choose the number of threads automatically, so that no manual change is needed.

Which backend is used for Python and for C++? Are they the same?

Can we expect performance to scale proportionally with the number of threads? For example, with 2 threads, can we expect 2x the performance of 1 thread?
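On the scaling question, 2 threads give 2x only if the whole workload parallelizes perfectly. Here is a minimal sketch using Amdahl's law. The function name and the parallel fractions are illustrative assumptions, not measurements of this model:

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speedup when only part of the work runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of thread count;
    # only the parallel part is divided among the threads.
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Illustrative fractions only (assumed, not measured).
    for fraction in (0.5, 0.9, 1.0):
        for threads in (1, 2, 12):
            bound = amdahl_speedup(fraction, threads)
            print(f"parallel={fraction:.1f} threads={threads:2d} bound={bound:.2f}x")
```

Under this model, 2x from 2 threads requires a parallel fraction of 1.0. Real inference also pays for thread synchronization and memory bandwidth, so measured gains are usually below the bound.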

Standalone code to reproduce the issue

Python TFLite:

python3 tfliteversionprofile_latest_singleiteration.py

2.8.0

Time elapsed during the process:%d ms 99.971158

python3 tfliteversionprofile_latest_singleiteration_multicores.py

.2.8.0

Time elapsed during the process:%d ms 85.076159

There is a clear change in performance when the number of threads is set to the maximum.

My CPU has 6 cores and 2 threads per core, so I set it to 12.
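Instead of hard-coding 12, the thread count can be derived at run time. This is a minimal sketch using only the Python standard library. The helper name `default_num_threads` is hypothetical, and the commented usage assumes the `detect.tflite` model from this question:

```python
import os


def default_num_threads() -> int:
    """Return the number of logical CPUs this process may run on (at least 1)."""
    try:
        # Linux: respects CPU affinity masks such as those set by taskset.
        return max(1, len(os.sched_getaffinity(0)))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the total count.
        return max(1, os.cpu_count() or 1)


num_threads = default_num_threads()
print("logical CPUs available:", num_threads)

# Assumed usage with the TFLite Python API (requires tensorflow installed):
# import tensorflow as tf
# interpreter = tf.lite.Interpreter(model_path="detect.tflite",
#                                   num_threads=num_threads)
# interpreter.allocate_tensors()
```

On a machine with 6 cores and 2 hardware threads per core this should report 12 unless the process is restricted to fewer CPUs. Using every logical CPU is not always fastest, so it is worth timing a few values.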

C++ TFLite:

With the C++ API, however, performance is much worse. This uses the label_image example.

With -1 set as the number of threads:

bazel-4.2.1 build -c opt //tensorflow/lite/examples/label_image:label_image

With the number of threads set to -1 in label_image.h (in tensorflow/lite/examples/label_image):

bazel-bin/tensorflow/lite/examples/label_image/label_image --tflite_model detect.tflite --labels labelmap.txt --image tensorflow/lite/examples/label_image/testdata/grace_hopper.bmp

INFO: Loaded model detect.tflite

INFO: resolved reporter

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

INFO: invoked

INFO: average time: 323.143 ms

INFO: 0.00389769: 3 car

INFO: 0.0038741: 2 bicycle

With 12 set as the number of threads:

bazel-4.2.1 build -c opt //tensorflow/lite/examples/label_image:label_image

With the number of threads set to 12 in label_image.h (in tensorflow/lite/examples/label_image):

bazel-bin/tensorflow/lite/examples/label_image/label_image --tflite_model detect.tflite --labels labelmap.txt --image tensorflow/lite/examples/label_image/testdata/grace_hopper.bmp

INFO: Loaded model detect.tflite

INFO: resolved reporter

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

INFO: invoked

INFO: average time: 141.746 ms

INFO: 0.00389769: 3 car

INFO: 0.0038741: 2 bicycle
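To put the reported numbers side by side, here is a small sketch that computes the ratios from the timings quoted above. The dictionary keys are labels chosen for illustration. The comparison assumes the Python scripts and label_image time the same work, which is not confirmed here:

```python
# Timings copied from the runs above, in milliseconds.
timings_ms = {
    "python_single": 99.971158,      # first Python script
    "python_12_threads": 85.076159,  # Python script with 12 threads
    "cpp_minus1": 323.143,           # label_image with -1 threads
    "cpp_12_threads": 141.746,       # label_image with 12 threads
}


def speedup(baseline_ms: float, new_ms: float) -> float:
    """How many times faster new_ms is than baseline_ms."""
    return baseline_ms / new_ms


python_gain = speedup(timings_ms["python_single"], timings_ms["python_12_threads"])
cpp_gain = speedup(timings_ms["cpp_minus1"], timings_ms["cpp_12_threads"])
cpp_vs_python = timings_ms["cpp_12_threads"] / timings_ms["python_12_threads"]

print(f"Python gain from 12 threads: {python_gain:.2f}x")
print(f"C++ gain from 12 threads over -1: {cpp_gain:.2f}x")
print(f"C++ (12 threads) time relative to Python (12 threads): {cpp_vs_python:.2f}x")
```

The ratios show that going from -1 to 12 threads helps the C++ run much more than the thread change helps the Python run, yet the C++ run still takes longer than the Python run at 12 threads.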

The files are attached at this link: files

python

c++

multithreading

tensorflow

tensorflow-lite
