1 year ago

#379174

user3448011

TensorFlow 2 Keras model training is very slow on CPU and most CPU cores (>95%) are idle

I am trying to train a neural network model (TensorFlow 2.8) on CPU (an EC2 m4.10 instance with about 160 GB of memory and 40 CPU cores) from a Jupyter notebook. The training data is loaded from 300+ gzip files (each 200+ MB) and processed as a dataset. However, training is very slow: each epoch takes about 75 minutes. The code:

 import tensorflow as tf
 from tensorflow import keras
 from tensorflow.keras.optimizers import Adam

 # model, train_data and learning_rate are defined elsewhere (not shown)
 tf.config.run_functions_eagerly(True)
 model.compile(optimizer=Adam(learning_rate),
               loss=keras.losses.BinaryCrossentropy(),
               run_eagerly=True,
               metrics=[keras.metrics.BinaryAccuracy()])

 model.fit(train_data, epochs=10, steps_per_epoch=1000, validation_steps=100,
           workers=16, use_multiprocessing=True)
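
For scale: with steps_per_epoch=1000 and about 75 minutes per epoch, each step takes roughly 4.5 seconds (75 * 60 / 1000). One way to tell whether that time goes into the input pipeline or into the model might be to time iteration over the dataset alone, with no training. This is a minimal sketch, assuming `train_data` is iterable (as a `tf.data.Dataset` is in eager mode). The helper name `time_batches` is hypothetical and not part of TensorFlow:

```python
import itertools
import time


def time_batches(batches, num_batches=20):
    """Return average seconds spent fetching each of the first `num_batches` items.

    `batches` can be any iterable; nothing is trained here, so the result
    reflects only the cost of producing the data.
    """
    start = time.perf_counter()
    count = 0
    for _ in itertools.islice(batches, num_batches):
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0:
        raise ValueError("iterable produced no batches")
    return elapsed / count


# Hypothetical usage in the notebook (train_data is defined elsewhere):
# seconds_per_batch = time_batches(train_data, num_batches=50)
```

If the per-batch time came out close to 4.5 seconds, that would suggest the time is going into reading and decoding the gzip files rather than into the model.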

When the model is being trained, only 1 or 2 CPU cores are busy and all other 38 cores are idle.
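
It may also be worth ruling out that the notebook process is restricted to a few cores. This is a stdlib-only sketch, assuming Linux (as on EC2). `cpu_visibility_report` is a hypothetical helper name, and the environment variables listed are ones that can cap thread counts when set:

```python
import os

# Environment variables that can limit thread counts when set.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "TF_NUM_INTRAOP_THREADS",
    "TF_NUM_INTEROP_THREADS",
)


def cpu_visibility_report():
    """Collect how many cores the current process can see and use."""
    report = {"logical_cores": os.cpu_count()}
    # sched_getaffinity exists only on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        report["usable_cores"] = len(os.sched_getaffinity(0))
    else:
        report["usable_cores"] = None
    # Record thread-limiting variables; None means the variable is unset.
    report["env"] = {name: os.environ.get(name) for name in THREAD_ENV_VARS}
    return report


print(cpu_visibility_report())
```

On the instance described here, both core counts would be expected to be 40. A smaller `usable_cores` value, or a thread variable set to 1 or 2, would be consistent with the idle cores. TensorFlow's own settings can be read with `tf.config.threading.get_intra_op_parallelism_threads()` and `tf.config.threading.get_inter_op_parallelism_threads()`, where 0 means the runtime picks a value.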

I have tried turning eager execution off (run_eagerly=False), but it made no difference. I have read some posts about why TF2 is slower than TF1, but none of them explain why most CPU cores sit idle.

What am I missing here?

tensorflow

machine-learning

keras

tensorflow2.0

cpu

0 Answers