1 year ago
#385847
Liang Ce
How can I parallelize my code with Dask?
First import some packages:
import numpy as np
from dask import delayed
Suppose I have two NumPy arrays:
a1 = np.ones(5000000)
a2 = np.ones(8000000)
I would like to compute the sum and length of each array, using these functions:
def sum(x):
    result = 0
    for data in x:
        result = result + data
    return result, len(x)

def get_result(x, y):
    return x, y
I have two examples in Colab. The sequential one looks like this:
%%time
result1 = sum(a1)
result2 = sum(a2)
result = get_result(result1, result2)
print(result)
And the output is:
((5000000.0, 5000000), (8000000.0, 8000000))
CPU times: user 1.41 s, sys: 3.7 ms, total: 1.42 s
Wall time: 1.42 s
However, I would like to compute these values in parallel, so I tried dask.delayed:
%%time
result1 = delayed(sum)(a1)
result2 = delayed(sum)(a2)
result = delayed(get_result)(result1, result2)
result = result.compute()
print(result)
And the output is:
Delayed('get_result-ffbb6330-1014-42c5-b625-06e3e66a56ed')
CPU times: user 1.42 s, sys: 7.97 ms, total: 1.42 s
Wall time: 1.43 s
Why didn't the second program run in parallel? The wall times of the two examples are almost the same.
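A possible explanation, offered as a sketch rather than a confirmed diagnosis: dask.delayed uses the threaded scheduler by default, and a pure-Python for loop holds the GIL, so two such tasks cannot execute at the same time in threads. The example below runs the same task graph with the process-based scheduler via compute(scheduler="processes"). The names loop_sum and run are hypothetical (loop_sum is the sum function from above, renamed to avoid shadowing the built-in), and the arrays are shrunk so the example finishes quickly.

```python
import numpy as np
from dask import delayed


def loop_sum(x):
    # Same pure-Python loop as in the question; it holds the GIL while running.
    result = 0.0
    for data in x:
        result = result + data
    return float(result), len(x)


def get_result(x, y):
    return x, y


def run(scheduler, n1=50_000, n2=80_000):
    # Build the same two-branch task graph as in the question
    # (array sizes reduced; this is an assumption made for a quick demo).
    a1 = np.ones(n1)
    a2 = np.ones(n2)
    result1 = delayed(loop_sum)(a1)
    result2 = delayed(loop_sum)(a2)
    result = delayed(get_result)(result1, result2)
    # scheduler may be "threads" (the default for delayed),
    # "processes", or "synchronous".
    return result.compute(scheduler=scheduler)


if __name__ == "__main__":
    # The guard matters here: worker processes may re-import this module.
    print(run("processes"))
```

Whether this actually shortens the wall time depends on the number of available cores and on the cost of shipping the arrays to the worker processes. Replacing the loop with x.sum() is likely a far larger win, since NumPy does that work in compiled code.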
python
numpy
dask-delayed
0 Answers