1 year ago

#385847

Liang Ce

How can I parallelize my code with Dask?

First import some packages:

import numpy as np
from dask import delayed

Suppose I have two NumPy arrays:

a1 = np.ones(5000000)
a2 = np.ones(8000000)

I would like to compute the sum and length of each array, using the following functions:

def sum(x):
    result = 0
    for data in x:
        result = result + data
    return result, len(x)

def get_result(x, y):
    return x, y

I have two examples in Colab. The sequential one looks like this:

%%time
result1 = sum(a1)
result2 = sum(a2)
result = get_result(result1, result2)
print(result)

And the output is:

((5000000.0, 5000000), (8000000.0, 8000000))
CPU times: user 1.41 s, sys: 3.7 ms, total: 1.42 s
Wall time: 1.42 s

However, I would like to compute these values in parallel:

%%time
result1 = delayed(sum)(a1)
result2 = delayed(sum)(a2)
result = delayed(get_result)(result1, result2)
result = result.compute()
print(result)

And the output is:

Delayed('get_result-ffbb6330-1014-42c5-b625-06e3e66a56ed')
CPU times: user 1.42 s, sys: 7.97 ms, total: 1.42 s
Wall time: 1.43 s

Why doesn't the second program run in parallel? The wall times of the two examples are almost the same.
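
One likely explanation, offered as a sketch rather than a confirmed diagnosis: dask.delayed runs on Dask's threaded scheduler by default, and a pure-Python for loop holds the interpreter's global lock (GIL), so the two calls take turns instead of running side by side. The example below keeps the original loop for comparison (renamed loop_sum, a hypothetical name, so it does not shadow the built-in sum) and adds numpy_sum and a small run helper, both also hypothetical. It assumes numpy and dask are installed and uses smaller arrays than the question so it finishes quickly.

```python
import numpy as np
from dask import delayed


def loop_sum(x):
    # Original approach: a pure-Python loop. Each iteration runs Python
    # bytecode, so a thread executing it holds the GIL and other threads wait.
    result = 0
    for data in x:
        result = result + data
    return result, len(x)


def numpy_sum(x):
    # Vectorized alternative: the summation runs in compiled NumPy code,
    # which can release the GIL for numeric dtypes such as float64.
    return x.sum(), len(x)


def get_result(x, y):
    return x, y


def run(func, a1, a2, scheduler="threads"):
    # Build the same three-task graph as in the question, then execute it.
    # "threads" is the default scheduler for dask.delayed; the `scheduler`
    # keyword of compute() selects a different one for this call.
    r1 = delayed(func)(a1)
    r2 = delayed(func)(a2)
    return delayed(get_result)(r1, r2).compute(scheduler=scheduler)


a1 = np.ones(50_000)
a2 = np.ones(80_000)

# Same values from both versions; only the way the work is done differs.
print(run(loop_sum, a1, a2))
print(run(numpy_sum, a1, a2))
```

If the loop has to stay in pure Python, calling compute(scheduler="processes") is another option to try, since separate processes do not share a GIL. The arrays then have to be copied to the worker processes, so a speed-up is not guaranteed. With numpy_sum, most of the gain is likely to come from vectorization itself rather than from running the two tasks concurrently.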

python

numpy

dask-delayed
