1 year ago

#373612

LucasVaz97

Spark: mean of values as a column

I am just starting to work with Spark and I have to create a column whose values are based on another data frame. My first data frame has ID and start_date columns, while the other one has ID, yield, and acquired_date columns. I need to add a new column to the first data frame containing the mean of the yield values from the second data frame that were acquired in the 30 days before the start date. The output should look something like this:

Table 1

ID  start_date
1    01/12/2018
2    01/11/2019

Table 2

ID   yield acquired_date
1    120   05/11/2019
1    100   05/11/2018
1    200   07/11/2018
1    200   08/11/2018
2    350   04/10/2020
2    300   04/10/2019
2    100   05/10/2019

output

ID  start_date  yield_mean
1    01/12/2018   166.67
2    01/11/2019   200

Note: the mean only includes values whose acquired date falls within the 30 days before the start date, so rows 0 and 4 of Table 2 are not used.

dataframe

apache-spark

pyspark

mean

0 Answers
