1 year ago
#373612
LucasVaz97
Spark: mean of values as a column
I am just starting to work with Spark and I need to create a column whose values are based on another data frame. My first data frame has ID and start date columns, while the other one has a yield value, an acquired date and an ID. I need to add a new column to the first data frame that holds the mean of the yield values from the other data frame that were acquired in the 30 days before the start date. The output should look something like this:
Table 1
ID start_date
1 01/12/2018
2 01/11/2019
Table 2
ID yield acquired_date
1 120 05/11/2019
1 100 05/11/2018
1 200 07/11/2018
1 200 08/11/2018
2 350 04/10/2020
2 300 04/10/2019
2 100 05/10/2019
output
ID start_date yield_mean
1 01/12/2018 166.67
2 01/11/2019 200
Note: the mean only uses rows whose acquired date falls within the 30 days before the start date for the same ID, so rows 0 and 4 of Table 2 (counting from 0) are not used.
dataframe
apache-spark
pyspark
mean
0 Answers