1 year ago
#356823
MeowMix
Resetting a group's cumsum when a condition is met
First time poster here, so apologies if I get something wrong.
I have a data table that looks like this currently:
dt_achieved<-data.table(last_nm=c("raus","stroper", "degroat","degroat","degroat","degroat","degroat","degroat","piya","mazzy","mazzy","mazzy"),intake_date=c("2021-03-04","2021-06-18","2021-04-14","2021-06-10","2021-07-08","2021-08-09","2021-11-09","2021-12-08","2021-09-16","2021-04-15","2021-08-02","2021-08-09"))
dt_achieved$intake_date<-as.Date(dt_achieved$intake_date)
I would like it to look like this, where the data are grouped by last_nm and then, if a row's intake date is at least 90 days after the group's initial one, it's flagged/its count increases by one on a rolling basis, within group.
dt_ideal<-data.table(last_nm=c("raus","stroper", "degroat","degroat","degroat","degroat","degroat","degroat","piya",
"mazzy","mazzy","mazzy"),intake_date=c("2021-03-04","2021-06-18","2021-04-14","2021-06-10","2021-07-08","2021-08-09","2021-11-09","2021-12-08","2021-09-16","2021-04-15","2021-08-02","2021-08-09"),intake_round=c(1,1,1,1,1,2,3,3,1,1,2,2), Running_intake=c(1,1,1,2,3,4,5,6,1,1,2,3))
dt_ideal$intake_date<-as.Date(dt_ideal$intake_date)
I have gotten this far:
# order by last name and intake date (the real data is randomized)
setkeyv(dt_achieved, c("last_nm", "intake_date"))
dt_achieved[, intake_round := cumsum(c(TRUE, diff(as.Date(intake_date)) >= 90)),
.(last_nm)][, Running_intake := as.numeric(seq_len(.N)), .(last_nm)]
The issue is that the intake_round column currently reflects the difference from the date above rather than looking to the group's initial intake_date to see if it's at least 90 days later. ...I just can't figure out the script to get the cumsum(c(TRUE, ...) function to do this on a rolling basis, within group.
Have consulted a number of questions including:
incremental counter within dataframe only when a condition is met in r
Resetting the cumulative sum when a condition is met in R
Split into groups based on (multiple) conditions?
Cumulative sum that resets when the condition is no longer met
Please, any suggestions would be greatly, greatly appreciated!
r
group-by
conditional-statements
reset
cumsum
0 Answers
Your Answer