1 year ago
#384969
xenotharm
Edit distance for a four-digit sequential ranking in R? (stringdist)
Right now, I am trying to create scale scores for participants who ranked four job candidates (A, B, C, and D) for a role, from best fit to worst fit. The correct order is A, D, C, B. As far as my dataframe goes, the correct sequence for columns A, B, C, and D should therefore be 1, 4, 3, 2. Below is a sample of my dataframe, with "Edit_score" representing what I think is the degree of correctness, i.e. the degree to which the values in Concatted resemble "1432". I used stringdist in the following code to produce this column:
data$edit_score <- stringdist("1432", data$Concatted, method = "jw")
I am not sure if the Jaro-Winkler method is the most appropriate for this type of variable. Should I be using a different stringdist method? Is stringdist the function I should be using to calculate this? I am trying to take into account both placement and sequence, and really just need to assign scores to Concatted values based on how closely they resemble the sequence "1432".
A | B | C | D | Concatted | Edit_score |
---|---|---|---|---|---|
4 | 2 | 3 | 1 | 4231 | 0.33333333 |
1 | 2 | 4 | 3 | 1243 | 0.16666667 |
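
For anyone who wants to reproduce the numbers, here is a minimal sketch that rebuilds the two sample rows and scores them against "1432". The toy data frame and the `key` and `hamming` names are made up for illustration; only the "jw" call comes from the code above. The Hamming column is included purely as a point of comparison, since it counts mismatched positions, and is not meant as a recommendation. Note that both methods return distances, so lower values mean closer to the key.

```r
library(stringdist)  # install.packages("stringdist") if it is not installed

# Toy data mirroring the two sample rows (illustrative, not the real dataset)
data <- data.frame(A = c(4, 1), B = c(2, 2), C = c(3, 4), D = c(1, 3))
data$Concatted <- paste0(data$A, data$B, data$C, data$D)

key <- "1432"  # hypothetical name for the correct ranking string

# Jaro-Winkler distance, as in the question: 0 means identical to the key
data$edit_score <- stringdist(key, data$Concatted, method = "jw")

# Hamming distance for comparison: number of positions that differ from the key
data$hamming <- stringdist(key, data$Concatted, method = "hamming")

print(data)
```

With these two rows, the "jw" column should match the Edit_score values in the table, while the Hamming distance is 3 for both rows, so the two methods do not order these responses the same way.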
r
edit-distance
stringdist
0 Answers