1 year ago
#326589
Anne
Shapley values for the three clusters by cluster number KMeans algorithm
I am trying to replicate this article in R rather than Python: https://cast42.github.io/blog/datascience/python/clustering/altair/shap/2020/04/23/explain-clusters-to-business.html#Kmeans-clustering
The part I haven't managed to reproduce is the "Shapley values for the three clusters" section:
for cnr in df_km['cluster'].unique():
    shap.summary_plot(shap_values[cnr], X, max_display=30, show=False)
    plt.title(f'Cluster {cnr}')
    plt.show()
This is what I have so far. Note that I want one plot per value of the label variable (the cluster) of the classification model. Thanks!
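# Package names (factoextra provides fviz_cluster() and attaches ggplot2)
packages <- c("splitstackshape", "shapr", "Matrix", "xgboost", "SHAPforxgboost", "factoextra")
# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(!installed_packages)) {
  install.packages(packages[!installed_packages])
}
# Load the packages
invisible(lapply(packages, library, character.only = TRUE))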
winequality <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv", sep = ";")
# KMeans classifier attribute evaluation
winequality_escale <- scale(winequality)
set.seed(123)
km.res_3 <- kmeans(winequality_escale, 3, nstart = 25)
km.res_3$size
km.res_3$centers
aggregate(winequality, by=list(cluster=km.res_3$cluster), mean)
k3 <- fviz_cluster(km.res_3, data=winequality_escale, palette= c("#2E9FDF", "#00AFBB", "#E7B800"), ellipse.type = "euclid", star.plot= T, repel = T, ggtheme = theme_minimal()) + ggtitle("k = 3")
winequality <- as.matrix(winequality)
model <- xgboost(
  data = winequality,
  label = km.res_3$cluster,
  nrounds = 20,
  verbose = FALSE)
shap_values <- shap.values(xgb_model = model, X_train = winequality)
shap_values$mean_shap_score
# Keep the full result in shap_values; store the per-observation scores separately
shap_scores <- shap_values$shap_score
# shap.prep() returns the long-format SHAP data, either from the model
shap_long <- shap.prep(xgb_model = model, X_train = winequality)
# or, equivalently, from a given shap_contrib
shap_long <- shap.prep(shap_contrib = shap_scores, X_train = winequality)
# SHAP summary plot
shap.plot.summary(shap_long)
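The code above fits xgboost with its default regression objective on the cluster number, so it yields a single set of SHAP values rather than one set per cluster as in the Python loop. A minimal sketch of one possible way to get a plot per cluster follows. It assumes the xgboost 1.x R interface (`xgboost(data = , label = )`), where `predict(..., predcontrib = TRUE)` on a multi-class booster returns a list with one contribution matrix per class. The helper name `shap_plots_by_cluster` and the refitted multi-class model are illustrative assumptions, not something taken from the article or from SHAPforxgboost.

```r
library(xgboost)
library(SHAPforxgboost)
library(ggplot2)

# Hypothetical helper (not part of any package): one SHAP summary plot per cluster.
# X:       numeric matrix with column names
# cluster: integer labels 1..K, as returned by kmeans()
shap_plots_by_cluster <- function(X, cluster, nrounds = 20) {
  n_class <- length(unique(cluster))

  # Refit as a multi-class classifier; xgboost expects classes 0..K-1
  model_mc <- xgboost(
    data = X,
    label = cluster - 1,
    nrounds = nrounds,
    objective = "multi:softprob",
    num_class = n_class,
    verbose = 0)

  # Assumed xgboost 1.x behaviour: a list of K matrices, each with one
  # column per feature plus a trailing "BIAS" column
  contrib <- predict(model_mc, X, predcontrib = TRUE)

  lapply(seq_len(n_class), function(k) {
    shap_k <- as.data.frame(contrib[[k]])
    shap_k$BIAS <- NULL  # shap.prep() needs the same dimensions as X
    shap_long_k <- shap.prep(shap_contrib = shap_k, X_train = X)
    shap.plot.summary(shap_long_k) + ggtitle(paste("Cluster", k))
  })
}
```

With the objects defined earlier, `for (p in shap_plots_by_cluster(winequality, km.res_3$cluster)) print(p)` would draw the plots one after another. Newer xgboost releases changed the R interface, so the `predict()` return shape should be checked against the installed version.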
r
shap
shapley