1 year ago

#326589


Anne

Shapley values for each of the three KMeans clusters, by cluster number

I am trying to replicate this article: https://cast42.github.io/blog/datascience/python/clustering/altair/shap/2020/04/23/explain-clusters-to-business.html#Kmeans-clustering

However, I am using R rather than Python, which the article uses. The part I haven't managed to reproduce is the "Shapley values for the three clusters" section:

import matplotlib.pyplot as plt
import shap
for cnr in df_km['cluster'].unique():
    shap.summary_plot(shap_values[cnr], X, max_display=30, show=False)
    plt.title(f'Cluster {cnr}')
    plt.show()
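A rough R counterpart of this loop, sketched with SHAPforxgboost and ggplot2. plot_shap_by_cluster() is a hypothetical helper, not part of either package, and it assumes you already have a list of per-class SHAP data.frames (one per cluster, each with the same dimensions and column names as the feature matrix and no bias column). The top_n argument of shap.prep() plays the role of max_display.

```r
library(SHAPforxgboost)
library(ggplot2)

# Hypothetical helper (not part of SHAPforxgboost): one SHAP summary plot per
# cluster, mirroring the Python loop over df_km['cluster'].unique().
# shap_by_class: list of data.frames, one per class, each with the same
#   dimensions and column names as X_train and no BIAS column.
plot_shap_by_cluster <- function(shap_by_class, X_train, top_n = 30) {
  X_train <- as.matrix(X_train)
  top_n <- min(top_n, ncol(X_train))  # keep top_n within the number of features
  plots <- vector("list", length(shap_by_class))
  for (k in seq_along(shap_by_class)) {
    # Long-format SHAP data for one class; top_n acts like max_display
    shap_long <- shap.prep(shap_contrib = shap_by_class[[k]],
                           X_train = X_train, top_n = top_n)
    # Element k is assumed to hold the SHAP values for k-means cluster k
    plots[[k]] <- shap.plot.summary(shap_long) + ggtitle(paste("Cluster", k))
  }
  plots
}
```

Each element of the returned list is a ggplot object, so `for (p in plots) print(p)` draws them one after another, much like `plt.show()` inside the Python loop.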

This is my code so far. Note that I want one summary plot per value of the classification model's label variable, i.e. one plot per cluster. Thanks!

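# Package names (factoextra and ggplot2 provide fviz_cluster(), theme_minimal() and ggtitle())
packages <- c("splitstackshape", "shapr", "Matrix", "xgboost", "SHAPforxgboost", "factoextra", "ggplot2")

# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(!installed_packages)) {
  install.packages(packages[!installed_packages])
}

# Load all packages
invisible(lapply(packages, library, character.only = TRUE))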


winequality <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv", sep = ";")

# K-means clustering (k = 3) on the standardized attributes
winequality_escale <- scale(winequality)
set.seed(123)
km.res_3 <- kmeans(winequality_escale, 3, nstart = 25)
km.res_3$size
km.res_3$centers
aggregate(winequality, by=list(cluster=km.res_3$cluster),  mean) 
k3 <- fviz_cluster(km.res_3, data = winequality_escale,
                   palette = c("#2E9FDF", "#00AFBB", "#E7B800"),
                   ellipse.type = "euclid", star.plot = TRUE, repel = TRUE,
                   ggtheme = theme_minimal()) + ggtitle("k = 3")


winequality <- as.matrix(winequality)
model <- xgboost(
  data = winequality,
  label = km.res_3$cluster,
  nrounds = 20,
  verbose = FALSE)
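Without an objective argument, xgboost() fits its default regression objective, so the cluster ids 1, 2 and 3 are treated as a single numeric target and you get one set of SHAP values rather than one per cluster. A minimal sketch of a multiclass alternative, assuming the xgboost 1.x R interface (xgb.DMatrix() and xgb.train()) and the multi:softprob objective, which needs 0-based integer labels. fit_cluster_classifier() is a hypothetical helper name, and the built-in iris measurements stand in for winequality so the example runs offline.

```r
library(xgboost)

# Hypothetical helper: fit one multiclass booster that predicts the k-means
# cluster of each row. Assumes the xgboost 1.x R API.
fit_cluster_classifier <- function(X, cluster, nrounds = 20) {
  X <- as.matrix(X)
  k <- length(unique(cluster))
  # multi:softprob needs integer labels in 0..(num_class - 1),
  # while kmeans() numbers clusters from 1.
  dtrain <- xgb.DMatrix(data = X, label = as.integer(cluster) - 1L)
  xgb.train(
    params  = list(objective = "multi:softprob", num_class = k),
    data    = dtrain,
    nrounds = nrounds,
    verbose = 0
  )
}

# Offline stand-in for winequality: the four numeric iris columns.
set.seed(123)
X_demo <- as.matrix(iris[, 1:4])
km_demo <- kmeans(scale(X_demo), centers = 3, nstart = 25)
model_demo <- fit_cluster_classifier(X_demo, km_demo$cluster)
```

On the question's data the call would be `fit_cluster_classifier(winequality, km.res_3$cluster)`; class k - 1 of the resulting model then corresponds to k-means cluster k.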

shap_values <- shap.values(xgb_model = model, X_train = winequality)
shap_values$mean_shap_score
shap_scores <- shap_values$shap_score

# shap.prep() returns the long-format SHAP data from either the model or a given SHAP matrix
shap_long <- shap.prep(xgb_model = model, X_train = winequality)
# ... which is the same as supplying the SHAP contributions directly via shap_contrib
shap_long <- shap.prep(shap_contrib = shap_scores, X_train = winequality)

# SHAP summary plot
shap.plot.summary(shap_long)
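To get one set of SHAP values per cluster, the contributions have to come from a multiclass model. A sketch, assuming the xgboost 1.x R package, where `predict(..., predcontrib = TRUE)` on a multi:softprob booster returns a list with one matrix per class whose last column is the bias term (newer xgboost releases may return an array instead). per_class_shap() is a hypothetical helper that strips the bias and returns data.frames in the shape `shap.prep(shap_contrib = ...)` accepts.

```r
library(xgboost)

# Hypothetical helper: per-class SHAP contributions from a multiclass booster.
# Assumes xgboost 1.x: predict(..., predcontrib = TRUE) returns a list with
# one n x (p + 1) matrix per class, the last column being the bias (BIAS).
per_class_shap <- function(model, X) {
  X <- as.matrix(X)
  contrib <- predict(model, X, predcontrib = TRUE)
  if (!is.list(contrib)) {
    stop("expected a list of per-class matrices (xgboost 1.x behaviour)")
  }
  lapply(contrib, function(m) {
    m <- m[, -ncol(m), drop = FALSE]  # drop the bias column
    colnames(m) <- colnames(X)        # match the feature names of X
    as.data.frame(m)                  # shap.prep() works with a data.frame
  })
}
```

With `model` refitted as a multi:softprob classifier on `km.res_3$cluster - 1`, element k of `per_class_shap(model, winequality)` would hold the SHAP values for cluster k, and each element can be passed to `shap.prep(shap_contrib = ..., X_train = winequality)` and then `shap.plot.summary()`.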

Tags: r, shap, shapley
