1

I want to plot the SHAP values for my RSF model; here is the code and error:

xvars <- c("RIDRETH1", "RXDLIPID", "DRXTKCAL", "DRXTPROT", "DRXTCARB", "DRXTCHOL", "DRXTFIBE", "DRXTVARA", "DRXTATOC", "DRXTSODI", "DRXTPOTA", "DRXTM161", "DRXTM181", "DRXTM201", "DRXTM221", "DRXTP182", "DRXTP183", "DRXTP184", "DRXTP204", "DRXTP205", "DRXTP225", "DRXTP226", "DRXTRET", "DRXT_G_TOTAL", "DRXT_V_STARCHY_TOTAL", "DRXTS160", "DRXTS180", "DRXTsumSFA", "INDFMPIR", "LBXCOT", "GENDERRC")

X <- Data[sample(nrow(Data), 1000), xvars]

bg_X <- Data[sample(nrow(Data), 200), ]


system.time(
  ks <- kernelshap(rf_mort_nutrients_withoutage_1018_all, X, bg_X = bg_X, type = 'prob')
)
ks

ks <- shapviz(ks)
sv_importance(ks, kind = "bee", )

Error: Fejl i align_pred(pred_fun(object, bg_X, ...)) : Predictions must be numeric! Timing stopped at: 0.03 0.05 0.11

These are my predictions:

rf_mort_nutrients_withoutage_1018_all$predicted
   [1]  81.31376  75.82491  99.35944  58.63055  67.65847  98.32906  75.33934 107.81604  62.22175  75.69875  69.99881  83.67161  81.39735  65.59381

I am not sure why it is not working. Anyone has an idea?

4
  • What do you get from predict(rf_mort_nutrients_withoutage_1018_all, X, type = 'prob')? You should get numeric output. Without reproducible example, we won't be able to help. Commented Nov 22, 2024 at 18:21
  • @MichaelM I get this: Sample size of test (predict) data: 1000 Number of grow trees: 200 Average no. of grow terminal nodes: 35.705 Total no. of grow variables: 31 Resampling used to grow trees: swor Resample size used to grow trees: 37243 Analysis: RSF Family: surv Commented Nov 25, 2024 at 9:16
  • Doesn't sound particularly numeric. Maybe you could extend the question and explain what are you trying to do. Somewhere you write "survival", but use kernelshap as in a classification situation. Commented Nov 25, 2024 at 20:11
  • @MichaelM I have a rfsrc model rf_mort_nutrients <- rfsrc(Surv(endage, mortality_status) ~ . , data = data, ntree = 200, nodesize = 1000, importance = T) and I would like to plot the SHAP values to understand which variables contribute most to mortality and in which direction, ie, how feature value alters the prediction. Is there a method compatible with random survival forests to have this? Commented Nov 27, 2024 at 9:36

1 Answer 1

1

To analyze continuous rank probability scores, you can work like this:

library(randomForestSRC)
library(survival)
library(kernelshap)
library(shapviz)

head(veteran)
#   trt celltype time status karno diagtime age prior
# 1   1 squamous   72      1    60        7  69     0
# 2   1 squamous  411      1    70        5  64    10
# 3   1 squamous  228      1    60        3  38     0

xvars <- setdiff(colnames(veteran), c("time", "status"))

fit <- rfsrc(
  reformulate(xvars, "Surv(time, status)"),
  data = veteran,
  ntree = 50,
  nodesize = 20,
  importance = TRUE
)

# Function that returns continuous rank probability scores
pred_fun <- function(model, data) {
  predict(model, data)$predicted
}

# Sample <=1000 rows from the training data. veteran is small enough to use all
X_explain <- veteran[xvars]
sv <- kernelshap(fit, X = X_explain, pred_fun = pred_fun) |> 
  shapviz()

sv |> sv_importance(kind = "bee")
sv |> sv_dependence(xvars)

enter image description here enter image description here

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.