I am attempting to make a very simple summary plot for a random forest classification model using SHAP. Just to see if I could get the syntax correct, I generated a toy example and fit a random forest classifier to the data.
shap version: 0.45.0
Python version: 3.10.12
import shap
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
# Generate synthetic data
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# Train a RandomForest model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
From here, I used SHAP's TreeExplainer to compute SHAP values for this model.
# Create a SHAP TreeExplainer
explainer = shap.TreeExplainer(model)
# Calculate SHAP values for the test set
shap_values = explainer.shap_values(X_test)
According to the documentation this returns the following:
"For models with a single output this returns a matrix of SHAP values (# samples x # features). Each row sums to the difference between the model output for that sample and the expected value of the model output (which is stored in the expected_value attribute of the explainer when it is constant). For models with vector outputs this returns a list of such matrices, one for each output."
I had thought that this would be a single-output model (since this is a binary classification problem), but the returned object looks like the output for a multiclass model. Checking the shapes gave the following:
X_test shape: (125, 20)
shap_values shape: (125, 20, 2)
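One way to probe the single-output assumption is to look at what the classifier itself emits. The sketch below rebuilds the same toy model and inspects `predict_proba`; it only shows that scikit-learn exposes one probability column per class for a binary problem, and it is an assumption on my part that this is what the trailing dimension of size 2 corresponds to.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same toy setup as above
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# A binary classifier still reports one probability column per class
proba = model.predict_proba(X_test)
print("classes:", model.classes_)
print("predict_proba shape:", proba.shape)
```

If the explainer works on these per-class probabilities, two outputs per sample and feature would be expected even for a binary problem.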
Attempting to run the summary plot command using these values gives me a bizarre 2 by 2 image, which I've included below.
shap.summary_plot(shap_values, X_test, plot_type="bar", max_display=None)
I'm not sure what is causing this, unless the explainer is returning values for each class probability rather than for the single predicted label.
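To make that guess concrete, here is a minimal sketch of what selecting a single class would look like if the last axis really does index the classes. The array is a random placeholder with the reported shape (it is NOT real SHAP output), and `fake_shap_values` and `class_index` are hypothetical names used only for illustration.

```python
import numpy as np

# Placeholder with the reported shape (samples, features, classes).
# Random numbers stand in for SHAP values; this is not explainer output.
rng = np.random.default_rng(0)
fake_shap_values = rng.normal(size=(125, 20, 2))

# Assumption: the last axis indexes the classes, so slicing it
# yields one (samples x features) matrix per class.
class_index = 1
per_class = fake_shap_values[:, :, class_index]
print(per_class.shape)

# Mean absolute value per feature, which is (as far as I understand)
# the quantity a bar-type summary plot ranks features by.
mean_abs = np.abs(per_class).mean(axis=0)
print(mean_abs.shape)
```

If the assumption holds, passing the real `shap_values[:, :, 1]` together with `X_test` to `shap.summary_plot` would give it the 2D matrix described in the documentation quoted above.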
