modeva.TestSuite.diagnose_residual_interpret#
- TestSuite.diagnose_residual_interpret(dataset: str = 'test', n_estimators: int = 100, max_depth: int = 2, **xgb_kwargs)#
Analyzes feature importance by examining their relationship with prediction residuals.
This method calculates how much each feature contributes to explaining the model’s prediction errors (residuals). A higher importance score indicates the feature has a stronger relationship with prediction errors.
When the method is one of {“uniform”, “quantile”, “precompute”}, this test bins each predictor variable and then transforms the binning results using one-hot encoding. The encoded variables are fitted to the residuals using an L2-regularized linear model, and the importance of each predictor (under the framework of functional ANOVA) is aggregated from the linear coefficients.
When the method is “auto-xgb1”, an XGBoost model with depth-1 trees is fitted to the predictors and the residuals, and the feature importance of that XGBoost model (under the framework of functional ANOVA) is used as the final feature importance. A minimal external sketch of the binning-based variant is shown below.
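The following sketch illustrates the binning-based variant outside of modeva, assuming a fitted regression `model` and NumPy arrays `X` and `y`; the function name `residual_importance_binned`, the scikit-learn calls, and all implementation details are illustrative assumptions, not the library's own code.

```python
# Minimal sketch of a binning-based residual interpretation (not the modeva implementation).
# Assumes a fitted regression `model` and arrays X of shape (n_samples, n_features) and y.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.linear_model import Ridge

def residual_importance_binned(model, X, y, n_bins=10, strategy="quantile"):
    residual = np.abs(y - model.predict(X))            # absolute prediction errors
    binner = KBinsDiscretizer(n_bins=n_bins, encode="onehot", strategy=strategy)
    X_enc = binner.fit_transform(X)                    # bin each feature, then one-hot encode
    ridge = Ridge(alpha=1.0).fit(X_enc, residual)      # L2-regularized linear fit to residuals

    # Aggregate a per-feature importance from the coefficients of its bin dummies,
    # in a functional-ANOVA style: the variance of each feature's partial prediction.
    importance = np.zeros(X.shape[1])
    start = 0
    for j, n_bins_j in enumerate(binner.n_bins_):
        coefs_j = ridge.coef_[start:start + n_bins_j]
        partial_j = X_enc[:, start:start + n_bins_j] @ coefs_j
        importance[j] = np.var(partial_j)
        start += n_bins_j
    return importance / importance.sum()               # normalize to sum to 1
```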
- Parameters:
dataset ({"main", "train", "test"}, default="test") – Which dataset to analyze.
n_estimators (int, default=100) – Number of trees in XGBoost.
max_depth (int, default=2) – Maximum tree depth in XGBoost.
**xgb_kwargs – Other hyperparameters passed to XGBoost.
- Returns:
Contains:
key: “diagnose_residual_interpret”
data: Name of the dataset used
model: Name of the model used
inputs: Input parameters used for the test
value: Dictionary containing:
”Model”: the fitted XGBoost model object
”DataSet”: the dataset that contains the residual as the response
”Feature Importance”: a dict with the list of feature names and their residual feature importance
”Effect Importance”: a dict with the list of effect names and their residual effect importance
options: Dictionary of visualization configurations. Run results.plot() to show all plots; run results.plot(name=xxx) to display one preferred plot. The following names are available:
”feature_importance”: feature importance plot.
”effect_importance”: effect importance plot.
- Return type:
Notes
The feature importance is calculated as the normalized variance of predictions when using each feature alone. For methods other than “auto-xgb1”, features are first binned and then one-hot encoded before fitting a Ridge regression model to predict the absolute residuals.
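As a rough illustration of how the “auto-xgb1” variant described above could be realized, the sketch below fits a depth-1 XGBoost model to the absolute residuals and scores each feature by the normalized variance of its additive contribution; the helper name and its exact steps are assumptions, not the modeva implementation.

```python
# Illustrative sketch of an "auto-xgb1"-style residual interpretation.
# Assumes a fitted regression `model` and arrays X, y, as above.
import numpy as np
import xgboost as xgb

def residual_importance_xgb1(model, X, y, n_estimators=100, **xgb_kwargs):
    residual = np.abs(y - model.predict(X))
    booster = xgb.XGBRegressor(n_estimators=n_estimators, max_depth=1,
                               **xgb_kwargs).fit(X, residual)
    # With depth-1 trees the model is additive, so per-feature contributions
    # coincide with the per-feature partial functions (up to a constant).
    contribs = booster.get_booster().predict(xgb.DMatrix(X), pred_contribs=True)
    per_feature = contribs[:, :-1]          # last column is the bias term
    importance = per_feature.var(axis=0)    # variance of each partial function
    return importance / importance.sum()    # normalize to sum to 1
```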
Examples
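A minimal usage sketch, assuming a TestSuite instance `ts` has already been constructed from a fitted model and dataset (setup not shown) and that the returned result exposes its documented fields as attributes:

```python
# Run the residual interpretation diagnostic on the test set.
results = ts.diagnose_residual_interpret(dataset="test", n_estimators=100, max_depth=2)

# Inspect the residual-based importances documented under "value".
print(results.value["Feature Importance"])
print(results.value["Effect Importance"])

# Display one of the available plots.
results.plot(name="feature_importance")
```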