modeva.TestSuite.diagnose_residual_interpret

TestSuite.diagnose_residual_interpret(dataset: str = 'test', n_estimators: int = 100, max_depth: int = 2, **xgb_kwargs)

Analyzes feature importance by examining each feature's relationship with the prediction residuals.

This method calculates how much each feature contributes to explaining the model’s prediction errors (residuals). A higher importance score indicates the feature has a stronger relationship with prediction errors.

When the method is one of {“uniform”, “quantile”, “precompute”}, this test bins each predictor variable and then transforms the binning results using one-hot encoding. The encoded variables are then fitted to the residual using an L2-regularized linear model, and the importance of each predictor (under the framework of functional ANOVA) is aggregated from the linear coefficients.

When the method is “auto-xgb1”, a depth-1 XGBoost model is fitted on the predictors and the residual, and the feature importance of that XGBoost model (under the framework of functional ANOVA) is used as the final feature importance.
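For intuition, here is a minimal sketch of the “auto-xgb1” idea: a depth-1 XGBoost regressor fitted on the residuals, whose gain-based importances decompose into main effects only. The data and variable names below are illustrative and are not the library's internals.

```python
import numpy as np
import xgboost as xgb

# Illustrative data; in practice X and residuals = y_true - y_pred come from your model.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y_pred = 0.5 * X[:, 0]
y_true = 0.5 * X[:, 0] + 0.3 * np.abs(X[:, 1]) + rng.normal(scale=0.1, size=500)
residuals = y_true - y_pred

# Depth-1 trees split on one feature at a time, so gain-based importance
# decomposes into main effects only (the functional-ANOVA view).
model = xgb.XGBRegressor(n_estimators=100, max_depth=1)
model.fit(X, residuals)

importance = model.feature_importances_ / model.feature_importances_.sum()
print(dict(zip([f"x{i}" for i in range(X.shape[1])], importance.round(3))))
```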

Parameters:
  • dataset ({"main", "train", "test"}, default="test") – Which dataset to analyze.

  • n_estimators (int, default=100) – Number of trees in Xgboost.

  • max_depth (int, default=2) – The maximum tree depth in Xgboost.

  • **xgb_kwargs – Other hyperparameters for xgboost.

Returns:

Contains:

  • key: “diagnose_residual_interpret”

  • data: Name of the dataset used

  • model: Name of the model used

  • inputs: Input parameters used for the test

  • value: Dictionary containing:

    • "Model": the fitted XGBoost model object

    • "DataSet": the dataset that contains the residual as response

    • "Feature Importance": a dict with the list of feature names and their residual feature importances

    • "Effect Importance": a dict with the list of effect names and their residual effect importances

  • options: Dictionary of visualization configurations. Run results.plot() to show all plots, or results.plot(name=xxx) to display one preferred plot (see the usage sketch after the Return type); the following names are available:

    • ”feature_importance”: feature importance plot.

    • ”effect_importance”: effect importance plot.

Return type:

ValidationResult
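A minimal usage sketch, assuming ts is an existing TestSuite built from a fitted model and dataset (the construction of ts and the exact attribute access on the ValidationResult are assumptions here):

```python
# Assumes `ts` is a modeva.TestSuite wrapping a fitted model and its data.
results = ts.diagnose_residual_interpret(
    dataset="test",    # which dataset to analyze: "main", "train", or "test"
    n_estimators=100,  # number of trees in XGBoost
    max_depth=2,       # maximum tree depth in XGBoost
)

# Residual-based importances returned under `value` (attribute name assumed).
print(results.value["Feature Importance"])
print(results.value["Effect Importance"])

# Show the available plots.
results.plot(name="feature_importance")
results.plot(name="effect_importance")
```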

Notes

The feature importance is calculated as the normalized variance of predictions when using each feature alone. For methods other than “auto-xgb1”, features are first binned then one-hot encoded before fitting a Ridge regression model to predict absolute residuals.
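A hedged sketch of that binning-plus-Ridge recipe, written with scikit-learn for illustration (the library's own binning, encoding, and aggregation details may differ):

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.linear_model import Ridge

# Illustrative data; in practice X and residuals come from the fitted model.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
residuals = 0.4 * np.abs(X[:, 1]) + rng.normal(scale=0.05, size=500)

# Bin each predictor (e.g. quantile binning), then one-hot encode the bins.
binner = KBinsDiscretizer(n_bins=10, encode="onehot-dense", strategy="quantile")
X_encoded = binner.fit_transform(X)

# Fit an L2-regularized linear model on the absolute residuals.
ridge = Ridge(alpha=1.0).fit(X_encoded, np.abs(residuals))

# Aggregate a per-feature importance from the coefficients of its own bins,
# here as the normalized variance of that feature's fitted partial effect.
importances = []
start = 0
for n_bins in (len(edges) - 1 for edges in binner.bin_edges_):
    coefs = ridge.coef_[start:start + n_bins]
    partial_effect = X_encoded[:, start:start + n_bins] @ coefs
    importances.append(partial_effect.var())
    start += n_bins
importances = np.array(importances) / np.sum(importances)
print(importances.round(3))
```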

Examples

Residual Analysis (Classification)

Residual Analysis (Regression)