modeva.TestSuite.diagnose_residual_cluster

TestSuite.diagnose_residual_cluster(dataset: str = 'test', response_type: str = 'abs_residual', metric: str = None, n_clusters: int = 10, cluster_method: str = 'ltc', kmedoids_method: str = 'pam', sample_size: int = 2000, n_estimators: int = 100, max_depth: int = 5, random_state: int = 0, n_repeats: int = 10, perturb_features: str | Tuple = None, perturb_method: str = 'normal', noise_level: float | int = 0.1, alpha: float = 0.1)

Analyze model residuals by clustering data points and evaluating performance within clusters.

This test identifies groups of samples with similar residual patterns by clustering data points based on their learning trajectories or proximity in feature space. It helps diagnose model performance heterogeneity across different data regions and identify problematic clusters where the model performs poorly.

Parameters:
  • dataset ({"main", "train", "test"}, default="test") – Dataset to analyze.

  • response_type (str, default="abs_residual") –

    The response type, options include

    • ”abs_residual”: absolute residual

    • ”sq_residual”: squared residual

    • ”abs_residual_perturb”: absolute residual after X perturbation as used in robustness test

    • ”sq_residual_perturb”: squared residual after X perturbation

    • ”pi_width”: prediction interval width, computed as in the reliability test. When dataset=”test”, the test data is split to form a calibration set (conformal prediction), and the calibration set is excluded from the final reported results.
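As a minimal sketch of the two unperturbed responses, the snippet below computes absolute and squared residuals for paired targets and predictions. The helper name `residual_responses` is hypothetical and is not part of modeva; how modeva defines the residual for classification models is not stated here, so the example uses plain numeric targets.

```python
def residual_responses(y_true, y_pred):
    """Return (abs_residual, sq_residual) lists for paired observations."""
    abs_res = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_res = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return abs_res, sq_res


# Toy targets and predictions (illustrative values only).
y_true = [3.0, 1.0, 4.0]
y_pred = [2.5, 1.0, 6.0]
abs_res, sq_res = residual_responses(y_true, y_pred)
```

Squared residuals weight large errors more heavily than absolute residuals, so the choice of response can change which samples end up grouped together.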

  • metric (str, default=None) –

    Model performance metric to use.

    • For classification (default=”AUC”): “ACC”, “AUC”, “F1”, “LogLoss”, “Precision”, “Recall”, and “Brier”

    • For regression (default=”MSE”): “MSE”, “MAE”, and “R2”

  • n_clusters (int, default=10) – Number of clusters to create.

  • cluster_method ({'ltc', 'rf'}, default='ltc') –

    Which algorithm to use.

    • ’ltc’: This method (Learning Trajectory Cluster; LTC) fits a gradient boosting model between the predictors and the response_type. It extracts prediction trajectories during training, applies optional weighting schemes, performs PCA dimensionality reduction, and clusters samples based on their learning patterns.

    • ’rf’: This method (Random Forest; RF) fits a Random Forest between predictors and the response_type. Then, it generates a proximity matrix based on fitted trees, and clusters the distance matrix using KMedoids.
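The proximity step of the 'rf' method can be sketched as follows, assuming the common definition of forest proximity: the fraction of trees in which two samples fall into the same leaf. Whether modeva uses exactly this definition is an assumption. The function `proximity_matrix` and the toy `leaf_ids` table are hypothetical; with scikit-learn, such a table could be obtained from a fitted forest's `apply(X)` method.

```python
def proximity_matrix(leaf_ids):
    """Compute pairwise proximities from leaf assignments.

    leaf_ids[i][t] is the leaf index that sample i reaches in tree t.
    The proximity of two samples is the share of trees where they
    land in the same leaf.
    """
    n = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for a, b in zip(leaf_ids[i], leaf_ids[j]) if a == b)
            prox[i][j] = shared / n_trees
    return prox


# Three samples, four trees (toy leaf indices).
leaf_ids = [
    [0, 2, 1, 3],
    [0, 2, 1, 0],
    [1, 0, 1, 2],
]
prox = proximity_matrix(leaf_ids)

# A distance matrix suitable for a medoid-based clusterer.
dist = [[1.0 - p for p in row] for row in prox]
```

The resulting matrix is quadratic in the number of samples, which is one plausible reason a subsample (see sample_size) is used for this method.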

  • kmedoids_method ({'alternate', 'pam'}, default='pam') – Which algorithm to use in KMedoids. ‘alternate’ is faster while ‘pam’ is more accurate. Only used when cluster_method=’rf’.

  • sample_size (int, default=2000) – Number of samples used to speed up the calculation of the proximity matrix and the clustering. Only used when cluster_method=’rf’.

  • n_estimators (int, default=100) – Number of trees in the Random Forest (cluster_method=’rf’) or gradient boosting models (cluster_method=’ltc’).

  • max_depth (int, default=5) – Maximum depth of trees in the Random Forest (cluster_method=’rf’) or gradient boosting models (cluster_method=’ltc’).

  • random_state (int, default=0) – Random seed for reproducibility.

  • response_kwargs (dict, default={}) – Additional arguments for calculating the response.

  • n_repeats (int, default=10) – Number of times to repeat the perturbation test for each noise level. Only used when response_type=”abs_residual_perturb” or “sq_residual_perturb”.

  • perturb_features (str or tuple, default=None) – Features to perturb during testing. If None, all features are perturbed. Can be a single feature name or list of feature names. Only used when response_type=”abs_residual_perturb” or “sq_residual_perturb”.

  • perturb_method ({"normal", "quantile"}, default="normal") –

    Method to perturb numerical features:

    • ”normal”: Add Gaussian noise scaled by feature standard deviation

    • ”quantile”: Perturb in quantile space with uniform noise

    Only used when response_type=”abs_residual_perturb” or “sq_residual_perturb”.

  • noise_level (float, default=0.1) –

    Magnitude of perturbation to apply. Can be a single value or tuple of values.

    • For “normal” method: Standard deviation multiplier

    • For “quantile” method: Maximum quantile shift

    Only used when response_type=”abs_residual_perturb” or “sq_residual_perturb”.
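To illustrate how noise_level interacts with the two perturbation methods, here is a rough sketch for a single numerical feature. Both helpers are hypothetical and only mirror the descriptions above: "normal" adds Gaussian noise scaled by the feature's standard deviation, and "quantile" shifts each value's empirical quantile by uniform noise in [-noise_level, noise_level] before mapping back to an observed value. modeva's actual implementation may differ in details such as clipping and interpolation.

```python
import random
import statistics
from bisect import bisect_right


def perturb_normal(values, noise_level, rng):
    """Add Gaussian noise with standard deviation noise_level * std(values)."""
    sd = statistics.pstdev(values)
    return [v + rng.gauss(0.0, 1.0) * noise_level * sd for v in values]


def perturb_quantile(values, noise_level, rng):
    """Shift each value's empirical quantile by at most noise_level."""
    ordered = sorted(values)
    n = len(ordered)
    out = []
    for v in values:
        q = bisect_right(ordered, v) / n  # empirical CDF value in (0, 1]
        shift = rng.uniform(-noise_level, noise_level)
        q_new = min(1.0, max(0.0, q + shift))  # keep the quantile in [0, 1]
        idx = min(n - 1, max(0, int(round(q_new * n)) - 1))
        out.append(ordered[idx])  # map back to an observed value
    return out


rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
x_normal = perturb_normal(x, noise_level=0.1, rng=rng)
x_quantile = perturb_quantile(x, noise_level=0.1, rng=rng)
```

In this sketch the quantile method always returns values that occur in the data, whereas the normal method can produce values outside the observed range.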

  • alpha (float, default=0.1) – Target miscoverage rate (1 - confidence level). For example, alpha=0.1 aims for 90% coverage. Only used when response_type=”pi_width”.
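The role of alpha and of the calibration split can be illustrated with basic split conformal prediction. This is a sketch under stated assumptions, not modeva's method: the helper `conformal_half_width` is hypothetical, and it produces a single constant half-width from absolute residuals on a calibration set.

```python
import math


def conformal_half_width(cal_abs_residuals, alpha):
    """Split-conformal half-width from calibration absolute residuals.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest residual, which
    targets at least (1 - alpha) coverage on exchangeable data.
    """
    n = len(cal_abs_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(cal_abs_residuals)[k - 1]


# Absolute residuals on a held-out calibration set (toy values).
cal_abs_residuals = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7, 0.6, 0.8, 1.0,
                     0.15, 0.45, 0.25, 0.95, 0.35, 0.55, 0.75, 0.65, 0.85]

half_width = conformal_half_width(cal_abs_residuals, alpha=0.1)
pi_width = 2 * half_width  # interval is [prediction - hw, prediction + hw]
```

A constant width like this would be identical for every sample, so a response such as "pi_width" is only informative for clustering when the interval method yields widths that vary per sample. Smaller alpha values select a higher residual quantile and therefore wider intervals.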

Returns:

Contains:

  • key: “diagnose_residual_cluster”

  • data: Name of the dataset used

  • model: Name of the model used

  • inputs: Input parameters used for the test

  • value: Dict with clustering results including feature importance, embeddings, and per-cluster performance

    • ”feature_importance”: residual feature importance by random forest

    • ”cluster_X”: X being clustered

    • ”cluster_y”: y being clustered

    • ”cluster_sample_weight”: sample_weight being clustered

    • ”cluster_labels”: cluster labels of each sample

    • ”cluster_pred_func”: a function that receives X as input and outputs the cluster ID

    • ”clusters”: Nested dict containing detailed information about each cluster. The i-th cluster can be accessed via its cluster ID, i.e., results.value[“clusters”][i], which includes the items:

      • ”score”: The performance metric of this cluster;

      • ”data_info”: The sample indices within and outside this cluster, which can be further used for data distribution test, e.g.,

      data_results = ds.data_drift_test(**results.value["clusters"][2]["data_info"])
      data_results.plot("summary")
      data_results.plot(("density", "MedInc"))
      
  • table: DataFrame with performance metrics for each cluster

  • options: Dictionary of visualization configurations. Run results.plot() to show all plots, or run results.plot(name=xxx) to display one preferred plot. The following names are available:

    • ”cluster_residual”: Bar plot of residual for each cluster.

    • ”cluster_performance”: Bar plot visualizing the performance scores against each cluster.

    • ”feature_importance”: feature importance plot.
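Once per-cluster scores are available, a common next step is to pick the weakest cluster for follow-up analysis. The sketch below uses a hand-written dictionary that only mimics the documented shape of results.value["clusters"] (cluster ID mapped to a dict with a "score" item); the scores and the helper `worst_cluster` are hypothetical. Whether a low or a high score is worse depends on the metric, so the direction is passed explicitly.

```python
# Stand-in for results.value["clusters"]; the scores are made up.
clusters = {
    0: {"score": 0.21},
    1: {"score": 0.58},
    2: {"score": 0.34},
}


def worst_cluster(clusters, higher_is_better):
    """Return the ID of the cluster with the worst score."""
    pick = min if higher_is_better else max
    return pick(clusters, key=lambda cid: clusters[cid]["score"])


# Error metrics such as MSE or MAE: the largest value is worst.
worst_by_error = worst_cluster(clusters, higher_is_better=False)

# Metrics such as AUC or R2: the smallest value is worst.
worst_by_quality = worst_cluster(clusters, higher_is_better=True)
```

The selected ID can then be used to look up that cluster's "data_info" entry for a data distribution test.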

Return type:

ValidationResult

Notes

When response_type = “pi_width” and dataset=”test”, the test data will be split for calibration (conformal prediction), so that the calibration set is excluded in the final reported results.

Examples

Residual Analysis (Classification)


Residual Analysis (Regression)
