modeva.TestSuite.diagnose_residual_cluster#

TestSuite.diagnose_residual_cluster(dataset: str = 'test', response_type: str = 'abs_residual', metric: str = None, n_clusters: int = 10, cluster_method: str = 'pam', sample_size: int = 2000, rf_n_estimators: int = 100, rf_max_depth: int = 5, random_state: int = 0, n_repeats: int = 10, perturb_features: str | Tuple = None, perturb_method: str = 'normal', noise_level: float | int = 0.1, alpha: float = 0.1)#

Analyze model residuals by clustering data points and evaluating performance within clusters.

This method evaluates how consistently a model performs across different subsets of data by:

  1. Using a Random Forest to generate a proximity matrix based on prediction residuals

  2. Clustering the resulting distance matrix with KMedoids

  3. Analyzing model performance within each cluster
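
For orientation, a minimal sketch of invoking this diagnostic with near-default settings. It assumes ts is an already-constructed TestSuite wrapping a fitted model and its dataset (that setup is not shown here); only the call signature and result accessors documented on this page are used.

    results = ts.diagnose_residual_cluster(
        dataset="test",
        response_type="abs_residual",
        n_clusters=5,
        cluster_method="pam",
        random_state=0,
    )
    print(results.table)                    # per-cluster performance metrics
    results.plot(name="cluster_residual")   # bar plot of residuals per cluster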

Parameters:
  • dataset ({"main", "train", "test"}, default="test") – Dataset to analyze.

  • response_type (str, default="abs_residual") –

    The response type; options include:

    • "abs_residual": absolute residual

    • "sq_residual": squared residual

    • "abs_residual_perturb": absolute residual after X perturbation, as used in the robustness test

    • "sq_residual_perturb": squared residual after X perturbation

    • "pi_width": prediction interval width, as used in the reliability test; note that when dataset="test", the test data will be split for calibration (conformal prediction), so the calibration set is excluded from the final reported results.

  • metric (str, default=None) –

    Model performance metric to use.

    • For classification (default="AUC"): "ACC", "AUC", "F1", "LogLoss", and "Brier"

    • For regression (default="MSE"): "MSE", "MAE", and "R2"

  • n_clusters (int, default=10) – Number of clusters to create.

  • cluster_method ({'alternate', 'pam'}, default='pam') – Which algorithm to use. ‘alternate’ is faster while ‘pam’ is more accurate.

  • sample_size (int, default=2000) – Sample size used to speed up the calculation of the proximity matrix and the clustering.

  • rf_n_estimators (int, default=100) – Number of trees in the Random Forest used for proximity matrix.

  • rf_max_depth (int, default=5) – Maximum depth of trees in the Random Forest.

  • random_state (int, default=0) – Random seed for reproducibility.

  • response_kwargs (dict, default={}) – Additional arguments for calculating the response.

  • n_repeats (int, default=10) – Number of times to repeat the perturbation test for each noise level. Only used when response_type="abs_residual_perturb" or "sq_residual_perturb".

  • perturb_features (str or tuple, default=None) – Features to perturb during testing. If None, all features are perturbed. Can be a single feature name or a list of feature names. Only used when response_type="abs_residual_perturb" or "sq_residual_perturb".

  • perturb_method ({"normal", "quantile"}, default="normal") –

    Method used to perturb numerical features:

    • "normal": Add Gaussian noise scaled by the feature's standard deviation

    • "quantile": Perturb in quantile space with uniform noise

    Only used when response_type="abs_residual_perturb" or "sq_residual_perturb".

  • noise_level (float, default=0.1) –

    Magnitude of the perturbation to apply. Can be a single value or a tuple of values.

    • For the "normal" method: standard deviation multiplier

    • For the "quantile" method: maximum quantile shift

    Only used when response_type="abs_residual_perturb" or "sq_residual_perturb"; a usage sketch follows this parameter list.

  • alpha (float, default=0.1) – Target miscoverage rate (1 - confidence level). For example, alpha=0.1 aims for 90% coverage. Only used when response_type="pi_width".
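
A hedged sketch of the perturbation-based variant described by the parameters above, again assuming an existing TestSuite instance ts; the feature name and parameter values are illustrative only:

    results_perturb = ts.diagnose_residual_cluster(
        dataset="test",
        response_type="abs_residual_perturb",
        perturb_method="quantile",       # perturb in quantile space with uniform noise
        perturb_features=("MedInc",),    # hypothetical feature name; None perturbs all features
        noise_level=0.1,                 # maximum quantile shift for the "quantile" method
        n_repeats=10,                    # repeat the perturbation test 10 times
    )
    results_perturb.plot(name="cluster_residual")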

Returns:

Contains:

  • key: "diagnose_residual_cluster"

  • data: Name of the dataset used

  • model: Name of the model used

  • inputs: Input parameters used for the test

  • value: Dict with clustering results including feature importance, embeddings, and per-cluster performance

    • "feature_importance": residual feature importance from the Random Forest

    • "cluster_X": the X values used for clustering

    • "cluster_y": the y values used for clustering

    • "cluster_sample_weight": the sample weights used for clustering

    • "cluster_labels": the cluster label of each sample

    • "cluster_pred_func": a function that receives X as input and outputs the cluster ID

    • "clusters": Nested dict containing detailed information about each cluster; the i-th cluster can be accessed via its cluster id, i.e., results.value["clusters"][i], which includes the items:

      • "score": the performance metric of this cluster;

      • "data_info": the sample indices within and outside this cluster, which can be further used for a data distribution test, e.g.,

      data_results = ds.data_drift_test(**results.value["clusters"][2]["data_info"])
      data_results.plot("summary")
      data_results.plot(("density", "MedInc"))
      
  • table: DataFrame with performance metrics for each cluster

  • options: Dictionary of visualization configurations. Run results.plot() to show all plots, or results.plot(name=xxx) to display one preferred plot; the following names are available (a usage sketch follows the return type below):

    • "cluster_residual": Bar plot of residuals for each cluster.

    • "cluster_performance": Bar plot visualizing the performance score of each cluster.

    • "feature_importance": Feature importance plot.

Return type:

ValidationResult
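
To make the returned structure concrete, a hedged sketch of inspecting the ValidationResult, assuming results comes from the call shown earlier; cluster id 2 and X_new are illustrative placeholders:

    # Per-cluster performance table and the score of a single cluster.
    print(results.table)
    print(results.value["clusters"][2]["score"])

    # Assign new samples to the learned residual clusters;
    # X_new is a hypothetical feature matrix with the same columns as the training data.
    labels = results.value["cluster_pred_func"](X_new)

    # Display individual plots by name, per the options listed above.
    results.plot(name="cluster_performance")
    results.plot(name="feature_importance")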

Notes

When response_type="pi_width" and dataset="test", the test data will be split for calibration (conformal prediction), so that the calibration set is excluded from the final reported results.
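
A hedged sketch of the prediction-interval-width variant described in this note, assuming the same TestSuite instance ts; part of the test set is consumed for conformal calibration, as stated above:

    results_pi = ts.diagnose_residual_cluster(
        dataset="test",
        response_type="pi_width",
        alpha=0.1,       # target miscoverage rate, i.e. 90% nominal coverage
        n_clusters=5,
    )
    results_pi.plot(name="cluster_residual")   # plot names listed above apply here as well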

Examples

Residual Analysis (Classification)

Residual Analysis (Regression)
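
For reference, a condensed end-to-end sketch in the spirit of the regression example above. The setup calls below (DataSet, its load/split methods, the MoLGBMRegressor wrapper, and the TestSuite constructor arguments) are assumptions about the surrounding library and may differ from its actual API; only diagnose_residual_cluster and the result accessors are taken from this page.

    from modeva import DataSet, TestSuite
    from modeva.models import MoLGBMRegressor    # assumed model wrapper class

    ds = DataSet()
    ds.load(name="CaliforniaHousing")            # assumed built-in dataset loader
    ds.set_random_split()                        # assumed train/test split helper

    model = MoLGBMRegressor(max_depth=2)
    model.fit(ds.train_x, ds.train_y.ravel())    # assumed data attributes

    ts = TestSuite(ds, model)                    # assumed constructor signature
    results = ts.diagnose_residual_cluster(dataset="test", n_clusters=10)
    results.plot(name="feature_importance")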