modeva.TestSuite.compare_residual_cluster#
- TestSuite.compare_residual_cluster(dataset: str = 'test', response_type: str = 'abs_residual', metric: str = None, n_clusters: int = 10, cluster_method: str = 'pam', sample_size: int = 2000, rf_n_estimators: int = 100, rf_max_depth: int = 5, random_state: int = 0, response_kwargs: dict = {}, n_repeats: int = 10, perturb_features: str | Tuple = None, perturb_method: str = 'normal', noise_level: float | int = 0.1, alpha: float = 0.1)#
Compare model residuals by clustering data points and evaluating performance within clusters.
This method evaluates how consistently a model performs across different subsets of the data by:
1. Using a Random Forest to generate a proximity matrix based on prediction residuals
2. Clustering the resulting distance matrix using KMedoids
3. Analyzing model performance within each cluster
- Parameters:
dataset ({"main", "train", "test"}, default="test") – Dataset to analyze.
response_type (str, default="abs_residual") –
The response type; options include:
“abs_residual”: absolute residual
“sq_residual”: squared residual
“abs_residual_perturb”: absolute residual after X perturbation, as used in the robustness test
“sq_residual_perturb”: squared residual after X perturbation
“pi_width”: prediction interval width, as used in the reliability test. Note that when dataset=”test”, the test data will be split for calibration (conformal prediction), so the calibration set is excluded from the final reported results.
metric (str, default=None) –
Model performance metric to use.
For classification (default=”AUC”): “ACC”, “AUC”, “F1”, “LogLoss”, and “Brier”
For regression (default=”MSE”): “MSE”, “MAE”, and “R2”
n_clusters (int, default=10) – Number of clusters to create.
cluster_method ({'alternate', 'pam'}, default='pam') – Which algorithm to use. ‘alternate’ is faster while ‘pam’ is more accurate.
sample_size (int, default=2000) – Sample size used to speed up computation of the proximity matrix and clustering.
rf_n_estimators (int, default=100) – Number of trees in the Random Forest used for proximity matrix.
rf_max_depth (int, default=5) – Maximum depth of trees in the Random Forest.
random_state (int, default=0) – Random seed for reproducibility.
response_kwargs (dict, default={}) – Additional arguments for calculating the response.
n_repeats (int, default=10) – Number of times to repeat the perturbation test for each noise level. Only used when response_type=”abs_residual_perturb” or “sq_residual_perturb”.
perturb_features (str or tuple, default=None) – Features to perturb during testing. If None, all features are perturbed. Can be a single feature name or a list of feature names. Only used when response_type=”abs_residual_perturb” or “sq_residual_perturb”.
perturb_method ({"normal", "quantile"}, default="normal") –
Method to perturb numerical features:
“normal”: Add Gaussian noise scaled by the feature’s standard deviation
“quantile”: Perturb in quantile space with uniform noise
Only used when response_type=”abs_residual_perturb” or “sq_residual_perturb”.
noise_level (float or tuple, default=0.1) –
Magnitude of perturbation to apply. Can be a single value or a tuple of values.
For the “normal” method: standard deviation multiplier
For the “quantile” method: maximum quantile shift
Only used when response_type=”abs_residual_perturb” or “sq_residual_perturb”.
alpha (float, default=0.1) – Target miscoverage rate (1 - confidence level). For example, alpha=0.1 aims for 90% coverage. Only used as response_type=”pi_width”.
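The two perturbation methods can be illustrated on a single numerical feature with NumPy. This is a hedged sketch of the mechanics described above, not modeva's internal perturbation code:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1000)  # one numerical feature
noise_level = 0.1

# "normal": add Gaussian noise scaled by the feature's standard deviation.
x_normal = x + rng.normal(0.0, noise_level * x.std(), size=x.size)

# "quantile": shift each point's empirical quantile by uniform noise in
# [-noise_level, +noise_level], then map back through the quantile function.
quantiles = x.argsort().argsort() / (x.size - 1)  # empirical quantile of each point
shifted = np.clip(quantiles + rng.uniform(-noise_level, noise_level, x.size), 0.0, 1.0)
x_quantile = np.quantile(x, shifted)
```

The quantile method keeps perturbed values inside the observed range of the feature, which makes it less sensitive to outliers than the normal method.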
- Returns:
Contains:
key: “compare_residual_cluster”
data: Name of the dataset used
model: Name of the model used
inputs: Input parameters used for the test
table: DataFrame with performance metrics for each cluster
options: Dictionary of visualization configurations. Run results.plot() to show all plots, or results.plot(name=xxx) to display one preferred plot; the following names are available:
“cluster_performance”: Bar plot of the performance score for each cluster.
“feature_importance”: Feature importance plot.
- Return type:
Notes
When response_type = “pi_width” and dataset=”test”, the test data will be split for calibration (conformal prediction), so that the calibration set is excluded in the final reported results.
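The calibration split described in this note follows the usual split-conformal recipe, which can be sketched as follows. This is an assumed illustration of the mechanics, not modeva's code: the model is fit on training data, half of the test data calibrates the interval, and only the other half contributes to the reported “pi_width” response:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Fit on training data; split the *test* data into a calibration half
# and an evaluation half.
X, y = make_regression(n_samples=600, n_features=5, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_eval, y_cal, y_eval = train_test_split(X_test, y_test, test_size=0.5, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

alpha = 0.1  # target miscoverage: aim for 90% coverage
scores = np.abs(y_cal - model.predict(X_cal))  # calibration residuals
n = len(scores)
qhat = np.quantile(scores, min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0))

# Split conformal yields a symmetric, constant-width interval per point;
# only the evaluation half appears in the reported results.
pi_width = np.full(len(X_eval), 2.0 * qhat)
```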
Examples