modeva.TestSuite.diagnose_resilience
- TestSuite.diagnose_resilience(dataset: str = 'test', method: str = 'worst-sample', metric: str = None, alphas: tuple = None, n_clusters: int = 10, random_state: int = 0)
Evaluate model resilience by analyzing performance on challenging data subsets.
This method assesses how well the model maintains its performance when faced with increasingly difficult or anomalous samples. It helps identify potential vulnerabilities and understand the model’s behavior under stress conditions.
- Parameters:
dataset ({"main", "train", "test"}, default="test") – The dataset to use for resilience testing.
method ({"worst-sample", "worst-cluster", "outer-sample", "hard-sample"}, default="worst-sample") –
Strategy for identifying challenging samples:
"worst-sample": Ranks samples by their prediction error
"worst-cluster": Groups samples into clusters and identifies problematic clusters
"outer-sample": Uses PCA to detect statistical outliers
"hard-sample": Trains a metamodel to identify inherently difficult samples
metric (str, default=None) –
Model performance metric to use.
For classification (default="AUC"): "ACC", "AUC", "F1", "LogLoss", and "Brier"
For regression (default="MSE"): "MSE", "MAE", and "R2"
alphas (tuple of float, default=None) – Fractions of worst samples to evaluate, each within (0, 1]. If None, defaults to (0.1, 0.2, 0.3, …, 0.9, 1.0).
n_clusters (int, default=10) – Number of clusters for the “worst-cluster” method. Ignored for other methods.
random_state (int, default=0) – Random seed for reproducibility.
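The default "worst-sample" strategy can be illustrated outside modeva. The sketch below is a minimal plain-numpy assumption of how such a selection works, not modeva's actual implementation; the function name worst_sample_score and the choice of MSE as the metric are illustrative only. It ranks samples by absolute prediction error and scores just the worst alpha fraction:

```python
import numpy as np

def worst_sample_score(y_true, y_pred, alpha=0.2):
    # Hypothetical sketch of the "worst-sample" strategy: rank samples by
    # absolute prediction error, keep the worst `alpha` fraction, and
    # compute the metric (here MSE) on that subset only.
    errors = np.abs(y_true - y_pred)
    n_worst = max(1, int(np.ceil(alpha * len(y_true))))
    worst_idx = np.argsort(errors)[::-1][:n_worst]  # largest errors first
    return float(np.mean((y_true[worst_idx] - y_pred[worst_idx]) ** 2))

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.1, 2.0, 2.5, 4.0, 7.0])  # last sample has the largest error
print(worst_sample_score(y_true, y_pred, alpha=0.2))  # MSE on the single worst sample -> 4.0
```

At alpha=1.0 every sample is included, so the score reduces to the ordinary metric on the whole dataset.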
- Returns:
A result object containing:
key: “diagnose_resilience”
data: Name of the dataset used
model: Name of the model used
inputs: Input parameters used for the test
value: Nested dict keyed by worst ratio (alpha), e.g., results.value[0.2], which includes items:
"score": The performance metric of the selected "worst samples";
"data_info": The sample indices within and outside this subset, which can be further used for a data distribution test, e.g.,
data_results = ds.data_drift_test(**results.value[0.2]["data_info"])
data_results.plot("summary")
data_results.plot(("density", "MedInc"))
table: DataFrame showing performance scores across different fractions
options: Dictionary of visualization configuration for a line plot whose x-axis is the worst fraction (0 to 1) and whose y-axis is the performance metric. Run results.plot() to show this plot.
- Return type:
Notes
The method evaluates model performance on increasingly larger subsets of the most challenging samples (from 10% to 100%). A steep performance drop indicates potential resilience issues in specific scenarios.
Examples
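A runnable illustration of the diagnostic's core idea, under stated assumptions: this is a plain-numpy sketch rather than the modeva API, and resilience_curve is a hypothetical helper. It computes the metric on the worst alpha fraction of samples for each alpha, mirroring the curve that results.plot() would display:

```python
import numpy as np

def resilience_curve(y_true, y_pred, alphas=None):
    # Hypothetical re-implementation of the "worst-sample" diagnostic:
    # MSE on the worst alpha fraction of samples, ranked by absolute error.
    if alphas is None:
        alphas = tuple(round(0.1 * i, 1) for i in range(1, 11))
    order = np.argsort(np.abs(y_true - y_pred))[::-1]  # worst samples first
    sq_err = (y_true - y_pred) ** 2
    return {a: float(np.mean(sq_err[order[: max(1, int(np.ceil(a * len(y_true))))]]))
            for a in alphas}

rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.3, size=200)
curve = resilience_curve(y_true, y_pred)
# curve[1.0] equals the overall MSE; smaller alphas isolate the hardest samples.
```

A roughly flat curve suggests errors are spread evenly across samples, while a score at alpha=0.1 far above the score at alpha=1.0 corresponds to the steep performance drop described in the Notes.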