modeva.TestSuite.compare_resilience#
- TestSuite.compare_resilience(dataset: str = 'test', method: str = 'worst-sample', metric: str = None, alphas: tuple = None, n_clusters: int = 10, random_state: int = 0)#
Compare resilience performance across multiple models under data shifts.
This function compares how different models perform under data shifts by evaluating their resilience scores. Users can specify the dataset partition, the performance metric, and the method for identifying problematic samples; the function returns a result object encapsulating the resilience scores and performance metrics.
- Parameters:
dataset ({"main", "train", "test"}, default="test") – The dataset partition to analyze.
metric (str, default=None) –
Model performance metric to use.
For classification (default="AUC"): "ACC", "AUC", "F1", "LogLoss", and "Brier".
For regression (default="MSE"): "MSE", "MAE", and "R2".
method ({"worst-sample", "worst-cluster", "outer-sample", "hard-sample"}, default="worst-sample") –
Strategy for identifying challenging samples:
”worst-sample”: Ranks samples by their prediction error
”worst-cluster”: Groups samples into clusters and identifies problematic clusters
”outer-sample”: Uses PCA to detect statistical outliers
”hard-sample”: Trains a metamodel to identify inherently difficult samples
alphas (tuple of float, default=None) – Worst-sample fractions to evaluate, each within (0, 1]. If None, defaults to (0.1, 0.2, 0.3, …, 0.9, 1.0).
n_clusters (int, default=10) – Number of clusters when using method="worst-cluster".
random_state (int, default=0) – Random seed for reproducibility.
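To make the "worst-sample" strategy concrete, here is a minimal sketch of the underlying idea: rank samples by prediction error and re-evaluate the metric on the worst alpha-fraction. This is a standalone NumPy/scikit-learn illustration with synthetic data, not modeva's implementation:

import numpy as np
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.3, size=200)  # imperfect predictions

def worst_fraction_mse(y_true, y_pred, alpha):
    # Rank samples by squared prediction error, largest first.
    order = np.argsort((y_true - y_pred) ** 2)[::-1]
    # Keep the worst alpha-fraction of samples.
    n_worst = max(1, int(np.ceil(alpha * len(y_true))))
    idx = order[:n_worst]
    # Re-evaluate the metric (here MSE) on that subset only.
    return mean_squared_error(y_true[idx], y_pred[idx]), idx

# Resilience curve: metric value on the worst alpha-fraction, per alpha.
for alpha in (0.1, 0.2, 0.5, 1.0):
    score, _ = worst_fraction_mse(y_true, y_pred, alpha)
    print(f"alpha={alpha:.1f}  MSE={score:.4f}")

A resilient model shows a flat curve: its metric on the worst fraction stays close to its full-sample metric as alpha shrinks.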
- Returns:
A container object with the following components:
key: “compare_resilience”
data: Name of the dataset used
model: List of model names being compared
inputs: Input parameters used
value: Dictionary of ("<model_name>", item) pairs, where each item is itself a dictionary mapping worst-sample fractions (0.1 to 1.0) to scores and the corresponding sample indices.
"interval": Prediction intervals / sets and related metrics
"data_info": The sample indices of reliable and unreliable samples, which can be further used for data distribution testing, e.g.:
data_results = ds.data_drift_test(**results.value["MoLGBMRegressor"][0.2]["data_info"])
data_results.plot("summary")
data_results.plot(("density", "MedInc"))
table: DataFrame with performance metrics across different data fractions
options: Dictionary of visualization configuration for a multi-line plot where the x-axis is the worst fraction (0 to 1) and the y-axis is the performance metric. Run results.plot() to show this plot.
- Return type:
Examples
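A minimal usage sketch, assuming a modeva DataSet ds and a TestSuite ts that already wraps the models to compare (the TestSuite construction and the model key "MoLGBMRegressor" are placeholders; compare_resilience, results.table, results.plot(), and ds.data_drift_test are as documented above):

results = ts.compare_resilience(
    dataset="test",
    method="worst-sample",
    metric="MSE",
    alphas=(0.1, 0.2, 0.5, 1.0),
)
print(results.table)   # performance metrics across worst-sample fractions
results.plot()         # multi-line plot: worst fraction vs. metric

# Drill into the worst 20% of samples under one model and test for
# distribution shift (the model key is a placeholder):
data_results = ds.data_drift_test(**results.value["MoLGBMRegressor"][0.2]["data_info"])
data_results.plot("summary")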