modeva.TestSuite.diagnose_robustness
- TestSuite.diagnose_robustness(dataset: str = 'test', threshold: float = 0.1, metric: str = None, n_repeats: int = 10, perturb_features: str | Tuple = None, perturb_method: str = 'normal', noise_levels: float | int | Tuple = 0.1, random_state: int = 0)
Evaluate model robustness by measuring performance under feature perturbations.
This test assesses how model predictions change when input features are perturbed with different noise levels. It helps identify the model’s stability and sensitivity to input variations.
- Parameters:
dataset ({"main", "train", "test"}, default="test") – Dataset to evaluate robustness on.
metric (str, default=None) –
Model performance metric to use.
For classification (default="AUC"): "ACC", "AUC", "F1", "LogLoss", and "Brier"
For regression (default="MSE"): "MSE", "MAE", and "R2"
n_repeats (int, default=10) – Number of times to repeat the perturbation test for each noise level.
perturb_features (str or tuple, default=None) – Features to perturb during testing. If None, all features are perturbed. Can be a single feature name or a tuple of feature names.
perturb_method ({"normal", "quantile"}, default="normal") –
Method to perturb numerical features:
"normal": Add Gaussian noise scaled by the feature's standard deviation
"quantile": Perturb in quantile space with uniform noise
noise_levels (float or tuple, default=0.1) –
Magnitude of perturbation to apply. Can be a single value or tuple of values.
For "normal" method: Standard deviation multiplier
For "quantile" method: Maximum quantile shift
threshold (float, default=0.1) – Proportion of samples to consider as “Non-robust” cases based on prediction changes. Used for separating samples into “Non-robust” and “Remaining” groups.
random_state (int, default=0) – Random seed for reproducible results.
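The two perturbation methods can be sketched with NumPy. This is an illustration of the documented behavior ("normal": Gaussian noise scaled by the feature standard deviation; "quantile": a bounded uniform shift in quantile space), not modeva's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(size=1000)  # a skewed numerical feature
noise_level = 0.1

# "normal" method: add Gaussian noise scaled by the feature's standard deviation.
x_normal = x + rng.normal(0.0, noise_level * x.std(), size=x.shape)

# "quantile" method: map values to empirical quantiles in [0, 1], shift each by
# uniform noise of at most `noise_level`, then map back through the quantiles.
ranks = x.argsort().argsort() / (len(x) - 1)
shifted = np.clip(ranks + rng.uniform(-noise_level, noise_level, size=x.shape), 0.0, 1.0)
x_quantile = np.quantile(x, shifted)
```

The quantile method keeps perturbed values inside the observed range of the feature, which can be preferable for heavy-tailed features where std-scaled Gaussian noise produces implausible values.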
- Returns:
Object containing:
key: “diagnose_robustness”
data: Name of the dataset used
model: Name of the model used
inputs: Input parameters used for the test
value: Nested dict containing detailed information for each noise level, e.g., results.value[0.2], which includes items:
"score": The performance metric after perturbing the data with noise level 0.2;
"data_info": The sample indices of the small and large prediction change groups (determined by the threshold), which can be further used for a data distribution test, e.g.,
data_results = ds.data_drift_test(**results.value[0.2]["data_info"])
data_results.plot("summary")
data_results.plot(("density", "MedInc"))
table: pd.DataFrame of performance under each noise level and repeat
options: Dictionary of visualization configuration for a box plot where the x-axis is the noise level and the y-axis is the performance metric. Run results.plot() to show this plot.
- Return type:
Examples
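As a rough, self-contained sketch of what this test measures, the loop below perturbs features with the "normal" method, repeats the evaluation n_repeats times per noise level, and records the metric, mirroring the structure of the returned table. It uses NumPy and a least-squares fit in place of modeva's API; all names here are hypothetical, not part of modeva:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data and a least-squares fit standing in for the model under test.
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0.0, 0.1, size=500)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def mse(X_eval):
    """MSE of the fitted model's predictions on perturbed inputs."""
    return float(np.mean((X_eval @ coef - y) ** 2))

# Perturb each feature with Gaussian noise scaled by its standard deviation
# ("normal" method), repeating n_repeats times per noise level.
n_repeats, noise_levels = 10, (0.1, 0.2, 0.4)
scores = {
    lvl: [mse(X + rng.normal(0.0, lvl * X.std(axis=0), size=X.shape))
          for _ in range(n_repeats)]
    for lvl in noise_levels
}
for lvl, s in scores.items():
    print(f"noise={lvl}: mean MSE={np.mean(s):.3f}")
```

A robust model shows only a gradual metric degradation as the noise level grows; a steep drop at small noise levels signals sensitivity to input variations, which is what the box plot produced by results.plot() visualizes.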