modeva.TestSuite.diagnose_robustness

TestSuite.diagnose_robustness(dataset: str = 'test', threshold: float = 0.1, metric: str = None, n_repeats: int = 10, perturb_features: str | Tuple = None, perturb_method: str = 'normal', noise_levels: float | int | Tuple = 0.1, random_state: int = 0)

Evaluate model robustness by measuring performance under feature perturbations.

This test assesses how model predictions change when input features are perturbed with different noise levels. It helps identify the model’s stability and sensitivity to input variations.
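The general idea can be sketched in plain Python. This is an illustration only, not Modeva's implementation: `toy_model`, `mse`, and `robustness_table` are hypothetical names, the model is a fixed linear function, and the noise follows the "normal" scheme described under Parameters (Gaussian noise scaled by the feature's standard deviation).

```python
import random
import statistics


def toy_model(x):
    """Hypothetical fitted regression model (assumption): y = 3x + 1."""
    return 3.0 * x + 1.0


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def robustness_table(x, y, noise_levels, n_repeats=10, random_state=0):
    """Score the toy model on perturbed inputs, once per (noise level, repeat)."""
    rng = random.Random(random_state)
    x_std = statistics.pstdev(x)
    rows = []
    for level in noise_levels:
        for repeat in range(n_repeats):
            # Gaussian noise whose standard deviation is level * std(feature).
            x_noisy = [v + rng.gauss(0.0, level * x_std) for v in x]
            score = mse(y, [toy_model(v) for v in x_noisy])
            rows.append({"noise_level": level, "repeat": repeat, "MSE": score})
    return rows


x = [i / 10 for i in range(50)]
y = [toy_model(v) for v in x]
table = robustness_table(x, y, noise_levels=(0.0, 0.1, 0.2), n_repeats=5)

# Average score per noise level; a robust model degrades slowly as noise grows.
for level in (0.0, 0.1, 0.2):
    scores = [row["MSE"] for row in table if row["noise_level"] == level]
    print(level, statistics.fmean(scores))
```

Each row corresponds to one repeat at one noise level, which is the shape of data a box plot of score against noise level needs.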

Parameters:
  • dataset ({"main", "train", "test"}, default="test") – Dataset to evaluate robustness on.

  • metric (str, default=None) –

    Model performance metric to use. If None, the task-specific default is used.

    • For classification (default=“AUC”): “ACC”, “AUC”, “F1”, “LogLoss”, and “Brier”

    • For regression (default=“MSE”): “MSE”, “MAE”, and “R2”

  • n_repeats (int, default=10) – Number of times to repeat the perturbation test for each noise level.

  • perturb_features (str or tuple, default=None) – Features to perturb during testing. If None, all features are perturbed. Can be a single feature name or a tuple of feature names.

  • perturb_method ({"normal", "quantile"}, default="normal") –

    Method to perturb numerical features:

    • “normal”: Add Gaussian noise scaled by the feature's standard deviation

    • “quantile”: Perturb in quantile space with uniform noise

  • noise_levels (float, int, or tuple, default=0.1) –

    Magnitude of perturbation to apply. Can be a single value or a tuple of values; each value is evaluated as a separate noise level.

    • For “normal” method: Standard deviation multiplier

    • For “quantile” method: Maximum quantile shift

  • threshold (float, default=0.1) – Proportion of samples to label as “Non-robust”, based on how much their predictions change under perturbation. It is used to split the samples into “Non-robust” and “Remaining” groups.

  • random_state (int, default=0) – Random seed for reproducible results.
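The two perturb_method options can be illustrated with a small, self-contained sketch. It is an assumption-laden approximation, not Modeva's code: `perturb_normal` and `perturb_quantile` are hypothetical helpers, the quantile version uses a simple empirical-rank mapping, and noise_level plays the role described for noise_levels above (standard deviation multiplier, or maximum quantile shift).

```python
import bisect
import random
import statistics


def perturb_normal(values, noise_level, rng):
    """Add Gaussian noise with standard deviation noise_level * std(values)."""
    scale = noise_level * statistics.pstdev(values)
    return [v + rng.gauss(0.0, scale) for v in values]


def perturb_quantile(values, noise_level, rng):
    """Shift each value's empirical quantile by uniform noise of at most noise_level."""
    ordered = sorted(values)
    n = len(ordered)
    perturbed = []
    for v in values:
        # Empirical quantile of v, in (0, 1].
        q = bisect.bisect_right(ordered, v) / n
        # Shift in quantile space, then clip back to [0, 1].
        q_new = min(1.0, max(0.0, q + rng.uniform(-noise_level, noise_level)))
        # Map the shifted quantile back to an observed value.
        idx = min(n - 1, max(0, round(q_new * n) - 1))
        perturbed.append(ordered[idx])
    return perturbed


rng = random.Random(0)  # fixed seed, mirroring the role of random_state
feature = [1.0, 2.0, 2.5, 4.0, 8.0, 16.0]
noisy_normal = perturb_normal(feature, 0.1, rng)
noisy_quantile = perturb_quantile(feature, 0.1, rng)
```

In this sketch the quantile variant always returns values already observed in the feature, so it never leaves the feature's range, whereas the Gaussian variant can produce values outside it. Whether Modeva interpolates between observed values is not stated in this page.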

Returns:

Object containing:

  • key: “diagnose_robustness”

  • data: Name of the dataset used

  • model: Name of the model used

  • inputs: Input parameters used for the test

  • value: Nested dict with detailed results for each noise level, keyed by noise level (e.g., results.value[0.2]). Each entry includes:

    • “score”: The performance metric after perturbing the data at that noise level (0.2 in this example).

    • “data_info”: The sample indices of the groups with small and large prediction changes (as determined by threshold), which can be passed on to a data distribution test, e.g.,

      data_results = ds.data_drift_test(**results.value[0.2]["data_info"])
      data_results.plot("summary")
      data_results.plot(("density", "MedInc"))
      
  • table: pd.DataFrame of the performance metric for each noise level and repeat

  • options: Dictionary of visualization configuration for a box plot with the noise level on the x-axis and the performance metric on the y-axis. Run results.plot() to show this plot.

Return type:

ValidationResult
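The threshold-based grouping behind “data_info” can be pictured with the following sketch. It assumes that the “Non-robust” group is the fraction of samples with the largest absolute prediction change; the helper `split_by_prediction_change` is hypothetical, and the exact keys and layout of data_info are not shown on this page.

```python
def split_by_prediction_change(pred_clean, pred_perturbed, threshold=0.1):
    """Return (non_robust, remaining) sample indices.

    Assumption: the top `threshold` fraction of samples, ranked by absolute
    prediction change, forms the "Non-robust" group.
    """
    changes = [abs(p - c) for c, p in zip(pred_clean, pred_perturbed)]
    n_non_robust = max(1, round(threshold * len(changes)))
    # Rank sample indices from largest to smallest prediction change.
    ranked = sorted(range(len(changes)), key=lambda i: changes[i], reverse=True)
    non_robust = sorted(ranked[:n_non_robust])
    remaining = sorted(ranked[n_non_robust:])
    return non_robust, remaining


clean = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00]
perturbed = [0.11, 0.20, 0.55, 0.41, 0.50, 0.61, 0.30, 0.80, 0.92, 1.00]
non_robust, remaining = split_by_prediction_change(clean, perturbed, threshold=0.2)
print(non_robust)  # [2, 6]
print(remaining)   # [0, 1, 3, 4, 5, 7, 8, 9]
```

Comparing the feature distributions of the two index groups is what the data drift test in the Returns description is meant for.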

Examples

Robustness Analysis (Classification)

Robustness Analysis (Regression)
