modeva.TestSuite.diagnose_resilience

TestSuite.diagnose_resilience(dataset: str = 'test', method: str = 'worst-sample', metric: str = None, alphas: tuple = None, n_clusters: int = 10, random_state: int = 0)

Evaluate model resilience by analyzing performance on challenging data subsets.

This method assesses how well the model maintains its performance when faced with increasingly difficult or anomalous samples. It helps identify potential vulnerabilities and understand the model’s behavior under stress conditions.
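
A minimal call sketch is shown below; it assumes ts is an already constructed TestSuite instance and relies only on the defaults documented under Parameters.

  # Minimal sketch: ts is assumed to be an existing TestSuite instance
  results = ts.diagnose_resilience()   # defaults: dataset="test", method="worst-sample"
  results.plot()                       # performance vs. fraction of worst samples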

Parameters:
  • dataset ({"main", "train", "test"}, default="test") – The dataset to use for resilience testing.

  • method ({"worst-sample", "worst-cluster", "outer-sample", "hard-sample"}, default="worst-sample") –

    Strategy for identifying challenging samples (a usage sketch follows the parameter list):

    • “worst-sample”: Ranks samples by their prediction error

    • “worst-cluster”: Groups samples into clusters and identifies problematic clusters

    • “outer-sample”: Uses PCA to detect statistical outliers

    • “hard-sample”: Trains a metamodel to identify inherently difficult samples

  • metric (str, default=None) –

    Model performance metric to use.

    • For classification (default=“AUC”): “ACC”, “AUC”, “F1”, “LogLoss”, and “Brier”

    • For regression (default=“MSE”): “MSE”, “MAE”, and “R2”

  • alphas (tuple of float, default=None) – Worst-sample fractions to evaluate, each within (0, 1]. If None, it defaults to (0.1, 0.2, 0.3, …, 0.9, 1.0).

  • n_clusters (int, default=10) – Number of clusters for the “worst-cluster” method. Ignored for other methods.

  • random_state (int, default=0) – Random seed for reproducibility.
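
As referenced under the method parameter, the following sketch (again assuming ts is an existing TestSuite instance) shows a non-default configuration using the “worst-cluster” strategy:

  results = ts.diagnose_resilience(
      dataset="test",
      method="worst-cluster",
      metric="AUC",                  # classification metric; use e.g. "MSE" for regression
      alphas=(0.1, 0.25, 0.5, 1.0),  # worst fractions to evaluate
      n_clusters=5,                  # only used by the "worst-cluster" method
      random_state=0,
  )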

Returns:

A result object containing:

  • key: “diagnose_resilience”

  • data: Name of the dataset used

  • model: Name of the model used

  • inputs: Input parameters used for the test

  • value: Nested dict containing detailed information for each worst fraction (alpha), e.g., results.value[0.2], which includes:

    • “score”: The performance metric computed on the selected “worst samples”;

    • “data_info”: The sample indices within and outside the selected subset, which can be passed to a data drift test, e.g.,

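      # assumes ds is the DataSet instance used to build this TestSuite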
      data_results = ds.data_drift_test(**results.value[0.2]["data_info"])
      data_results.plot("summary")
      data_results.plot(("density", "MedInc"))
      
  • table: DataFrame showing performance scores across different fractions

  • options: Dictionary of visualization configurations for a line plot where the x-axis is the worst fraction (alpha, from 0 to 1) and the y-axis is the performance metric. Run results.plot() to display this plot.

Return type:

ValidationResult

Notes

With the default alphas, the method evaluates model performance on increasingly large subsets of the most challenging samples (from 10% to 100% of the data). A steep performance drop at small fractions indicates potential resilience issues in specific scenarios.
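
Continuing with the hypothetical results object from the sketches above, the per-fraction scores can be inspected as follows (the value keys match whatever alphas were used):

  print(results.table)                     # score for each worst fraction (alpha)
  worst_10 = results.value[0.1]["score"]   # metric on the worst 10% of samples
  overall  = results.value[1.0]["score"]   # metric on the full dataset
  results.plot()                           # line plot of worst fraction vs. performance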

Examples

Resilience Analysis (Classification)

Resilience Analysis (Regression)