modeva.TestSuite.diagnose_slicing_robustness

TestSuite.diagnose_slicing_robustness(features: str | Tuple = None, dataset: str = 'test', method: str = 'uniform', bins: int | Dict = 10, metric: str = None, n_estimators: int = 1000, threshold: float | int = None, n_repeats: int = 10, perturb_features: str | Tuple = None, perturb_method: str = 'normal', noise_levels: float | int = 0.1, random_state: int = 0)

Identify unreliable (non-robust) regions of the input space based on one or two slicing features.

This method evaluates model robustness by perturbing the input features and computing the chosen performance metric on the perturbed data within each slice of the dataset, where slices are defined by the specified slicing features. Slices whose perturbed metric is worse than a threshold are flagged as weak, which shows where the model's performance becomes unreliable under input noise.
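The procedure described above can be sketched in NumPy for a single numerical slicing feature and a regression metric. This is a minimal illustration of the general technique, not modeva's implementation: the function name `slicing_robustness_1d`, Gaussian noise scaled by each column's standard deviation, MSE as the metric, and the rule "weak if the slice MSE exceeds the whole-population MSE" are all assumptions.

```python
import numpy as np


def slicing_robustness_1d(predict, X, y, feature, bins=10, n_repeats=10,
                          noise_level=0.1, random_state=0):
    """Perturbed MSE per equal-width slice of one numerical feature.

    Illustrative assumptions, not modeva's implementation: Gaussian noise
    scaled by each column's standard deviation, MSE as the metric, and a
    threshold equal to the whole-population perturbed MSE.
    """
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scale = noise_level * X.std(axis=0)

    # Squared error per sample, averaged over n_repeats perturbations.
    sq_err = np.zeros(len(y))
    for _ in range(n_repeats):
        X_pert = X + rng.normal(size=X.shape) * scale
        sq_err += (predict(X_pert) - y) ** 2
    sq_err /= n_repeats

    # Equal-width bins; the interior edges map samples to indices 0 .. bins - 1.
    x = X[:, feature]
    edges = np.linspace(x.min(), x.max(), bins + 1)
    idx = np.digitize(x, edges[1:-1])

    threshold = sq_err.mean()  # whole-population perturbed MSE
    results = []
    for b in range(bins):
        mask = idx == b
        if not mask.any():
            continue  # skip empty slices
        mse = float(sq_err[mask].mean())
        results.append({
            "Segment": (float(edges[b]), float(edges[b + 1])),
            "Size": int(mask.sum()),
            "MSE": mse,
            "Weak": bool(mse > threshold),  # lower MSE is better
        })
    return results


# Toy usage: the model is exact on clean data, but because it predicts x0
# squared, its error under noise grows with |x0|.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(500, 2))
y_demo = X_demo[:, 0] ** 2
for seg in slicing_robustness_1d(lambda Z: Z[:, 0] ** 2, X_demo, y_demo,
                                 feature=0, bins=5):
    print(seg["Segment"], seg["Size"], round(seg["MSE"], 5), seg["Weak"])
```

Averaging the squared errors over repeats before slicing gives the same per-slice value as averaging the per-repeat slice MSEs, since every repeat uses the same samples.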

Parameters:

features (Union[str, Tuple], default=None) –
Feature names used for slicing. Each slicing specification may contain at most 2 features.
- If features=("X1",) or "X1", computes 1D slicing over X1.
- If features=("X1", "X2"), computes 2D slicing over the interaction of X1 and X2.
- If features=(("X1",), ("X2",)), computes 1D slicing over X1 and X2 separately.
Note: Batch mode is not supported for 2D slicing. If None, 1D slicing is computed for every feature.
dataset ({"main", "train", "test"}, default="test") – The dataset to evaluate.
metric (str, default=None) –
Model performance metric to use.
- For classification (default="AUC"): "ACC", "AUC", "F1", "LogLoss", and "Brier"
- For regression (default="MSE"): "MSE", "MAE", and "R2"
method ({"uniform", "quantile", "auto-xgb1", "precompute"}, default="uniform") –
Method for binning numerical features:
- "uniform": Equal-width binning
- "quantile": Equal-frequency binning (may result in fewer bins due to ties)
- "auto-xgb1": Uses the split points of a depth-1 XGBoost model fitted on X to predict the residuals
- "precompute": Uses pre-specified bin edges (see bins)
bins (int or dict, default=10) –
Controls binning granularity:
- If int: Number of bins for numerical features. For "quantile", this is the maximum number of bins. For "auto-xgb1", this sets XGBoost's max_bin parameter.
- If dict: Manual bin edges for each feature, used only with method="precompute". Format: {feature_name: array_of_bin_edges}, e.g., {"X0": [0.1, 0.5, 0.9]}. Bins cannot be specified for categorical features.
n_estimators (int, default=1000) – Number of estimators (trees) in the XGBoost model; used only when method="auto-xgb1".
threshold (float or int, default=None) – Metric threshold for flagging non-robust (weak) regions. If None, the robustness metric computed on the whole population is used.
n_repeats (int, default=10) – Number of perturbation repetitions.
perturb_features (str or tuple, default=None) – Feature names to perturb. If None, all features are perturbed.
perturb_method ({"normal", "quantile"}, default="normal") – Perturbation method for numerical features.
noise_levels (float or int, default=0.1) – Perturbation (noise) level.
random_state (int, default=0) – Random seed for reproducibility.
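The accepted forms of `features` can be made concrete with a small normalizer that turns each form into a list of slicing specifications. `normalize_features` is a hypothetical helper written from the rules documented above; it is not a modeva function, and rejecting a batch that contains any 2D spec is an assumed reading of the batch-mode note.

```python
def normalize_features(features, all_features):
    """Convert a `features` argument into a list of slicing specs (tuples).

    Hypothetical helper written from the documented rules; not part of modeva.
    """
    if features is None:
        # Default: 1D slicing over every feature.
        return [(f,) for f in all_features]
    if isinstance(features, str):
        # "X1" -> 1D slicing over X1.
        return [(features,)]
    features = tuple(features)
    if all(isinstance(f, str) for f in features):
        # ("X1",) -> 1D slicing; ("X1", "X2") -> a single 2D slicing.
        specs = [features]
    else:
        # (("X1",), ("X2",)) -> batch of separate 1D slicings.
        specs = [(f,) if isinstance(f, str) else tuple(f) for f in features]
    if any(len(s) == 0 or len(s) > 2 for s in specs):
        raise ValueError("Each slicing spec must contain 1 or 2 features.")
    if len(specs) > 1 and any(len(s) == 2 for s in specs):
        raise ValueError("Batch mode is not supported for 2D slicing.")
    return specs


print(normalize_features(("X1", "X2"), ["X1", "X2", "X3"]))
print(normalize_features((("X1",), ("X2",)), ["X1", "X2", "X3"]))
```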
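The binning options for numerical features can be illustrated with NumPy: equal-width edges from `np.linspace`, equal-frequency edges from `np.quantile` with tied edges merged (the reason "quantile" can yield fewer bins than requested), and pre-specified edges applied with `np.digitize`. The helper names are hypothetical, treating a precomputed list such as [0.1, 0.5, 0.9] as interior cut points is an assumption, and "auto-xgb1" is omitted because it requires fitting an XGBoost model.

```python
import numpy as np


def uniform_edges(x, bins):
    """Equal-width edges: bins + 1 evenly spaced points from min(x) to max(x)."""
    return np.linspace(np.min(x), np.max(x), bins + 1)


def quantile_edges(x, bins):
    """Equal-frequency edges. Tied quantiles collapse into one edge,
    so the result may define fewer than `bins` bins."""
    return np.unique(np.quantile(x, np.linspace(0.0, 1.0, bins + 1)))


def assign_segments(x, cut_points):
    """Segment index of each value given interior cut points.

    Assumption: k cut points define k + 1 segments
    (-inf, c0), [c0, c1), ..., [c_{k-1}, +inf).
    """
    return np.digitize(x, np.sort(np.asarray(cut_points, dtype=float)))


x = np.array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4], dtype=float)
print(len(uniform_edges(x, 5)) - 1)   # 5 equal-width bins
print(len(quantile_edges(x, 5)) - 1)  # 3: the tie at 0 merges several quantile edges

# Pre-specified edges in the documented dict format.
bins = {"X0": [0.1, 0.5, 0.9]}
print(assign_segments(np.array([0.05, 0.3, 0.5, 0.95]), bins["X0"]))  # [0 1 2 3]
```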
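One plausible reading of `perturb_method`, `noise_levels`, and `perturb_features` is sketched below. The semantics follow a common convention and are an assumption, not confirmed for modeva: "normal" adds zero-mean Gaussian noise whose standard deviation is the noise level times the column's standard deviation, and "quantile" shifts each value's empirical quantile by uniform noise in [-noise_level, noise_level] and maps it back through the column's quantile function, so perturbed values stay within the observed range. The `perturb` function and its `columns` argument are hypothetical.

```python
import numpy as np


def perturb(X, method="normal", noise_level=0.1, columns=None, rng=None):
    """Perturb numerical columns of a 2D array (assumed semantics, see lead-in)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    out = X.copy()
    n = X.shape[0]
    # Mirrors perturb_features=None: perturb every column by default.
    cols = list(range(X.shape[1])) if columns is None else list(columns)
    for j in cols:
        col = X[:, j]
        if method == "normal":
            # Gaussian noise scaled by the column's spread.
            out[:, j] = col + rng.normal(0.0, noise_level * col.std(), size=n)
        elif method == "quantile":
            # Empirical quantile of each value in [0, 1] from its rank.
            ranks = np.argsort(np.argsort(col, kind="stable"), kind="stable")
            q = ranks / max(n - 1, 1)
            # Shift in quantile space, clip, and map back to the data scale.
            q = np.clip(q + rng.uniform(-noise_level, noise_level, size=n), 0.0, 1.0)
            out[:, j] = np.quantile(col, q)
        else:
            raise ValueError(f"Unknown perturb_method: {method!r}")
    return out
```

For example, `perturb(X, "quantile", noise_level=0.1, rng=0)` perturbs every column, while `columns=[0]` restricts the noise to the first column. The quantile variant is useful for skewed features, where a fixed Gaussian scale would push many values outside the observed range.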

Returns:

A container object with the following components:

key: "diagnose_slicing_robustness"
data: Name of the dataset used
model: Name of the model used
inputs: Input parameters used for the analysis
value: List with one dict per segment, each containing:
- "Feature": feature name
- "Segment": segment value (categorical) or segment range (numerical)
- "Size": number of samples in the segment
- "<metric>": performance metric of the segment after perturbation (the key is the metric name)
- "Sample_ID": sample indices of the segment
- "Sample_Dataset": dataset name, e.g., "train" or "test"
- "Segment_Info": explicit definition of the segment, similar to "Segment"
- "Weak": boolean indicating whether the segment is weak
table: DataFrame summarizing slice-wise results
options: Dictionary of visualization configurations. Call results.plot() to show all plots, or results.plot(name=xxx) to show a single plot. Available names:
- None (when a single 1D or 2D slicing specification is given): performance metric (after perturbation) plotted against the selected slicing feature(s).
- "<feature_name>" (when multiple 1D slicing features are given): performance metric (after perturbation) plotted against that feature.
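Because each element of the value list is a plain dict with the documented keys, weak segments can be filtered with ordinary Python once the list is in hand. The sketch also shows one way a "Weak" flag could follow from the threshold. The helpers `is_weak` and `weak_segments` are hypothetical, and treating ACC, AUC, F1, and R2 as higher-is-better and MSE, MAE, LogLoss, and Brier as lower-is-better, with a strict comparison, is an assumption about how the threshold is applied.

```python
# Assumed metric directions for the documented metric names.
HIGHER_IS_BETTER = {"ACC", "AUC", "F1", "R2"}
LOWER_IS_BETTER = {"MSE", "MAE", "LogLoss", "Brier"}


def is_weak(metric, score, threshold):
    """Assumed rule: a segment is weak if its score is strictly worse than threshold."""
    if metric in HIGHER_IS_BETTER:
        return score < threshold
    if metric in LOWER_IS_BETTER:
        return score > threshold
    raise ValueError(f"Unknown metric: {metric!r}")


def weak_segments(value, metric):
    """Weak segments from a list of per-segment dicts, largest first."""
    rows = [
        {"Feature": s["Feature"], "Segment": s["Segment"],
         "Size": s["Size"], metric: s[metric]}
        for s in value
        if s["Weak"]
    ]
    return sorted(rows, key=lambda r: r["Size"], reverse=True)
```

Sorting by "Size" puts the weak segments that affect the most samples first.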

Return type:

ValidationResult

Examples

Robustness Analysis (Classification)

Robustness Analysis (Regression)