modeva.TestSuite.diagnose_slicing_reliability

TestSuite.diagnose_slicing_reliability(features: str | Tuple = None, train_dataset: str = 'test', test_dataset: str = 'test', test_size: float = 0.5, method: str = 'uniform', bins: int | Dict = 10, n_estimators: int = 1000, threshold: float | int = None, metric: str = 'width', alpha: float = 0.1, max_depth: int = 5, random_state: int = 0)

Get unreliable regions based on one or two slicing features.

This function analyzes the reliability of a model’s predictions by slicing the dataset based on the provided features. It computes metrics such as width or coverage for the specified bins and identifies regions where the model’s predictions may be unreliable.

Parameters:

features (Union[str, Tuple], default=None) –
Feature names used for slicing. Each slicing task takes at most 2 features.
- If features=("X1",) or "X1", computes 1D slicing over X1.
- If features=("X1", "X2"), computes 2D slicing over the interaction of X1 and X2.
- If features=(("X1",), ("X2",)), computes 1D slicing over X1 and X2 separately (batch mode).
- If None, all features are sliced one at a time (1D).
Note: Batch mode is not supported for 2D slicing.
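The accepted forms of features can be illustrated with a small sketch. The helper normalize_features below is hypothetical and is not part of modeva; it only mirrors the rules listed above by expanding each form into a list of slicing tasks.

```python
from typing import List, Sequence, Tuple, Union


def normalize_features(
    features: Union[str, Tuple, None],
    all_features: Sequence[str],
) -> List[Tuple[str, ...]]:
    """Hypothetical helper: expand `features` into a list of slicing tasks."""
    if features is None:
        # No selection: one 1D task per available feature.
        return [(name,) for name in all_features]
    if isinstance(features, str):
        # A bare string is a single 1D task.
        return [(features,)]
    if len(features) == 0:
        raise ValueError("features must not be empty")
    if all(isinstance(item, tuple) for item in features):
        # Tuple of tuples: batch mode, restricted to 1D tasks.
        if any(len(item) != 1 for item in features):
            raise ValueError("batch mode supports 1D slicing only")
        return [tuple(item) for item in features]
    if len(features) > 2:
        raise ValueError("a slicing task takes at most 2 features")
    # Flat tuple of one or two names: a single 1D or 2D task.
    return [tuple(features)]


tasks_1d = normalize_features("X1", ["X1", "X2"])
tasks_2d = normalize_features(("X1", "X2"), ["X1", "X2"])
tasks_batch = normalize_features((("X1",), ("X2",)), ["X1", "X2"])
tasks_all = normalize_features(None, ["X1", "X2"])
```

Here tasks_2d holds one task with two features, while tasks_batch and tasks_all each hold two separate 1D tasks.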
train_dataset ({"main", "train", "test"}, default="test") – The dataset used for training and calibration.
test_dataset ({"main", "train", "test"}, default="test") – The dataset used for evaluation.
test_size (float, default=0.5) – Fraction of the data held out as the test set when splitting into train and test sets. Only used when train_dataset == test_dataset.
method ({"uniform", "quantile", "auto-xgb1", "precompute"}, default="uniform") –
Method for binning numerical features:
- "uniform": Equal-width binning
- "quantile": Equal-frequency binning (may result in fewer bins due to ties)
- "auto-xgb1": Uses the bins of an XGBoost depth-1 model fitted between X and the residuals
- "precompute": Uses pre-specified bin edges
bins (int or dict, default=10) –
Controls binning granularity:
- If int: Number of bins for numerical features. For "quantile", this is the maximum number of bins. For "auto-xgb1", this sets XGBoost's max_bin parameter.
- If dict: Manual bin specifications for each feature, only used with method="precompute". Format: {feature_name: array_of_bin_edges}. Example: {"X0": [0.1, 0.5, 0.9]}. Note: Bins cannot be specified for categorical features.
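The difference between "uniform" and "quantile" binning, and why ties can reduce the number of quantile bins, can be seen in a generic sketch. The functions uniform_edges and quantile_edges are illustrative assumptions built on the Python standard library; they are not modeva's internal implementation.

```python
from statistics import quantiles
from typing import List, Sequence


def uniform_edges(values: Sequence[float], bins: int) -> List[float]:
    """Equal-width edges between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + i * step for i in range(bins + 1)]


def quantile_edges(values: Sequence[float], bins: int) -> List[float]:
    """Equal-frequency edges; duplicate edges caused by ties are merged."""
    cuts = quantiles(values, n=bins, method="inclusive")
    edges = [min(values)] + cuts + [max(values)]
    # Repeated values produce identical cut points, which collapse here.
    return sorted(set(edges))


# Illustrative data with many tied zeros.
values = [0, 0, 0, 0, 0, 0, 1, 2, 3, 10]
u_edges = uniform_edges(values, bins=5)
q_edges = quantile_edges(values, bins=5)
n_uniform_bins = len(u_edges) - 1
n_quantile_bins = len(q_edges) - 1
```

With this data, uniform binning keeps all 5 requested bins, while quantile binning ends up with fewer because several cut points coincide at 0. This is why bins acts as an upper bound for "quantile".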
n_estimators (int, default=1000) – The number of estimators in XGBoost. Only used when method="auto-xgb1".
threshold (float or int, default=None) – The metric threshold used to flag unreliable regions. If not specified, the reliability metric of the whole population is used.
metric ({"width", "coverage"}, default="width") –
The metric to be calculated in each slicing bin.
- "width": The average width in each bin.
- "coverage": The average coverage in each bin.
alpha (float, default=0.1) – The expected miscoverage rate of prediction intervals / sets, i.e., the target coverage is 1 - alpha (90% with the default).
max_depth (int, default=5) – The max_depth parameter of GBM with quantile loss. Only used for regression tasks.
random_state (int, default=0) – The random seed for reproducibility.
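How metric and threshold interact can be sketched for the regression case, where each sample has a prediction interval. The function slice_reliability below is a hypothetical illustration, not modeva code. It assumes a segment is flagged when its average width is above the threshold or its average coverage is below it; the source does not state the comparison direction.

```python
from typing import Dict, List, Optional, Sequence


def slice_reliability(
    bin_ids: Sequence[int],
    y_true: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    metric: str = "width",
    threshold: Optional[float] = None,
) -> List[Dict]:
    """Average interval width or coverage per bin, with a weak flag."""
    if metric == "width":
        scores = [hi - lo for lo, hi in zip(lower, upper)]
    else:
        # Coverage: 1.0 when the true value falls inside the interval.
        scores = [float(lo <= y <= hi) for y, lo, hi in zip(y_true, lower, upper)]
    if threshold is None:
        # Default: the metric of the whole population.
        threshold = sum(scores) / len(scores)
    groups: Dict[int, List[float]] = {}
    for b, s in zip(bin_ids, scores):
        groups.setdefault(b, []).append(s)
    rows = []
    for b in sorted(groups):
        value = sum(groups[b]) / len(groups[b])
        # Assumed direction: wide intervals or low coverage are weak.
        weak = value > threshold if metric == "width" else value < threshold
        rows.append({"Segment": b, "Size": len(groups[b]), metric: value, "Weak": weak})
    return rows


# Illustrative data: two bins with two samples each.
bin_ids = [0, 0, 1, 1]
y_true = [1.0, 2.0, 3.0, 4.0]
lower = [0.0, 1.0, 0.0, 5.0]
upper = [2.0, 3.0, 6.0, 9.0]
width_rows = slice_reliability(bin_ids, y_true, lower, upper, metric="width")
coverage_rows = slice_reliability(bin_ids, y_true, lower, upper, metric="coverage")
```

In this toy data the second bin has both wider intervals and lower coverage than the population average, so it is flagged under either metric.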

Returns:

A container object with the following components:

key: “diagnose_slicing_reliability”
data: Name of the dataset used
model: Name of the model used
inputs: Input parameters used for the analysis
value: List of reliability results, one dict per segment, each containing
- "Feature": feature name
- "Segment": segment value (categorical) or segment range (numerical)
- "Size": number of samples in this segment
- <"metric">: reliability metric value of this segment, keyed by the chosen metric name
- "Sample_ID": sample indices of this segment
- "Sample_Dataset": dataset name, e.g., "train", "test", etc.
- "Segment_Info": explicit definition of this segment, similar to "Segment"
- "Weak": boolean indicator showing whether this segment is weak or not
table: DataFrame with slice analysis results
options: Dictionary of visualization configurations. Run results.plot() to show all plots, or results.plot(name=xxx) to display a single plot. The following names are available:
- None (if only one 1D or 2D slicing task is specified): Reliability metric plot against the selected slicing feature(s).
- "<feature_name>" (if multiple single features are specified): Reliability metric plot against the named slicing feature.

Return type:

ValidationResult
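As a minimal sketch of working with the value component described above, the snippet below filters and ranks weak segments from a list of per-segment dicts. The records in mock_value are made up for illustration and only carry a subset of the documented keys; in practice the list would come from the returned result. The helper weak_segments is hypothetical, and its ordering (widest first for "width", lowest first for "coverage") is an assumption.

```python
from typing import Any, Dict, List


def weak_segments(value: List[Dict[str, Any]], metric: str = "width") -> List[Dict[str, Any]]:
    """Keep segments flagged as weak and order them worst first."""
    weak = [row for row in value if row["Weak"]]
    # Assumed ordering: large width is worse, small coverage is worse.
    return sorted(weak, key=lambda row: row[metric], reverse=(metric == "width"))


# Made-up records that follow the documented key names.
mock_value = [
    {"Feature": "X1", "Segment": "[0.0, 0.5)", "Size": 120, "width": 1.8, "Weak": False},
    {"Feature": "X1", "Segment": "[0.5, 1.0]", "Size": 80, "width": 3.1, "Weak": True},
    {"Feature": "X2", "Segment": "A", "Size": 50, "width": 4.2, "Weak": True},
]

flagged = weak_segments(mock_value, metric="width")
summary = [(row["Feature"], row["Segment"], row["Size"]) for row in flagged]
```

The same filtering can also be done on the table component, since it is a DataFrame holding the slice analysis results.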

Examples

Reliability Analysis (Classification)

Reliability Analysis (Regression)