modeva.TestSuite.diagnose_slicing_overfit

TestSuite.diagnose_slicing_overfit(features: str | Tuple = None, train_dataset: str = 'train', test_dataset: str = 'test', metric: str = None, method: str = 'uniform', bins: int | Dict = 10, n_estimators: int = 1000, threshold: float | int = None)

Identify overfit regions based on one or two slicing features.

This method slices the data by the specified feature(s) and compares model performance on the training and testing datasets within each slice. Slices with a large train-test gap indicate potential overfitting.

Parameters:

features (Union[str, Tuple], default=None) –
Feature names used for slicing. A single slicing uses at most 2 features.
- If features=("X1", ) or "X1", computes 1D slicing over X1.
- If features=("X1", "X2"), computes 2D slicing over the interaction of X1 and X2.
- If features=(("X1", ), ("X2", )), computes 1D slicing over X1 and X2 separately (batch mode).
- If None, all features are used for 1D slicing.
Note: Batch mode is not supported for 2D slicing.
train_dataset (str, default="train") – The dataset used for training. Options include "main", "train", or "test".
test_dataset (str, default="test") – The dataset used for testing. Options include "main", "train", or "test".
metric (str, default=None) –
Model performance metric to use. If None, the task-specific default is used.
- For classification (default="AUC"): "ACC", "AUC", "F1", "LogLoss", and "Brier"
- For regression (default="MSE"): "MSE", "MAE", and "R2"
method (str, default="uniform") –
The binning method for numerical features. Options include:
- "uniform": Equal-width bins
- "quantile": Equal-frequency bins
- "auto-xgb1": XGBoost-based binning (configured through bins and n_estimators)
- "precompute": Predefined bins, supplied through bins as a dict
bins (int or dict, default=10) –
Controls binning granularity:
- If int: Number of bins for numerical features. For "quantile", this is the maximum number of bins. For "auto-xgb1", this sets XGBoost's max_bin parameter.
- If dict: Manual bin specifications for each feature, only used with method="precompute". Format: {feature_name: array_of_bin_edges}. Example: {"X0": [0.1, 0.5, 0.9]}. Note: Bins cannot be specified for categorical features.
n_estimators (int, default=1000) – The number of estimators to use in XGBoost when method="auto-xgb1".
threshold (float or int, default=None) – The metric gap threshold used to flag a segment as weak (overfit). If not specified, the performance metric gap of the whole population is used.
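The idea behind the "uniform" method and the default threshold can be sketched in plain Python. This is only an illustration of the technique, not Modeva's implementation: the helper names (uniform_edges, bin_index, slicing_gap), the output keys, the choice of MSE, and the convention that the gap is test MSE minus train MSE are all assumptions made for this example.

```python
def uniform_edges(values, bins):
    """Equal-width bin edges spanning the observed range of `values`."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def bin_index(x, edges):
    """Index of the bin containing x; the last bin is closed on the right."""
    n_bins = len(edges) - 1
    for i in range(n_bins):
        if x < edges[i + 1]:
            return i
    return n_bins - 1


def mse(pairs):
    """Mean squared error over (y_true, y_pred) pairs."""
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


def slicing_gap(train, test, bins=10, threshold=None):
    """train/test are lists of (x, y_true, y_pred) for one slicing feature.

    Assumption: gap = test MSE - train MSE, and a segment is flagged
    when its gap exceeds the threshold.
    """
    edges = uniform_edges([x for x, _, _ in train + test], bins)
    if threshold is None:
        # Default: gap of the whole population, as described for `threshold`.
        threshold = mse([(t, p) for _, t, p in test]) - mse([(t, p) for _, t, p in train])
    rows = []
    for b in range(bins):
        tr = [(t, p) for x, t, p in train if bin_index(x, edges) == b]
        te = [(t, p) for x, t, p in test if bin_index(x, edges) == b]
        if not tr or not te:
            continue  # skip segments missing from either dataset
        gap = mse(te) - mse(tr)
        rows.append({
            "segment": (edges[b], edges[b + 1]),
            "n_train": len(tr),
            "n_test": len(te),
            "gap": gap,
            "weak": gap > threshold,
        })
    return rows


# Toy data (made up): the model is perfect on train but fails on test for x > 0.5.
train = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.8, 1.0, 1.0), (1.0, 1.0, 1.0)]
test = [(0.1, 0.0, 0.0), (0.3, 0.0, 0.0), (0.7, 1.0, 0.0), (0.9, 1.0, 0.0)]
rows = slicing_gap(train, test, bins=2)
for row in rows:
    print(row)
```

With these toy inputs the whole-population gap is 0.5, so only the upper segment (gap 1.0) is flagged. Passing an explicit threshold replaces that population-level default.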

Returns:

An object containing the results of the slicing overfit detection, including:

key: "diagnose_slicing_overfit"
data: Name of the dataset used
model: Name of the model used
inputs: Input parameters used for the test
value: List of per-segment results. Each element is a dict containing:
- "Feature": feature name
- "Segment": segment value (categorical) or segment range (numerical)
- "Size": number of samples in this segment
- "<metric>" (keyed by the name of the chosen metric): performance metric gap of this segment
- "Sample_ID": sample indices of this segment
- "Sample_Dataset": dataset name, e.g., "train", "test", etc.
- "Segment_Info": explicit definition of this segment, similar to "Segment"
- "Weak": boolean indicator showing whether this segment is weak or not
table: pd.DataFrame summarizing the performance metrics for both training and testing datasets, including the calculated gaps.
options: Dictionary of visualization configurations. Run results.plot() to show all plots, or results.plot(name=xxx) to display a single plot. The following names are available:
- None (if only one 1D or 2D slicing is specified): Performance gap plot against the selected slicing feature(s).
- "<feature_name>" (if multiple single features are specified): Performance gap plot against that slicing feature.

Return type:

ValidationResult
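As a rough sketch of how the per-segment records described under value might be post-processed, the snippet below filters the weak segments and sorts them by gap. The records are hand-written, hypothetical stand-ins (made-up numbers and segment labels, with "MSE" assumed as the metric key); they are not output from Modeva, and the weak_segments helper is not part of the library.

```python
# Hypothetical records shaped like the documented `value` entries.
value = [
    {"Feature": "X1", "Segment": "[0.0, 0.5)", "Size": 120, "MSE": 0.02, "Weak": False},
    {"Feature": "X1", "Segment": "[0.5, 1.0]", "Size": 80, "MSE": 0.31, "Weak": True},
    {"Feature": "X2", "Segment": "A", "Size": 200, "MSE": 0.12, "Weak": True},
]


def weak_segments(records, metric):
    """Return the records flagged as weak, largest metric gap first.

    Assumption: a larger gap value means a worse (more overfit) segment.
    """
    weak = [r for r in records if r["Weak"]]
    return sorted(weak, key=lambda r: r[metric], reverse=True)


worst = weak_segments(value, "MSE")
for r in worst:
    print(r["Feature"], r["Segment"], r["Size"], r["MSE"])
```

Sorting by the gap puts the segments most in need of attention first. For metrics where higher is better (such as AUC), the sign convention of the gap should be checked before reusing this ordering.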

Examples

Overfitting Analysis (Classification)

Overfitting Analysis (Regression)