modeva.TestSuite.compare_slicing_overfit

TestSuite.compare_slicing_overfit(features: str, train_dataset: str = 'train', test_dataset: str = 'test', metric: str = None, method: str = 'uniform', bins: int | Dict = 10, n_estimators: int = 1000, threshold: float | int = None)

Compares model performance across different data slices to identify potential overfit regions.

This function evaluates multiple models on data slices defined by a single feature. For each slice, it compares performance on the training dataset with performance on the test dataset. Slices with a large performance gap indicate regions where a model may be overfitting. Numerical features are segmented using one of several binning methods (see method). Categorical features are segmented by their values.
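The per-slice comparison can be sketched in plain Python. This is a minimal illustration, not Modeva's implementation. Segments are passed in directly as labels, as they would be for a categorical feature, and only MSE, MAE, and ACC are implemented. The gap is assumed to be oriented so that a positive value always means worse performance on the test set: test minus train for loss-type metrics, and train minus test for score-type metrics. That sign convention, and the helper names `slicing_gap`, `group_by_segment`, and `METRICS`, are hypothetical.

```python
from collections import defaultdict


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def acc(y_true, y_pred):
    """Classification accuracy for hard labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical metric registry: name -> (function, higher_is_better).
METRICS = {"MSE": (mse, False), "MAE": (mae, False), "ACC": (acc, True)}


def group_by_segment(segments, y_true, y_pred):
    """Collect (y_true, y_pred) lists per segment label."""
    groups = defaultdict(lambda: ([], []))
    for seg, t, p in zip(segments, y_true, y_pred):
        groups[seg][0].append(t)
        groups[seg][1].append(p)
    return groups


def slicing_gap(train, test, metric="MSE"):
    """Per-segment gap; positive means worse on test (assumed convention).

    `train` and `test` are (segments, y_true, y_pred) tuples. Segments that
    appear in only one dataset are skipped.
    """
    fn, higher_is_better = METRICS[metric]
    tr, te = group_by_segment(*train), group_by_segment(*test)
    gaps = {}
    for seg in sorted(set(tr) & set(te)):
        train_score, test_score = fn(*tr[seg]), fn(*te[seg])
        gaps[seg] = (train_score - test_score) if higher_is_better else (test_score - train_score)
    return gaps


# Toy regression data: segment "b" fits the training set but degrades on test.
train = (["a", "a", "b", "b"], [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
test = (["a", "a", "b", "b"], [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 5.0, 6.0])
print(slicing_gap(train, test, "MSE"))  # {'a': 0.0, 'b': 4.0}
```

The positive gap singles out segment "b" as the slice where the model generalizes worst. The other documented metrics (AUC, F1, LogLoss, Brier, R2) are left out only to keep the sketch dependency-free.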

Parameters:
  • features (str) – Name of the feature to use for data slicing.

  • train_dataset ({"main", "train", "test"}, default="train") – Specifies which dataset to use as the training set for comparison.

  • test_dataset ({"main", "train", "test"}, default="test") – Specifies which dataset to use as the test set for comparison.

  • metric (str, default=None) –

    Model performance metric to use. If None, the default for the task type is used.

    • For classification (default=“AUC”): “ACC”, “AUC”, “F1”, “LogLoss”, and “Brier”

    • For regression (default=“MSE”): “MSE”, “MAE”, and “R2”

  • method ({"uniform", "quantile", "auto-xgb1", "precompute"}, default="uniform") –

    Method for binning numerical features:

    • “uniform”: Equal-width binning

    • “quantile”: Equal-frequency binning

    • “auto-xgb1”: XGBoost-based automatic binning

    • “precompute”: Use pre-specified bin edges

  • bins (int or dict, default=10) –

    Controls binning granularity:

    • If int: Number of bins for numerical features. For “quantile”, this is the maximum number of bins. For “auto-xgb1”, this sets XGBoost’s max_bin parameter.

    • If dict: Manual bin edges for each feature, used only with method=“precompute”. Format: {feature_name: array_of_bin_edges}, e.g., {“X0”: [0.1, 0.5, 0.9]}. Bins cannot be specified for categorical features.

  • n_estimators (int, default=1000) – Number of trees for XGBoost when using method=“auto-xgb1”.

  • threshold (float or int, default=None) – Threshold used to filter overfit regions. If None, no threshold-based filtering is applied.
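The binning options for numerical features can be illustrated with a short plain-Python sketch. It is not Modeva's implementation. `uniform_edges`, `quantile_edges`, and `assign_segments` are hypothetical helpers, and the quantile rule is a simple floor-index approximation. The interval convention is also an assumption: bins are left-closed, and open-ended segments cover values beyond the outer edges. The "auto-xgb1" method, which derives edges from an XGBoost model, is not shown.

```python
import bisect


def uniform_edges(values, n_bins):
    """Equal-width edges: n_bins + 1 evenly spaced points from min to max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [float(hi)]


def quantile_edges(values, n_bins):
    """Equal-frequency edges from a simple floor-index quantile approximation.

    Repeated edges are dropped, so tied data can produce fewer than
    n_bins bins; n_bins therefore acts as a maximum.
    """
    data = sorted(values)
    n = len(data)
    edges = [data[(i * n) // n_bins] for i in range(n_bins)] + [data[-1]]
    deduped = []
    for edge in edges:  # edges are non-decreasing, so compare with the last kept one
        if not deduped or edge != deduped[-1]:
            deduped.append(edge)
    return deduped


def assign_segments(values, edges):
    """Map each value to a segment label given sorted, precomputed bin edges.

    Assumes left-closed intervals [edges[i], edges[i+1]) plus open-ended
    segments below the first edge and at or above the last edge.
    """
    labels = (
        [f"(-inf, {edges[0]})"]
        + [f"[{a}, {b})" for a, b in zip(edges, edges[1:])]
        + [f"[{edges[-1]}, inf)"]
    )
    return [labels[bisect.bisect_right(edges, v)] for v in values]


vals = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]  # one large outlier
print(uniform_edges(vals, 2))   # [0.0, 50.0, 100.0]
print(quantile_edges(vals, 2))  # [0, 5, 100]

bins = {"X0": [0.1, 0.5, 0.9]}  # dict format used with method="precompute"
print(assign_segments([0.05, 0.1, 0.7, 0.95], bins["X0"]))
```

Because of the outlier, equal-width bins put 9 of the 10 points in the first bin. Equal-frequency bins split them 5 and 5. This imbalance is one reason to consider quantile binning for skewed features.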

Returns:

A container object with the following components:

  • key: “compare_slicing_overfit”

  • data: Name of the dataset used

  • model: List of model names being compared

  • inputs: Input parameters used for the analysis

  • value: Dictionary of (“<model_name>”, item) pairs, where each item is a nested dictionary describing the performance metric gap of each segment, with the following keys:

    • “Feature”: feature name

    • “Segment”: segment value (categorical) or segment range (numerical)

    • “Size”: number of samples in this segment

    • “<metric>”: performance metric gap of this segment

    • “Sample_ID”: sample indices of this segment

    • “Sample_Dataset”: dataset name, e.g., “train”, “test”, etc.

    • “Segment_Info”: explicit definition of this segment, similar to “Segment”

    • “Weak”: boolean indicator of whether this segment is weak

  • table: DataFrame with detailed performance metrics for each slice

  • options: Dictionary of visualization settings for a multi-line plot in which the x-axis is the selected slicing feature and the y-axis is the performance metric gap. Run results.plot() to show this plot.
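The relationship between the threshold parameter and the per-segment “Weak” flag is not spelled out above. A plausible reading is sketched below in plain Python. In this sketch, a segment counts as weak when a threshold is given and its gap strictly exceeds that threshold. With threshold=None, no segment is flagged. Both rules, the helper name `build_segment_records`, and the reduced record layout are assumptions for illustration. They are not confirmed Modeva behavior.

```python
from typing import Optional


def build_segment_records(gaps: dict, sizes: dict, metric: str = "MSE",
                          threshold: Optional[float] = None) -> list:
    """Assemble per-segment records loosely mirroring the documented fields.

    Hypothetical rule: "Weak" is True only when threshold is set and the
    segment's gap is strictly greater than threshold.
    """
    records = []
    for segment, gap in gaps.items():
        records.append({
            "Segment": segment,
            "Size": sizes[segment],
            metric: gap,  # the gap is stored under the metric name
            "Weak": threshold is not None and gap > threshold,
        })
    return records


gaps = {"[0.0, 0.5)": 0.02, "[0.5, 1.0)": 0.15}
sizes = {"[0.0, 0.5)": 120, "[0.5, 1.0)": 80}
records = build_segment_records(gaps, sizes, metric="MSE", threshold=0.1)
print([r["Segment"] for r in records if r["Weak"]])  # ['[0.5, 1.0)']
```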

Return type:

ValidationResult

Examples

Overfitting Analysis (Classification)

Overfitting Analysis (Regression)