modeva.TestSuite.diagnose_slicing_accuracy

TestSuite.diagnose_slicing_accuracy(features: str | Tuple = None, dataset: str = 'test', metric: str = None, method: str = 'uniform', bins: int | Dict = 10, n_estimators: int = 1000, threshold: float | int = None)

Identify low-accuracy regions based on specified slicing features.

This method analyzes the performance of a model on specified features and identifies regions where the model exhibits low accuracy. It supports both 1D and 2D slicing based on the input features.

Parameters:
  • features (Union[str, Tuple], default=None) –

    Feature names used for slicing. Each tuple element should contain at most 2 features.

    • If features=(“X1”, ) or “X1”, computes 1D slicing over X1.

    • If features=(“X1”, “X2”), computes 2D slicing over the interaction of X1 and X2.

    • If features=((“X1”, ), (“X2”, )), computes 1D slicing over X1 and X2 separately.

    Note: Batch mode for 2D slicing is not supported. If None, 1D slicing is computed for every feature.

  • dataset (str, default="test") – The dataset to be tested. Options are “main”, “train”, or “test”.

  • metric (str, default=None) –

    Model performance metric to use.

    • For classification (default=”AUC”): “ACC”, “AUC”, “F1”, “LogLoss”, and “Brier”

    • For regression (default=”MSE”): “MSE”, “MAE”, and “R2”

  • method ({"uniform", "quantile", "auto-xgb1", "precompute"}, default="uniform") –

    Method for binning numerical features:

    • ”uniform”: Equal-width binning

    • ”quantile”: Equal-frequency binning (may result in fewer bins due to ties)

    • “auto-xgb1”: Uses the bins of an XGBoost depth-1 model fitted between X and the residuals.

    • ”precompute”: Uses pre-specified bin edges

    Note: With “uniform”, “quantile”, and “precompute”, any variable, including inactive ones, can be used for splitting. With “auto-xgb1”, only active features (X) are used to fit the XGBoost model.

  • bins (int or dict, default=10) –

    Controls binning granularity:

    • If int: Number of bins for numerical features. For “quantile”, this is the maximum number of bins. For “auto-xgb1”, this sets XGBoost’s max_bin parameter.

    • If dict: Manual bin specifications for each feature, only used with method=“precompute”. Format: {feature_name: array_of_bin_edges}. Example: {“X0”: [0.1, 0.5, 0.9]}. Note: Bins cannot be specified for categorical features.

  • n_estimators (int, default=1000) – The number of estimators in XGBoost, applicable when method=”auto-xgb1”.

  • threshold (float or int, default=None) – The metric threshold for identifying weak regions. If not specified, the metric computed on the whole population is used.
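The difference between “uniform” and “quantile” binning can be pictured with a short NumPy sketch. This is not Modeva's implementation: the helper name bin_edges and the toy data are assumptions made for illustration, and the “auto-xgb1” and “precompute” methods are left out.

```python
import numpy as np


def bin_edges(x, method="uniform", bins=10):
    """Return bin edges for a 1D numeric array (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if method == "uniform":
        # Equal-width: bins + 1 evenly spaced edges between min and max.
        return np.linspace(x.min(), x.max(), bins + 1)
    if method == "quantile":
        # Equal-frequency: edges at evenly spaced quantiles. Tied values
        # produce duplicate edges, which np.unique drops, so fewer than
        # `bins` bins may remain.
        return np.unique(np.quantile(x, np.linspace(0.0, 1.0, bins + 1)))
    raise ValueError(f"method not covered by this sketch: {method!r}")


# Toy feature with many tied values at 0.
x = np.array([0, 0, 0, 0, 0, 0, 1, 2, 3, 10], dtype=float)

uniform_edges = bin_edges(x, "uniform", bins=5)
quantile_edges = bin_edges(x, "quantile", bins=5)

# Assign each sample to a segment index using the interior uniform edges.
segments = np.digitize(x, uniform_edges[1:-1])

print(len(uniform_edges) - 1, "uniform bins")
print(len(quantile_edges) - 1, "quantile bins after dropping tied edges")
```

On skewed features, equal-width bins can leave most samples in one segment, while quantile bins balance segment sizes at the cost of uneven widths.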

Returns:

The result of the Slicing Accuracy detection, including key metrics and tables.

  • key: “diagnose_slicing_accuracy”

  • data: Name of the dataset used

  • model: Name of the model used

  • inputs: Input parameters used for the test

  • value: List of per-segment performance metrics. Each element is a dict containing:

    • ”Feature”: feature name

    • ”Segment”: segment value (categorical) or segment range (numerical)

    • ”Size”: number of samples in this segment

    • “<metric>”: performance metric value of this segment, keyed by the metric name

    • ”Sample_ID”: sample indices of this segment

    • ”Sample_Dataset”: dataset name, e.g., “train”, “test”, etc.

    • ”Segment_Info”: explicit definition of this segment, similar to “Segment”

    • ”Weak”: boolean indicator showing whether this segment is weak or not

  • table: pd.DataFrame summarizing the results, including features, segments, sizes, and the specified metric.

  • options: Dictionary of visualization configurations. Run results.plot() to show all plots, or results.plot(name=xxx) to display a single plot. The following names are available:

    • None (if only one 1D or 2D slicing specification is given): Performance metric plot against the selected slicing feature(s).

    • “<feature_name>” (if multiple single features are specified): Performance metric plot against the named slicing feature.

Return type:

ValidationResult
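To make the per-segment table and the “Weak” flag concrete, here is a minimal pandas sketch. It does not call Modeva; the synthetic data, the column names, and the rule that a segment is weak when its MSE exceeds the threshold are assumptions for illustration. For score-type metrics such as AUC, the comparison would presumably go the other way.

```python
import numpy as np
import pandas as pd

# Synthetic regression data (assumption): noise grows with X1, so the
# hypothetical model is less accurate for large X1.
rng = np.random.default_rng(0)
n = 1000
x1 = rng.uniform(0.0, 1.0, size=n)
y_true = np.sin(2 * np.pi * x1)
y_pred = y_true + rng.normal(0.0, 0.05 + 0.5 * x1, size=n)

df = pd.DataFrame({"X1": x1, "sq_err": (y_true - y_pred) ** 2})

# 1D slicing with equal-width bins, analogous to method="uniform", bins=10.
df["Segment"] = pd.cut(df["X1"], bins=10)

# One row per segment: sample count and mean squared error.
table = (
    df.groupby("Segment", observed=True)["sq_err"]
    .agg(Size="size", MSE="mean")
    .reset_index()
)

# With no explicit threshold, fall back to the whole-population metric.
threshold = df["sq_err"].mean()

# Assumed rule for an error metric: weak means worse (higher) than threshold.
table["Weak"] = table["MSE"] > threshold

print(table)
```

Because the population MSE is a size-weighted average of the segment values, this default threshold always separates segments into those above and those below the overall level, unless every segment performs identically.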

Examples

Sliced Performance (Classification)


Sliced Performance (Regression)
