modeva.TestSuite.compare_slicing_fairness#
- TestSuite.compare_slicing_fairness(group_config, features: str, favorable_label: int = 1, dataset: str = 'test', metric: str = None, method: str = 'uniform', bins: int | Dict = 10, n_estimators: int = 1000, threshold: float | int = None)#
Evaluates fairness metrics across different protected and reference groups by slicing the data.
This function computes fairness metrics for specified groups in the dataset by analyzing the model’s predictions against the actual outcomes, allowing for a detailed comparison of fairness across different slices of the data.
- Parameters:
group_config (dict) –
Configuration defining protected and reference groups. Each key is a custom group name, and each value is a dictionary with group definitions. Supports three formats (a construction sketch follows the parameter list below):
- For numerical features:
  {
      "feature": str,              # Feature name
      "protected": {               # Protected group bounds
          "lower": float,          # Lower bound
          "lower_inclusive": bool,
          "upper": float,          # Optional upper bound
          "upper_inclusive": bool
      },
      "reference": {               # Reference group bounds
          "lower": float,          # Optional lower bound
          "lower_inclusive": bool,
          "upper": float,          # Upper bound
          "upper_inclusive": bool
      }
  }
- For categorical features:
  {
      "feature": str,              # Feature name
      "protected": str or int,    # Protected group category
      "reference": str or int     # Reference group category
  }
- For probabilistic group membership:
  {
      "by_weights": True,
      "protected": str,    # Column name with protected group probabilities
      "reference": str     # Column name with reference group probabilities
  }
features (str) – Name of the feature to use for slicing the data
favorable_label ({0, 1}, default=1) – For classification: The preferred class label. For regression: 1 means larger predictions are preferred, 0 means smaller predictions are preferred.
dataset ({"main", "train", "test"}, default="test") – Which dataset partition to analyze
metric (str, default=None) –
Model performance metric to use. If None, the task-specific default applies.
For classification (default="AUC"): "ACC", "AUC", "F1", "LogLoss", or "Brier"
For regression (default="MSE"): "MSE", "MAE", or "R2"
method ({"uniform", "quantile", "auto-xgb1", "precompute"}, default="uniform") – Method for binning numerical features
bins (int or dict, default=10) –
Controls binning granularity:
If int: Number of bins for numerical features. For "quantile", this is the maximum number of bins. For "auto-xgb1", this sets XGBoost's max_bin parameter.
If dict: Manual bin specifications per feature; only used with method="precompute". Format: {feature_name: array_of_bin_edges}, e.g., {"X0": [0.1, 0.5, 0.9]}. Note: bins cannot be specified for categorical features.
n_estimators (int, default=1000) – Number of estimators in XGBoost; used when method="auto-xgb1".
threshold (float or int, default=None) – Threshold for filtering fairness metric results. If not specified, no filtering is applied.
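As noted under group_config, a minimal construction sketch covering the three supported formats; every column and group name here ("Age", "Gender", "p_protected", "p_reference") is a hypothetical placeholder, not part of the API:

# A hedged sketch of group_config, assuming hypothetical columns
# "Age", "Gender", "p_protected", and "p_reference" in the dataset.
group_config = {
    "age_group": {  # numerical feature: bound-based groups
        "feature": "Age",
        "protected": {"lower": 62, "lower_inclusive": True},
        "reference": {"upper": 62, "upper_inclusive": False},
    },
    "gender_group": {  # categorical feature: category-based groups
        "feature": "Gender",
        "protected": "Female",
        "reference": "Male",
    },
    "weighted_group": {  # probabilistic membership: weight columns
        "by_weights": True,
        "protected": "p_protected",  # column of protected-group probabilities
        "reference": "p_reference",  # column of reference-group probabilities
    },
}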
- Returns:
A container object with the following components:
key: “compare_slicing_fairness”
data: Name of the dataset used
model: List of model names analyzed
inputs: Input parameters used
value: Dictionary of ("<model_name>", item) pairs, where each item is itself a dictionary:
  "<group_name>": List of fairness metrics, one per segment; each element is a dict containing:
    "Feature": feature name
    "Segment": segment value (categorical) or segment range (numerical)
    "Size": number of samples in the segment
    "<metric>": fairness metric value for the segment
    "Sample_ID": sample indices of the segment
    "Sample_Dataset": dataset name, e.g., "train", "test", etc.
    "Segment_Info": explicit definition of the segment, similar to "Segment"
    "Weak": boolean indicating whether the segment is weak
table: Dictionary of fairness metric tables:
  "<group_name>": Table of fairness metrics for each segment.
options: Dictionary of visualization configurations. Run results.plot() to show all plots, or results.plot(name=...) to display a single plot; the following names are available:
  "<group_name>": Line plot visualizing the fairness metrics against the slicing feature.
- Return type:
Examples
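A hedged end-to-end sketch, assuming a TestSuite instance ts has already been built from a fitted model and a dataset containing the hypothetical columns used in the group_config sketch above:

# `ts` is assumed to be an existing TestSuite around a fitted classifier;
# `group_config` is the dictionary sketched under the Parameters section.
results = ts.compare_slicing_fairness(
    group_config=group_config,
    features="Income",   # hypothetical slicing feature
    favorable_label=1,
    dataset="test",
    metric="AUC",
    method="quantile",
    bins=10,
)

# Per-segment fairness tables, keyed by group name.
print(results.table["age_group"])

# Show all plots, or a single group's line plot by name.
results.plot()
results.plot(name="age_group")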