modeva.TestSuite.compare_fairness#
- TestSuite.compare_fairness(group_config, favorable_label: int = 1, dataset: str = 'test', metric: str = None, threshold: float | int = None)#
Compares fairness metrics across multiple models.
This function evaluates and compares fairness metrics for various models based on the provided group configurations, allowing for a comprehensive analysis of model performance across different demographic groups.
- Parameters:
- group_config : dict
Configuration defining protected and reference groups. Each key is a custom group name, and each value is a dictionary with the group definition. Supports three formats:
1. For numerical features:
{
    "feature": str,            # Feature name
    "protected": {             # Protected group bounds
        "lower": float,        # Lower bound
        "lower_inclusive": bool,
        "upper": float,        # Optional upper bound
        "upper_inclusive": bool
    },
    "reference": {             # Reference group bounds
        "lower": float,        # Optional lower bound
        "lower_inclusive": bool,
        "upper": float,        # Upper bound
        "upper_inclusive": bool
    }
}
2. For categorical features:
{
    "feature": str,           # Feature name
    "protected": str or int,  # Protected group category
    "reference": str or int   # Reference group category
}
3. For probabilistic group membership:
{
    "by_weights": True,
    "protected": str,  # Column name with protected group probabilities
    "reference": str   # Column name with reference group probabilities
}
- favorable_label : {0, 1}, default=1
For classification: the preferred class label. For regression: 1 means larger predictions are preferred, 0 means smaller predictions are preferred.
- dataset : {"main", "train", "test"}, default="test"
The dataset to evaluate fairness on.
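The categorical and numerical formats can be mixed within one configuration. A minimal sketch of such a group_config (the feature names "Gender" and "Age", the category values, and the age cutoff are hypothetical examples, not from the source):

```python
# Illustrative group_config: one categorical group and one numerical group.
# Feature names, category values, and bounds are hypothetical.
group_config = {
    "Gender": {                 # categorical format
        "feature": "Gender",
        "protected": "Female",
        "reference": "Male",
    },
    "Senior": {                 # numerical format: protected if Age >= 62
        "feature": "Age",
        "protected": {"lower": 62.0, "lower_inclusive": True},
        "reference": {"upper": 62.0, "upper_inclusive": False},
    },
}
```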
- metric : str, default=None
Fairness metric to calculate. Higher values indicate less unfairness. If None, a default is chosen based on the task type.
For regression (default="SMD"):
SMD: Standardized Mean Difference (%) between protected and reference groups
For classification (default="AIR"):
AIR: Adverse Impact Ratio of predicted probabilities
PR: Precision Ratio
RR: Recall Ratio
- threshold : float or int, default=None
Optional threshold value to display in the visualization. Used to indicate acceptable fairness levels.
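As a rough illustration of the two default metrics above, AIR and SMD can be sketched in plain Python. This is not modeva's implementation; in particular, the pooling and sign conventions used for SMD here are assumptions:

```python
import statistics

def adverse_impact_ratio(protected_preds, reference_preds, favorable_label=1):
    """Rate of favorable outcomes in the protected group divided by the
    rate in the reference group. A value near 1 suggests parity."""
    p = sum(y == favorable_label for y in protected_preds) / len(protected_preds)
    r = sum(y == favorable_label for y in reference_preds) / len(reference_preds)
    return p / r

def standardized_mean_difference(protected_preds, reference_preds):
    """Difference in mean predictions scaled by the pooled standard
    deviation, expressed as a percentage. Pooling and sign conventions
    here are assumptions, not modeva's definition."""
    diff = statistics.mean(reference_preds) - statistics.mean(protected_preds)
    pooled_sd = statistics.pstdev(list(protected_preds) + list(reference_preds))
    return 100.0 * diff / pooled_sd
```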
- Returns:
A container object with the following components:
key: “compare_fairness”
data: Name of the dataset used
model: List of model names compared
inputs: Input parameters used
value: Dictionary of ("<model_name>", item) pairs, where each item is a nested dictionary of ("<group_name>", sub_item) pairs, one per group. Each sub_item contains:
"distance": The KS distance between protected and reference group predictions.
"data_info": A dictionary with detailed information about the protected and reference groups, including sample indices and names.
data_results = ds.data_drift_test(**results.value["MoLGBMClassifier"]["Gender"]["data_info"])
data_results.plot("summary")
data_results.plot(("density", "PAY_1"))
table: DataFrame with detailed fairness metrics
options: Dictionary of visualization configurations. Run results.plot() to show all plots, or results.plot(name=xxx) to display a single plot; the following names are available:
"fairness": a bar plot where the x-axis shows the group names and the y-axis shows the fairness metric
"distance": a bar plot where the x-axis shows the group names and the y-axis shows the KS distance
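The "distance" values returned above are described as KS distances between protected and reference group predictions. A minimal pure-Python sketch of the two-sample Kolmogorov-Smirnov statistic, for illustration only (not modeva's implementation):

```python
import bisect

def ks_distance(protected_preds, reference_preds):
    """Two-sample Kolmogorov-Smirnov distance: the maximum absolute gap
    between the empirical CDFs of the two prediction samples."""
    a = sorted(protected_preds)
    b = sorted(reference_preds)
    gap = 0.0
    for x in a + b:  # the ECDF gap can only peak at an observed value
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

A distance of 0 means the two groups receive identically distributed predictions; a distance of 1 means the distributions do not overlap at all.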
- Return type:
Examples