modeva.TestSuite.compare_fairness#

TestSuite.compare_fairness(group_config, favorable_label: int = 1, dataset: str = 'test', metric: str = None, threshold: float | int = None)#

Compares fairness metrics across multiple models.

This function evaluates and compares fairness metrics for various models based on the provided group configurations, allowing for a comprehensive analysis of model performance across different demographic groups.

Parameters:

group_config (dict) – Configuration defining protected and reference groups. Each key is a custom group name, and each value is a dictionary with group definitions. Supports three formats:

1. For numerical features:

    {
        "feature": str,           # Feature name
        "protected": {            # Protected group bounds
            "lower": float,       # Lower bound
            "lower_inclusive": bool,
            "upper": float,       # Optional upper bound
            "upper_inclusive": bool
        },
        "reference": {            # Reference group bounds
            "lower": float,       # Optional lower bound
            "lower_inclusive": bool,
            "upper": float,       # Upper bound
            "upper_inclusive": bool
        }
    }

2. For categorical features:

    {
        "feature": str,                  # Feature name
        "protected": str or int,         # Protected group category
        "reference": str or int          # Reference group category
    }

3. For probabilistic group membership:

    {
        "by_weights": True,
        "protected": str,         # Column name with protected group probabilities
        "reference": str          # Column name with reference group probabilities
    }

favorable_label : {0, 1}, default=1

For classification: the preferred class label. For regression: 1 means larger predictions are preferred, 0 means smaller predictions are preferred.

dataset : {"main", "train", "test"}, default="test"

The dataset to evaluate fairness on.
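As an illustration, the three group_config formats can be mixed in one configuration. This is a minimal sketch: the feature names ("Age", "Gender") and weight-column names ("p_protected", "p_reference") are hypothetical, not part of the modeva API.

```python
# Hypothetical group_config covering all three formats; the feature and
# column names here are illustrative placeholders, not modeva built-ins.
group_config = {
    "AgeGroup": {                      # numerical feature with bounds
        "feature": "Age",
        "protected": {"lower": 62, "lower_inclusive": True,
                      "upper": 100, "upper_inclusive": True},
        "reference": {"lower": 18, "lower_inclusive": True,
                      "upper": 62, "upper_inclusive": False},
    },
    "Gender": {                        # categorical feature
        "feature": "Gender",
        "protected": "Female",
        "reference": "Male",
    },
    "SoftMembership": {                # probabilistic group membership
        "by_weights": True,
        "protected": "p_protected",
        "reference": "p_reference",
    },
}
```

Each top-level key ("AgeGroup", "Gender", "SoftMembership") is the custom group name that later appears in the returned value dictionary and on the plots.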

metric : str, default=None

Fairness metric to calculate. Higher values indicate less unfairness. If None, defaults are used based on task type.

For regression (default="SMD"):

  • SMD: Standardized Mean Difference (%) between protected and reference groups

For classification (default="AIR"):

  • AIR: Adverse Impact Ratio of predicted probabilities

  • PR: Precision Ratio

  • RR: Recall Ratio
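As a hedged sketch of what these metrics measure (modeva's exact formulas may differ in details such as the choice of standard deviation), AIR compares favorable-outcome rates between groups, while SMD compares group prediction means scaled by the overall prediction spread:

```python
import math

# Illustrative computations only; not modeva's implementation.

def air(protected_preds, reference_preds, favorable_label=1):
    """Adverse Impact Ratio: favorable-outcome rate of the protected
    group divided by that of the reference group."""
    p_rate = sum(p == favorable_label for p in protected_preds) / len(protected_preds)
    r_rate = sum(p == favorable_label for p in reference_preds) / len(reference_preds)
    return p_rate / r_rate

def smd(protected_preds, reference_preds):
    """Standardized Mean Difference (%): difference of group means,
    scaled by the standard deviation of all predictions combined."""
    all_preds = list(protected_preds) + list(reference_preds)
    mean = sum(all_preds) / len(all_preds)
    std = math.sqrt(sum((x - mean) ** 2 for x in all_preds) / len(all_preds))
    diff = (sum(protected_preds) / len(protected_preds)
            - sum(reference_preds) / len(reference_preds))
    return 100.0 * diff / std

print(round(air([1, 0, 1, 0], [1, 1, 1, 0]), 4))  # -> 0.6667
print(round(smd([0.2, 0.4], [0.6, 0.8]), 1))      # -> -178.9
```

Ratios near 1 (AIR) and differences near 0 (SMD) indicate similar treatment of the two groups.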

threshold : float or int, default=None

Optional threshold value to display in the visualization. Used to indicate acceptable fairness levels.

Returns:

A container object with the following components:

  • key: “compare_fairness”

  • data: Name of the dataset used

  • model: List of model names compared

  • inputs: Input parameters used

  • value: Dictionary of ("<model_name>", item) pairs, where each item is a nested dictionary of ("<group_name>", sub_item) pairs, one per configured group; each sub_item contains

    • "distance": The KS distance between the protected and reference group predictions.

    • "data_info": A dictionary with detailed information about the protected and reference groups, including sample indices and names, which can be passed on to further tests:

      data_results = ds.data_drift_test(**results.value["MoLGBMClassifier"]["Gender"]["data_info"])
      data_results.plot("summary")
      data_results.plot(("density", "PAY_1"))
      
  • table: DataFrame with detailed fairness metrics

  • options: Dictionary of visualization configurations. Run results.plot() to show all plots, or results.plot(name=xxx) to display a single plot; the following names are available:

    • "fairness": a bar plot where the x-axis is the group names and the y-axis is the fairness metric

    • "distance": a bar plot where the x-axis is the group names and the y-axis is the KS distance metric
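The reported distance is a two-sample Kolmogorov–Smirnov statistic. As a minimal sketch, assuming the standard empirical-CDF definition (modeva's internal computation may differ), it is the largest gap between the two groups' prediction distributions:

```python
def ks_distance(a, b):
    """Two-sample KS statistic: the maximum absolute difference
    between the empirical CDFs of samples a and b."""
    points = sorted(set(a) | set(b))
    max_diff = 0.0
    for x in points:
        # ECDF value at x = fraction of observations <= x
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        max_diff = max(max_diff, abs(cdf_a - cdf_b))
    return max_diff

# Identical prediction distributions -> 0.0; fully separated -> 1.0.
print(ks_distance([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # -> 0.0
print(ks_distance([0.1, 0.2], [0.8, 0.9]))            # -> 1.0
```

A distance near 0 means the model scores the protected and reference groups almost identically; a distance near 1 means their score distributions barely overlap.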

Return type:

ValidationResult

Examples

Model Fairness Analysis (Classification)