modeva.TestSuite.compare_fairness#

TestSuite.compare_fairness(group_config, favorable_label: int = 1, dataset: str = 'test', metric: str = None, threshold: float | int = None)#

Compares fairness metrics across multiple models.

This function evaluates and compares fairness metrics for various models based on the provided group configurations, allowing for a comprehensive analysis of model performance across different demographic groups.

Parameters:

group_config (dict) – Configuration defining protected and reference groups. Each key is a custom group name, and each value is a dictionary with group definitions. Supports three formats:

1. For numerical features:

    {
        "feature": str,           # Feature name
        "protected": {            # Protected group bounds
            "lower": float,       # Lower bound
            "lower_inclusive": bool,
            "upper": float,       # Optional upper bound
            "upper_inclusive": bool
        },
        "reference": {            # Reference group bounds
            "lower": float,       # Optional lower bound
            "lower_inclusive": bool,
            "upper": float,       # Upper bound
            "upper_inclusive": bool
        }
    }

2. For categorical features:

    {
        "feature": str,                  # Feature name
        "protected": str or int,         # Protected group category
        "reference": str or int          # Reference group category
    }

3. For probabilistic group membership:

    {
        "by_weights": True,
        "protected": str,         # Column name with protected group probabilities
        "reference": str          # Column name with reference group probabilities
    }

favorable_label : {0, 1}, default=1

For classification: the preferred class label. For regression: 1 means larger predictions are preferred, 0 means smaller predictions are preferred.

dataset : {"main", "train", "test"}, default="test"

The dataset to evaluate fairness on.
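As an illustration, the three group_config formats can be mixed in one configuration. This is a minimal sketch: the feature names ("Age", "Gender") and weight-column names ("p_protected", "p_reference") are hypothetical, not part of the modeva API.

```python
# Hypothetical group_config covering all three formats; the feature and
# column names here are illustrative placeholders, not modeva built-ins.
group_config = {
    "AgeGroup": {                      # numerical feature with bounds
        "feature": "Age",
        "protected": {"lower": 62, "lower_inclusive": True,
                      "upper": 100, "upper_inclusive": True},
        "reference": {"lower": 18, "lower_inclusive": True,
                      "upper": 62, "upper_inclusive": False},
    },
    "Gender": {                        # categorical feature
        "feature": "Gender",
        "protected": "Female",
        "reference": "Male",
    },
    "SoftMembership": {                # probabilistic group membership
        "by_weights": True,
        "protected": "p_protected",
        "reference": "p_reference",
    },
}
```

Each top-level key ("AgeGroup", "Gender", "SoftMembership") is the custom group name that later appears in the returned value dictionary and on the plots.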

metric : str, default=None

Fairness metric to calculate. Higher values indicate less unfairness. If None, defaults are used based on task type.

For regression (default="SMD"):

  • SMD: Standardized Mean Difference (%) between protected and reference groups

For classification (default="AIR"):

  • AIR: Adverse Impact Ratio of predicted probabilities

  • PR: Precision Ratio

  • RR: Recall Ratio
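As a hedged sketch of what these metrics measure (modeva's exact formulas may differ in details such as the choice of standard deviation), AIR compares favorable-outcome rates between groups, while SMD compares group prediction means scaled by the overall prediction spread:

```python
import math

# Illustrative computations only; not modeva's implementation.

def air(protected_preds, reference_preds, favorable_label=1):
    """Adverse Impact Ratio: favorable-outcome rate of the protected
    group divided by that of the reference group."""
    p_rate = sum(p == favorable_label for p in protected_preds) / len(protected_preds)
    r_rate = sum(p == favorable_label for p in reference_preds) / len(reference_preds)
    return p_rate / r_rate

def smd(protected_preds, reference_preds):
    """Standardized Mean Difference (%): difference of group means,
    scaled by the standard deviation of all predictions combined."""
    all_preds = list(protected_preds) + list(reference_preds)
    mean = sum(all_preds) / len(all_preds)
    std = math.sqrt(sum((x - mean) ** 2 for x in all_preds) / len(all_preds))
    diff = (sum(protected_preds) / len(protected_preds)
            - sum(reference_preds) / len(reference_preds))
    return 100.0 * diff / std

print(round(air([1, 0, 1, 0], [1, 1, 1, 0]), 4))  # -> 0.6667
print(round(smd([0.2, 0.4], [0.6, 0.8]), 1))      # -> -178.9
```

Ratios near 1 (AIR) and differences near 0 (SMD) indicate similar treatment of the two groups.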

threshold : float or int, default=None

Optional threshold value to display in the visualization. Used to indicate acceptable fairness levels.

Returns:

A container object with the following components:

  • key: “compare_fairness”

  • data: Name of the dataset used

  • model: List of model names compared

  • inputs: Input parameters used

  • value: Dictionary of ("<model_name>", item) pairs, where each item is a nested dictionary of ("<group_name>", sub_item) pairs, one per configured group; each sub_item contains

    • "distance": The KS distance between the protected and reference group predictions.

    • "data_info": A dictionary with detailed information about the protected and reference groups, including sample indices and names, which can be passed on to further tests:

      data_results = ds.data_drift_test(**results.value["MoLGBMClassifier"]["Gender"]["data_info"])
      data_results.plot("summary")
      data_results.plot(("density", "PAY_1"))
      
  • table: DataFrame with detailed fairness metrics

  • options: Dictionary of visualization configurations. Run results.plot() to show all plots, or results.plot(name=xxx) to display a single plot; the following names are available:

    • "fairness": a bar plot where the x-axis is the group names and the y-axis is the fairness metric

    • "distance": a bar plot where the x-axis is the group names and the y-axis is the KS distance metric
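The reported distance is a two-sample Kolmogorov–Smirnov statistic. As a minimal sketch, assuming the standard empirical-CDF definition (modeva's internal computation may differ), it is the largest gap between the two groups' prediction distributions:

```python
def ks_distance(a, b):
    """Two-sample KS statistic: the maximum absolute difference
    between the empirical CDFs of samples a and b."""
    points = sorted(set(a) | set(b))
    max_diff = 0.0
    for x in points:
        # ECDF value at x = fraction of observations <= x
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        max_diff = max(max_diff, abs(cdf_a - cdf_b))
    return max_diff

# Identical prediction distributions -> 0.0; fully separated -> 1.0.
print(ks_distance([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # -> 0.0
print(ks_distance([0.1, 0.2], [0.8, 0.9]))            # -> 1.0
```

A distance near 0 means the model scores the protected and reference groups almost identically; a distance near 1 means their score distributions barely overlap.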

Return type:

ValidationResult

Examples

Model Fairness Analysis (Classification)