modeva.TestSuite.diagnose_slicing_fairness#

TestSuite.diagnose_slicing_fairness(group_config, features: str | Tuple = None, favorable_label: int = 1, dataset: str = 'test', metric: str = None, method: str = 'uniform', bins: int | Dict = 10, n_estimators: int = 1000, threshold: float | int = None)#

Evaluate a model’s slicing fairness metric across different protected-reference groups.

This function assesses the fairness of a model by calculating specified metrics across various protected and reference groups defined in the group_config. It takes into account the features used for slicing, the dataset to be evaluated, and the method for binning numerical features, among other parameters. The results include a validation object containing the fairness metrics and related information.

Parameters:
  • group_config (dict) –

    Configuration defining protected and reference groups. Each key is a custom group name, and each value is a dictionary with group definitions. Supports three formats:

    1. For numerical features:

      {
          "feature": str,           # Feature name
          "protected": {            # Protected group bounds
              "lower": float,       # Lower bound
              "lower_inclusive": bool,
              "upper": float,       # Optional upper bound
              "upper_inclusive": bool
          },
          "reference": {            # Reference group bounds
              "lower": float,       # Optional lower bound
              "lower_inclusive": bool,
              "upper": float,       # Upper bound
              "upper_inclusive": bool
          }
      }
      
    2. For categorical features:

      {
          "feature": str,                  # Feature name
          "protected": str or int,         # Protected group category
          "reference": str or int          # Reference group category
      }
      
    3. For probabilistic group membership:

      {
          "by_weights": True,
          "protected": str,         # Column name with protected group probabilities
          "reference": str          # Column name with reference group probabilities
      }
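As an illustrative sketch, a group_config combining all three formats might look like the following (the group names, feature names, and probability-column names are hypothetical, not from a real dataset):

```python
# Hypothetical group_config combining the three supported formats.
# All feature and column names below are illustrative.
group_config = {
    # 1. Numerical feature: protected = Age >= 62, reference = Age < 62
    "Age": {
        "feature": "Age",
        "protected": {"lower": 62, "lower_inclusive": True},
        "reference": {"upper": 62, "upper_inclusive": False},
    },
    # 2. Categorical feature: one category compared against another
    "Gender": {
        "feature": "Gender",
        "protected": "Female",
        "reference": "Male",
    },
    # 3. Probabilistic membership: columns holding group probabilities
    "Race": {
        "by_weights": True,
        "protected": "prob_group_a",
        "reference": "prob_group_b",
    },
}
```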
      

  • features (Union[str, Tuple], default=None) –

    Feature names used for slicing. Each tuple element should contain at most 2 features.

    • If features=("X1", ) or "X1", computes 1D slicing over X1.

    • If features=("X1", "X2"), computes 2D slicing over the interaction of X1 and X2.

    • If features=(("X1", ), ("X2", )), computes 1D slicing over X1 and X2 separately.

    Note: Batch mode for 2D slicing is not supported. If None, all 1D features will be used.
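The three accepted forms can be sketched as plain Python values (the feature names "X1" and "X2" are illustrative):

```python
# The three accepted shapes for the `features` argument.
features_1d = ("X1",)                # 1D slicing over X1; the string "X1" is equivalent
features_2d = ("X1", "X2")           # 2D slicing over the X1-X2 interaction
features_batch = (("X1",), ("X2",))  # batch mode: 1D slicing over X1 and X2 separately
```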

  • favorable_label ({0, 1}, default=1) –

    • For classification: The preferred class label.

    • For regression: 1 means larger predictions are preferred, 0 means smaller predictions are preferred.

  • dataset ({"main", "train", "test"}, default="test") – The dataset to be tested.

  • metric ({"AIR", "SMD", "PR", "RR"}, default=None) – The fairness metric(s) to calculate. If None, defaults to SMD for regression and AIR for classification.

  • method ({"uniform", "quantile", "auto-xgb1", "precompute"}, default="uniform") –

    Method for binning numerical features:

    • "uniform": Equal-width binning

    • "quantile": Equal-frequency binning (may result in fewer bins due to ties)

    • "auto-xgb1": Uses the bins of an XGBoost depth-1 model fitted between X and residuals

    • "precompute": Uses pre-specified bin edges

  • bins (int or dict, default=10) –

    Controls binning granularity:

    • If int: Number of bins for numerical features. For "quantile", this is the maximum number of bins. For "auto-xgb1", this sets XGBoost's max_bin parameter.

    • If dict: Manual bin specifications for each feature, only used with method="precompute". Format: {feature_name: array_of_bin_edges}. Example: {"X0": [0.1, 0.5, 0.9]}. Note: Cannot specify bins for categorical features.
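A minimal sketch of manual bin edges for use with method="precompute" (the feature names "X0" and "Income" are hypothetical):

```python
# Pre-specified bin edges per numerical feature, used only with method="precompute".
# Keys are feature names; values are arrays of bin edges.
bins = {
    "X0": [0.1, 0.5, 0.9],
    "Income": [20000, 50000, 100000],
}
```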

  • n_estimators (int, default=1000) – Number of estimators in XGBoost, used when method="auto-xgb1".

  • threshold (float or int, default=None) – Threshold for filtering fairness metric results. If None, the threshold defaults to the fairness metric computed on the whole population for each group.

Returns:

Slicing Fairness result, which includes:

  • key: “diagnose_slicing_fairness”

  • data: Name of the dataset used

  • model: Name of the model used

  • inputs: Input parameters used for the test

  • value: Nested dictionary containing ("<feature_name>", item) pairs for each feature (this level appears only in batch mode, i.e., when multiple 1D features are specified); each item is itself a dictionary with:

    • "<group_name>": List of fairness metrics for each segment; each element is a dict containing:

      • "Feature": feature name

      • "Segment": segment value (categorical) or segment range (numerical)

      • "Size": number of samples in this segment

      • "<metric>": fairness metric value of this segment

      • "Sample_ID": sample indices of this segment

      • "Sample_Dataset": dataset name, e.g., "train", "test", etc.

      • "Segment_Info": explicit definition of this segment, similar to "Segment"

      • "Weak": boolean indicator showing whether this segment is weak or not

  • table: Dictionary of fairness metric tables:

    • "<group_name>": Table of fairness metrics for each segment.

  • options: Dictionary of visualization configurations. Run results.plot() to show all plots, or results.plot(name=...) to display a single plot; the following names are available:

    • "<group_name>" (if only one 1D or 2D slicing feature set is specified): Fairness metric plot against the selected slicing feature(s).

    • "(<feature_name>, <group_name>)" (if multiple single features are specified): Fairness metric plots against the selected slicing feature(s).

Return type:

ValidationResult

Examples

Model Fairness Analysis (Classification)
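A hedged end-to-end sketch of calling this test: the group and feature names are hypothetical, and the construction of the dataset (ds) and fitted model is omitted, since it depends on your modeva setup. The block guards the modeva-specific calls so only the configuration is exercised when the library or the prepared objects are unavailable.

```python
# Hypothetical usage sketch; group/feature names are illustrative.
group_config = {
    "Gender": {"feature": "Gender", "protected": "Female", "reference": "Male"},
}

try:
    from modeva import TestSuite  # import path assumed; adjust to your setup

    ts = TestSuite(ds, model)  # ds/model: a prepared DataSet and fitted model from earlier setup
    results = ts.diagnose_slicing_fairness(
        group_config=group_config,
        features="X1",      # 1D slicing over a single feature
        metric="AIR",       # adverse impact ratio (classification)
        dataset="test",
        method="uniform",
        bins=10,
    )
    results.plot()          # all plots; or results.plot(name="Gender")
except (ImportError, NameError):
    pass  # modeva or the prepared ds/model not available in this environment
```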