modeva.TestSuite.explain_hstatistic
- TestSuite.explain_hstatistic(features: Tuple | List = None, dataset: str = 'test', sample_size: int = 5000, percentiles: Tuple = (0, 1), grid_resolution: int = 10, response_method: str = 'auto', random_state: int = 0)
Calculate H-statistics for all feature pairs to measure feature interactions.
The H-statistic measures the strength of the interaction between a pair of features by comparing their joint partial dependence with the sum of their individual partial dependence functions. It quantifies how much of the variation in the pair's joint effect comes from their interaction.
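For reference, the two-way H-statistic of Friedman and Popescu can be written as follows, where each partial dependence function PD is centered to mean zero and the sums run over the data points. This reference does not state whether this method sums over data points or over the grid set by grid_resolution and percentiles, nor whether it reports H or its square; both are left open here.

```latex
H^2_{jk} = \frac{\sum_{i=1}^{n}\left[\mathrm{PD}_{jk}\big(x_j^{(i)}, x_k^{(i)}\big) - \mathrm{PD}_{j}\big(x_j^{(i)}\big) - \mathrm{PD}_{k}\big(x_k^{(i)}\big)\right]^2}{\sum_{i=1}^{n}\mathrm{PD}_{jk}^2\big(x_j^{(i)}, x_k^{(i)}\big)},
\qquad H_{jk} = \sqrt{H^2_{jk}}
```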
An H-statistic value of 0 indicates no interaction, meaning the features act independently. Values closer to 1 suggest stronger interaction effects between the features.
The statistic can be difficult to compare across feature pairs because its denominator (the variation of the pair's joint partial dependence) differs from pair to pair, and when the main effects are weak, even a small interaction can yield a misleadingly high value.
Values greater than 1 are possible but harder to interpret; they occur when the variance of the interaction effect exceeds the variance of the joint (two-dimensional) partial dependence function.
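The behavior described above can be illustrated with a minimal grid-based sketch in NumPy. It is not modeva's implementation: the function name pairwise_h_statistic and its arguments are hypothetical, it assumes a predict callable that maps a 2-D array to 1-D predictions, and it evaluates each partial dependence function on an equally spaced grid between the chosen percentiles (mirroring the grid_resolution and percentiles parameters), centering each function over its grid.

```python
import numpy as np


def pairwise_h_statistic(predict, X, j, k, grid_resolution=10, percentiles=(0.0, 1.0)):
    """Grid-based two-way H-statistic for features j and k (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    lo, hi = percentiles
    # Equally spaced grid between the chosen percentiles of each feature.
    grid_j = np.linspace(*np.quantile(X[:, j], [lo, hi]), grid_resolution)
    grid_k = np.linspace(*np.quantile(X[:, k], [lo, hi]), grid_resolution)

    def pd(assignments):
        # Partial dependence: fix the given features, average predictions over all rows.
        Xm = X.copy()
        for col, val in assignments.items():
            Xm[:, col] = val
        return predict(Xm).mean()

    pd_j = np.array([pd({j: a}) for a in grid_j])
    pd_k = np.array([pd({k: b}) for b in grid_k])
    pd_jk = np.array([[pd({j: a, k: b}) for b in grid_k] for a in grid_j])

    # Center each partial dependence function to mean zero over its grid.
    pd_j -= pd_j.mean()
    pd_k -= pd_k.mean()
    pd_jk -= pd_jk.mean()

    # Interaction = joint effect minus the two individual effects.
    interaction = pd_jk - pd_j[:, None] - pd_k[None, :]
    denom = np.sum(pd_jk ** 2)
    return 0.0 if denom == 0 else float(np.sqrt(np.sum(interaction ** 2) / denom))


rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(2000, 3))
print(pairwise_h_statistic(lambda Z: Z[:, 0] + 2 * Z[:, 1] + Z[:, 2], X_demo, 0, 1))  # close to 0
print(pairwise_h_statistic(lambda Z: Z[:, 0] * Z[:, 1], X_demo, 0, 1))  # close to 1
```

The additive function has no interaction, so the sketch returns a value near 0; the pure product of two zero-centered features is all interaction, so it returns a value near 1. Each pair needs grid_resolution**2 + 2 * grid_resolution partial dependence evaluations, each averaging over every row, which is why limiting rows with sample_size speeds up the calculation.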
- Parameters:
features (tuple or list, default=None) – Names of the features for which H-statistics are calculated. If None, all features will be used.
dataset ({"main", "train", "test"}, default="test") – Dataset to use for calculating the explanation results.
sample_size (int, default=5000) – Number of randomly selected samples used to speed up the calculation. If None, all data will be used.
percentiles (Tuple[float, float], default=(0, 1)) – Lower and upper percentiles used to create the extreme values for the grid. Must be in [0, 1].
grid_resolution (int, default=10) – Number of equally spaced points on the grid for each target feature.
response_method ({"auto", "decision_function", "predict_proba"}, default="auto") –
Prediction method to use for binary classification tasks:
"auto": Uses "predict_proba" if available, otherwise "decision_function"
"predict_proba": Probability of the positive class
"decision_function": Model's decision function output
random_state (int, default=0) – Random seed for controlling randomness in subsampling.
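The documented "auto" rule for response_method and the subsampling controlled by sample_size and random_state can be sketched as two small helpers. Both names (resolve_response, subsample) are hypothetical, and so are two details not documented here: that the positive class is the second column of predict_proba (the scikit-learn convention for binary classifiers) and that rows are drawn without replacement.

```python
import numpy as np


def resolve_response(model, response_method="auto"):
    """Return a callable X -> 1-D scores following the documented 'auto' rule."""
    if response_method == "auto":
        # Prefer predict_proba when the model has it, else fall back to decision_function.
        response_method = "predict_proba" if hasattr(model, "predict_proba") else "decision_function"
    if response_method == "predict_proba":
        # Assumption: column 1 holds the positive-class probability (binary task).
        return lambda X: model.predict_proba(X)[:, 1]
    if response_method == "decision_function":
        return model.decision_function
    raise ValueError(f"unknown response_method: {response_method!r}")


def subsample(X, sample_size=5000, random_state=0):
    """Keep at most sample_size rows, drawn without replacement; None keeps all rows."""
    X = np.asarray(X)
    if sample_size is None or sample_size >= len(X):
        return X
    rng = np.random.default_rng(random_state)  # fixed seed -> reproducible subset
    idx = rng.choice(len(X), size=sample_size, replace=False)
    return X[idx]
```

Seeding the generator from random_state means repeated calls with the same arguments select the same rows, so the H-statistics are reproducible.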
- Returns:
Object containing:
key: “explain_hstatistic”
data: Name of the dataset used
model: Name of the model used
inputs: Input parameters used for the analysis
value: Dictionary containing:
"<feature_name>": Dictionary of H-statistics between this feature and each of the other features.
table: DataFrame of H-statistics for all feature pairs
options: Dictionary of visualization configuration for a horizontal bar plot where the x-axis is the H-statistic and the y-axis is the feature name. Run results.plot() to show this plot.
- Return type:
Examples