modeva.TestSuite.explain_pdp
- TestSuite.explain_pdp(features: str | Tuple[str] = None, dataset: str = 'test', sample_size: int = 5000, percentiles: Tuple = (0, 1), grid_resolution: int = 20, response_method: str = 'auto', random_state: int = 0)
Calculate and visualize the Partial Dependence Plot (PDP) for the specified model features.
Partial Dependence Plots (PDPs) show the marginal effect of one or two features on the predicted outcome of a machine learning model. They illustrate how the model’s predictions change as a feature varies over its range, while the effects of all other features are averaged out. This makes PDPs a useful, model-agnostic tool for understanding how individual features influence the model’s predictions.
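To make the averaging step concrete, the following is a minimal NumPy sketch of a one-dimensional partial dependence computation. It is not Modeva's implementation; `partial_dependence_1d` and the toy `predict` function are hypothetical names used only for illustration.

```python
import numpy as np

def partial_dependence_1d(predict, X, feature_idx, grid):
    """Average prediction when column `feature_idx` is forced to each grid value."""
    effects = []
    for value in grid:
        X_mod = X.copy()               # leave the original data untouched
        X_mod[:, feature_idx] = value  # set the feature to the grid value in every row
        effects.append(predict(X_mod).mean())  # average out all other features
    return np.asarray(effects)

# Hypothetical toy model (an assumption for this sketch): f(x) = 2*x0 + 3*x1
def predict(X):
    return 2.0 * X[:, 0] + 3.0 * X[:, 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
pd_values = partial_dependence_1d(predict, X, feature_idx=0, grid=grid)
```

For this linear toy model the resulting curve is a straight line with slope 2, shifted by 3 times the sample mean of the second feature.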
- Parameters:
features (str or tuple of str, default=None) –
Name of a single feature, or a tuple of two feature names, whose effect on the model output is analyzed.
If features=("X1",) or "X1", visualize the main effect of X1.
If features=("X1", "X2"), visualize the interaction of X1 and X2.
If features=(("X1",), ("X2",)), visualize the main effects of X1 and X2 separately.
Note: Batch mode is not supported for 2D effect plots. If None, all 1D features will be used.
dataset ({"main", "train", "test"}, default="test") – The dataset used for calculating the PDP results.
sample_size (int, default=5000) – Number of randomly sampled data points used to speed up the calculation. If None, all data points are used.
percentiles (tuple, default=(0, 1)) – The lower and upper percentiles that define the extreme values of the grid. Both must be in [0, 1].
grid_resolution (int, default=20) – The number of equally spaced points on the grid for each target feature.
response_method ({"auto", "decision_function", "predict_proba"}, default="auto") –
Prediction method to use for binary classification tasks:
"auto": Uses predict_proba if available, otherwise decision_function
"predict_proba": Probability of the positive class
"decision_function": Model’s decision function output
random_state (int, default=0) – Random seed for controlling reproducibility in subsampling.
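The way sample_size, random_state, percentiles, and grid_resolution interact can be sketched as follows. This is an illustration under stated assumptions: the source does not say how Modeva subsamples or builds its grid, so the helper names `subsample` and `make_grid`, the NumPy random generator, and the quantile-plus-linspace grid for a numerical feature are all hypothetical.

```python
import numpy as np

def subsample(X, sample_size=5000, random_state=0):
    """Return at most `sample_size` rows, drawn without replacement."""
    if sample_size is None or sample_size >= len(X):
        return X  # use all data points
    rng = np.random.default_rng(random_state)  # fixed seed gives a reproducible draw
    idx = rng.choice(len(X), size=sample_size, replace=False)
    return X[idx]

def make_grid(column, percentiles=(0, 1), grid_resolution=20):
    """Equally spaced grid between the lower and upper quantiles of `column`."""
    lo, hi = np.quantile(column, percentiles)  # percentiles are fractions in [0, 1]
    return np.linspace(lo, hi, grid_resolution)

X = np.random.default_rng(42).normal(size=(10_000, 3))
X_sub = subsample(X, sample_size=5000, random_state=0)
grid = make_grid(X_sub[:, 0], percentiles=(0.05, 0.95), grid_resolution=20)
```

With the default percentiles=(0, 1) the grid would span the full observed range of the feature; narrowing the range, as in this sketch, keeps the grid away from sparsely populated extremes.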
- Returns:
PDP result containing:
key: "explain_pdp"
data: Name of the dataset used
model: Name of the model used
inputs: Input parameters used for the analysis
value: Dictionary containing
"Value": X grid values, either a single 1D array (1D case) or a list of two 1D arrays (2D case);
"Effect": PD values corresponding to the grid values, either a single 1D array (1D case) or a 2D array (2D case)
table: DataFrame of PDP results
options: Dictionary of visualization configurations for a line (1D numerical), bar (1D categorical), or heatmap (2D) effect plot. Run results.plot() to show all plots. To display a single plot, run results.plot(name=xxx) with one of the following names:
None: Effect plots of all effects specified in features.
"<effect_name>": Effect plot of the selected main effect or pairwise interaction.
- Return type:
Notes
For a single feature, a line plot (numerical feature) or bar plot (categorical feature) is generated. For two features, a heatmap showing the interaction effect is generated.
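The two-feature case can be sketched in the same spirit: the prediction is averaged over the data for every pair of grid values, which yields the matrix a heatmap would display. This is a minimal illustration, not Modeva's code; `partial_dependence_2d`, the toy `predict` function, and the orientation of the result (first feature along rows) are assumptions.

```python
import numpy as np

def partial_dependence_2d(predict, X, idx_a, idx_b, grid_a, grid_b):
    """Average prediction over X for every (grid_a[i], grid_b[j]) pair."""
    effect = np.empty((len(grid_a), len(grid_b)))
    for i, a in enumerate(grid_a):
        for j, b in enumerate(grid_b):
            X_mod = X.copy()
            X_mod[:, idx_a] = a  # fix the first feature
            X_mod[:, idx_b] = b  # fix the second feature
            effect[i, j] = predict(X_mod).mean()  # average out the remaining features
    return effect

# Hypothetical toy model with an interaction term: f(x) = x0 * x1 + x2
def predict(X):
    return X[:, 0] * X[:, 1] + X[:, 2]

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 3))
grid_a = np.linspace(-1.0, 1.0, 5)
grid_b = np.linspace(-1.0, 1.0, 4)
effect = partial_dependence_2d(predict, X, 0, 1, grid_a, grid_b)
value = [grid_a, grid_b]  # mirrors the documented "Value" layout for the 2D case
```

Here `value` is a list of two 1D arrays and `effect` is a 2D array with one entry per grid pair, matching the shapes described for "Value" and "Effect" in the return value.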
Examples