modeva.TestSuite.diagnose_reliability#

TestSuite.diagnose_reliability(train_dataset: str = 'test', test_dataset: str = 'test', test_size: float = 0.5, alpha: float = 0.1, max_depth: int = 5, width_threshold: float = 0.1, random_state: int = 0)#

Evaluates model reliability using split conformal prediction.

This method assesses the reliability of model predictions by generating prediction intervals (regression) or prediction sets (classification) using conformal prediction.

Parameters:
  • train_dataset ({"main", "train", "test"}, default="test") – Dataset used for calibrating the conformal prediction model. Not to be confused with the model’s original training set.

  • test_dataset ({"main", "train", "test"}, default="test") – Dataset used for evaluation.

  • test_size (float, default=0.5) – Proportion of data to use for testing when train_dataset == test_dataset. Must be between 0 and 1.

  • alpha (float, default=0.1) – Target miscoverage rate (1 - confidence level). For example, alpha=0.1 aims for 90% coverage.

  • max_depth (int, default=5) – Maximum depth of the gradient boosting trees for regression tasks. Only used when task_type is REGRESSION.

  • width_threshold (float, default=0.1) – Regression only: the proportion of samples with the widest prediction intervals that are flagged as unreliable.

  • random_state (int, default=0) – Random seed for reproducibility.

Returns:

A result object containing:

  • key: “diagnose_reliability”

  • data: Name of the dataset used

  • model: Name of the model used

  • inputs: Input parameters used for the test

  • table: DataFrame with average width and coverage metrics

  • value: Dictionary containing detailed results including:

    • “interval”: Prediction intervals / sets and related metrics

    • “data_info”: The sample indices of reliable and unreliable samples, which can be passed on to data distribution tests, e.g.,

      data_results = ds.data_drift_test(**results.value["data_info"])
      data_results.plot("summary")
      data_results.plot(("density", "MedInc"))
      
  • options: Dictionary of visualization configuration for a plot showing the prediction and prediction interval against the actual response. Run results.plot() to display it.

Return type:

ValidationResult

Notes

For regression tasks:

  • Uses residual quantile regression to calculate prediction intervals

  • The calibration dataset is split 50/50 into training (for fitting quantile regression model) and validation (for calculating the threshold of nonconformity scores) sets

  • Samples with widest prediction intervals are marked as unreliable
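The regression procedure above can be illustrated with a minimal, self-contained split conformal sketch on toy data. This is an illustration of the technique, not modeva's internal implementation: for simplicity it uses absolute residuals with a constant-width interval in place of residual quantile regression, and a least-squares slope as a stand-in model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: y = 2x + Gaussian noise
x = rng.uniform(0, 10, 1000)
y = 2 * x + rng.normal(0, 1, 1000)

# Split the calibration data 50/50, as described above
fit_x, fit_y = x[:500], y[:500]
cal_x, cal_y = x[500:], y[500:]

# Stand-in "model": least-squares slope through the origin
slope = np.sum(fit_x * fit_y) / np.sum(fit_x ** 2)

# Nonconformity score: absolute residual on the held-out calibration half
scores = np.abs(cal_y - slope * cal_x)

# Conformal quantile with the finite-sample correction
alpha = 0.1
n = len(scores)
q_hat = np.quantile(scores, min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0))

def predict_interval(x_new):
    """Prediction interval targeting ~(1 - alpha) marginal coverage."""
    pred = slope * x_new
    return pred - q_hat, pred + q_hat

# Check empirical coverage on fresh data
test_x = rng.uniform(0, 10, 1000)
test_y = 2 * test_x + rng.normal(0, 1, 1000)
lo, hi = predict_interval(test_x)
coverage = np.mean((test_y >= lo) & (test_y <= hi))
```

With alpha=0.1, the empirical coverage lands near 90%. In the actual method, quantile regression makes interval widths vary across samples, which is what allows the width_threshold fraction of widest-interval samples to be flagged as unreliable.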

For classification tasks:

  • Generates prediction sets: {0}, {1}, {0,1}, or {}

  • Uses nonconformity scores to determine set membership

  • Samples with prediction sets {} or {0,1} are marked as unreliable
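The classification procedure can likewise be sketched on toy data. This is a hedged illustration of conformal prediction sets, not modeva's implementation: it assumes a binary model that outputs a probability for class 1, and uses the common "1 minus the probability of the true class" nonconformity score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary task: the model outputs a calibrated probability p1 for class 1,
# and labels are drawn consistently with it.
p1 = rng.uniform(0, 1, 2000)
y = (rng.uniform(0, 1, 2000) < p1).astype(int)

cal_p, cal_y = p1[:1000], y[:1000]   # calibration half
new_p, new_y = p1[1000:], y[1000:]   # evaluation half

# Nonconformity score: 1 - probability assigned to the true class
scores = 1 - np.where(cal_y == 1, cal_p, 1 - cal_p)

alpha = 0.1
n = len(scores)
q_hat = np.quantile(scores, min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0))

def prediction_set(p):
    """Conformal prediction set: every class whose score is within q_hat."""
    s = set()
    if p <= q_hat:        # nonconformity of class 0 is p(class 1)
        s.add(0)
    if 1 - p <= q_hat:    # nonconformity of class 1 is p(class 0)
        s.add(1)
    return s

sets = [prediction_set(p) for p in new_p]
coverage = np.mean([t in s for t, s in zip(new_y, sets)])

# Empty or two-class sets are ambiguous -> flagged unreliable
unreliable = [s in (set(), {0, 1}) for s in sets]
```

Samples near the decision boundary tend to receive the ambiguous set {0, 1}, so the unreliable flag concentrates exactly where the model is least certain.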

Examples

Reliability Analysis (Classification)

Reliability Analysis (Regression)