Model Probability Calibration
Probability calibration adjusts a model's predicted probabilities so that they match observed outcome frequencies: among samples assigned a probability of 0.8, roughly 80% should be positive. This matters in classification tasks where the raw model outputs rank samples well but do not accurately reflect confidence levels.
Platt Scaling (method='sigmoid'): Fits a logistic regression model that maps the raw predicted probabilities to calibrated ones.
Isotonic Regression (method='isotonic'): A non-parametric method that fits a monotonically non-decreasing mapping from raw to calibrated probabilities. Because it is more flexible than Platt scaling, it needs a sufficiently large calibration set to avoid overfitting.
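To make the two methods concrete, here is a minimal standard-library sketch of each. It is an illustration of the general techniques, not Modeva's internal implementation: the function names (fit_platt, apply_platt, fit_isotonic), the gradient-descent settings, and the toy data are all assumptions made for this example.

```python
import math

def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Platt scaling sketch: fit p = sigmoid(a * score + b) by
    plain gradient descent on the mean log loss (hypothetical helper)."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += (p - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def apply_platt(scores, a, b):
    """Map raw scores to calibrated probabilities with the fitted sigmoid."""
    return [1.0 / (1.0 + math.exp(-(a * s + b))) for s in scores]

def fit_isotonic(scores, labels):
    """Isotonic regression sketch using pool-adjacent-violators.
    Returns the sorted scores and their non-decreasing fitted values.
    Tied scores are not pooled specially in this simplified version."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    xs = [scores[i] for i in order]
    blocks = []  # each block is [sum of labels, number of samples]
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Merge backwards while the previous block's mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return xs, fitted

# Toy raw scores and 0/1 outcomes (illustrative only).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 0, 1, 1, 1]

a, b = fit_platt(toy_scores, toy_labels)
platt_probs = apply_platt(toy_scores, a, b)
iso_x, iso_fitted = fit_isotonic(toy_scores, toy_labels)
```

The sigmoid has only two parameters, so it stays smooth even on small calibration sets. The isotonic fit is a step function with as many levels as the data supports, which is why it benefits from more samples.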
The calibrate_proba method fits a calibration model on the model's raw probability outputs.
1. Prepare data and model
from modeva import DataSet
from modeva.models import MoXGBClassifier
# Load the built-in dataset and create a random train/test split
ds = DataSet()
ds.load(name="TaiwanCredit")
ds.set_random_split()
# Train the uncalibrated model
model = MoXGBClassifier(name="Raw XGB", max_depth=2)
model.fit(ds.train_x, ds.train_y)
2. Calibration
model.calibrate_proba(ds.test_x, ds.test_y, method='sigmoid')
3. Get calibrated predicted probabilities
Passing calibration=True to predict_proba applies the fitted calibration:
calibrated_probs = model.predict_proba(ds.test_x, calibration=True)
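One way to check whether calibration helped is to compare the Brier score (mean squared error between the predicted probability and the 0/1 outcome) before and after. The sketch below uses only the standard library. The brier_score helper and the toy probability lists are assumptions for illustration, not Modeva output. In practice you would pass in the positive-class probabilities from predict_proba, ideally on data that was not used to fit the calibrator.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome.
    Lower is better; 0.0 is a perfect score."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

# Toy values (hypothetical, chosen only to show the calculation).
toy_labels = [0, 0, 1, 1]
toy_raw = [0.4, 0.45, 0.55, 0.6]        # under-confident raw probabilities
toy_calibrated = [0.1, 0.2, 0.8, 0.9]   # sharper probabilities after calibration

raw_score = brier_score(toy_raw, toy_labels)
cal_score = brier_score(toy_calibrated, toy_labels)
print(f"raw: {raw_score:.5f}, calibrated: {cal_score:.5f}")
```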
Probability calibration assumes that the data used to fit the calibrator is representative of the data the model will later score. If the data distribution shifts significantly, the fitted mapping may no longer hold and recalibration on more recent data may be necessary.
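A simple way to notice that recalibration may be due is to track a calibration metric on recent labeled data. The sketch below computes the expected calibration error (ECE) with equal-width bins using only the standard library. The function name, the choice of 10 bins, and the toy inputs are assumptions for illustration; any alert threshold would need to be chosen for the application at hand.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE sketch: weighted average, over equal-width probability bins,
    of |mean predicted probability - observed positive rate|."""
    # Each bin stores [sum of probabilities, sum of labels, sample count].
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx][0] += p
        bins[idx][1] += y
        bins[idx][2] += 1
    n = len(probs)
    ece = 0.0
    for sum_p, sum_y, count in bins:
        if count:
            ece += (count / n) * abs(sum_p / count - sum_y / count)
    return ece

# Toy check: predictions of 0.1 and 0.9 that are always right are each
# off by 0.1 from the observed rates of 0.0 and 1.0.
toy_probs = [0.1, 0.1, 0.9, 0.9]
toy_labels = [0, 0, 1, 1]
ece = expected_calibration_error(toy_probs, toy_labels)
```

A rising ECE on new batches, relative to the value measured right after calibration, is one signal that the calibration data no longer resembles what the model is scoring.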