Linear Tree Classification#

Installation

# To install the required package, use the following command:
# !pip install modeva

Authentication

# To authenticate, run the following commands (replace the sample auth code with your own token for full access):
# from modeva.utils.authenticate import authenticate
# authenticate(auth_code='eaaa4301-b140-484c-8e93-f9f633c8bacb')

Import required modules

from modeva import DataSet
from modeva import TestSuite
from modeva.models import MoLGBMClassifier, MoGLMTreeBoostClassifier, MoNeuralTreeClassifier

Load and prepare dataset

ds = DataSet()
ds.load(name="TaiwanCredit")
ds.set_random_split()
ds.set_target("FlagDefault")

LGBM Linear Tree model#

model = MoLGBMClassifier(linear_trees=True, max_depth=2, verbose=-1, random_state=0)
model.fit(ds.train_x, ds.train_y.ravel())
MoLGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
                 importance_type='split', learning_rate=0.1, linear_trees=True,
                 max_depth=2, min_child_samples=20, min_child_weight=0.001,
                 min_split_gain=0.0, n_estimators=100, n_jobs=None,
                 num_leaves=31, objective=None, random_state=0, reg_alpha=0.0,
                 reg_lambda=0.0, subsample=1.0, subsample_for_bin=200000,
                 subsample_freq=0, verbose=-1)
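With linear_trees=True, each leaf predicts with a fitted linear model instead of a constant, so the ensemble can capture piecewise-linear structure with shallow trees. A minimal depth-1 sketch in plain NumPy, for illustration only (not Modeva's or LightGBM's actual implementation):

```python
import numpy as np

def fit_linear_tree_stump(X, y, split_feature, threshold):
    """Split on one feature, then fit a least-squares linear model per leaf."""
    left = X[:, split_feature] <= threshold
    models = {}
    for name, mask in (("left", left), ("right", ~left)):
        A = np.column_stack([X[mask], np.ones(mask.sum())])  # add intercept
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models[name] = coef
    return models

def predict_linear_tree_stump(models, X, split_feature, threshold):
    A = np.column_stack([X, np.ones(len(X))])
    left = X[:, split_feature] <= threshold
    return np.where(left, A @ models["left"], A @ models["right"])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# piecewise-linear target: the slope on feature 1 changes at feature 0 = 0
y = np.where(X[:, 0] <= 0, 1.0 * X[:, 1], -2.0 * X[:, 1])
models = fit_linear_tree_stump(X, y, split_feature=0, threshold=0.0)
pred = predict_linear_tree_stump(models, X, split_feature=0, threshold=0.0)
```

Because the target is exactly piecewise linear in the split, a single linear-tree stump recovers it, which a constant-leaf stump of the same depth cannot.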


Basic accuracy analysis

ts = TestSuite(ds, model)
results = ts.diagnose_accuracy_table()
results.table
AUC ACC F1 LogLoss Brier
train 0.7925 0.8217 0.4775 0.4232 0.1326
test 0.7840 0.8298 0.4846 0.4198 0.1307
GAP -0.0086 0.0082 0.0071 -0.0034 -0.0019
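The table's columns are standard binary-classification metrics; a sketch of how each is computed, on toy predictions (Modeva computes these internally, possibly with different thresholds or tie handling):

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1])
p_hat  = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

# ACC: accuracy at the 0.5 threshold
acc = np.mean((p_hat >= 0.5) == y_true)

# LogLoss: binary cross-entropy of the predicted probabilities
logloss = -np.mean(y_true * np.log(p_hat) + (1 - y_true) * np.log(1 - p_hat))

# Brier: mean squared error of the probabilities
brier = np.mean((p_hat - y_true) ** 2)

# AUC: probability that a random positive outscores a random negative
pos, neg = p_hat[y_true == 1], p_hat[y_true == 0]
auc = (np.mean(pos[:, None] > neg[None, :])
       + 0.5 * np.mean(pos[:, None] == neg[None, :]))
```

The GAP row is simply test minus train for each metric, so small absolute gaps indicate little overfitting.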


Feature importance analysis

results = ts.interpret_fi()
results.plot()
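One common way to obtain a global importance ranking like this plot is permutation importance: shuffle one column and measure how much accuracy drops. A sketch with a toy rule-based predictor (illustrative only; interpret_fi may use a different, model-specific method):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.1 * X[:, 2] > 0).astype(int)  # feature 0 dominates, 1 unused

def predict(X):
    return (X[:, 0] + 0.1 * X[:, 2] > 0).astype(int)

base = np.mean(predict(X) == y)  # 1.0: the rule matches the labels exactly
importances = []
for j in range(3):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])  # destroy feature j's information
    importances.append(base - np.mean(predict(Xp) == y))
```

Shuffling the dominant feature costs the most accuracy, the unused feature costs nothing, and the weak feature falls in between.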


Local feature importance analysis

results = ts.interpret_local_fi(sample_index=1, centered=True)
results.plot()
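For a linear model, a "centered" local attribution for one sample can be written as w_j * (x_ij - mean_j): each feature's contribution relative to the average sample, so the contributions plus the mean score reconstruct the sample's score. A sketch under that assumption (interpret_local_fi may differ in detail for tree-based models):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=[1.0, -2.0], scale=1.0, size=(100, 2))
w = np.array([0.5, -1.5])        # toy linear model weights
b = 0.2
scores = X @ w + b

i = 1                                  # sample to explain
contrib = w * (X[i] - X.mean(axis=0))  # centered local contributions
recon = contrib.sum() + scores.mean()  # reconstructs the sample's score
```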


Boosted GLMTree model#

model = MoGLMTreeBoostClassifier(max_depth=1, n_estimators=100,
                                 reg_lambda=0.001, verbose=True, random_state=0)
model.fit(ds.train_x, ds.train_y.ravel())
Iteration 1 with validation loss 0.46286
Iteration 2 with validation loss 0.45391
Iteration 3 with validation loss 0.45545
Iteration 4 with validation loss 0.45813
Iteration 5 with validation loss 0.45458
Iteration 6 with validation loss 0.45238
Iteration 7 with validation loss 0.45290
Iteration 8 with validation loss 0.45338
Iteration 9 with validation loss 0.45465
Iteration 10 with validation loss 0.45425
Iteration 11 with validation loss 0.45525
Iteration 12 with validation loss 0.45535
Early stop as validation loss does not decrease for certain iterations.
MoGLMTreeBoostClassifier(name='MoGLMTreeBoostClassifier', reg_lambda=0.001,
                         verbose=True)
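The log above shows early stopping: boosting halts once validation loss stops improving for a patience window. A generic sketch of that control flow, assuming a patience of six rounds (the exact patience and tolerance inside MoGLMTreeBoostClassifier are not shown in the log):

```python
import numpy as np

def boost_with_early_stopping(val_losses, patience=6):
    """Return the number of boosting iterations run before early stopping."""
    best, best_iter = np.inf, 0
    for i, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_iter = loss, i       # new best validation loss
        elif i - best_iter >= patience:
            return i                        # no improvement for `patience` rounds
    return len(val_losses)

# validation losses from the training log above
losses = [0.46286, 0.45391, 0.45545, 0.45813, 0.45458, 0.45238,
          0.45290, 0.45338, 0.45465, 0.45425, 0.45525, 0.45535]
stopped_at = boost_with_early_stopping(losses, patience=6)
```

With patience 6, the best loss at iteration 6 goes unbeaten through iteration 12, reproducing the stop seen in the log.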


Basic accuracy analysis

ts = TestSuite(ds, model)
results = ts.diagnose_accuracy_table()
results.table
AUC ACC F1 LogLoss Brier
train 0.7779 0.8199 0.4858 0.4384 1.3578e-01
test 0.7695 0.8207 0.4694 0.4394 1.3582e-01
GAP -0.0085 0.0008 -0.0164 0.0010 4.0008e-05


Main effect plot for numerical feature

results = ts.interpret_effects(features="PAY_1")
results.plot()


Main effect plot for categorical feature

results = ts.interpret_effects(features="EDUCATION")
results.plot()
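A main-effect curve shows the model's average response as one feature varies while the others stay at their observed values, i.e. a partial-dependence-style average. A sketch with a toy model (interpret_effects may instead read the effect directly from the fitted model's structure):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))

def predict(X):                    # toy model: nonlinear in feature 0
    return np.tanh(X[:, 0]) + 0.3 * X[:, 1]

grid = np.linspace(-2, 2, 9)
effect = []
for v in grid:
    Xg = X.copy()
    Xg[:, 0] = v                   # pin feature 0 to the grid value
    effect.append(predict(Xg).mean())  # average response over the data
effect = np.array(effect)
```

For this toy model the recovered curve is exactly tanh shifted by the average contribution of the other feature, so it is monotonically increasing.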


Neural Tree model with Monotonicity Constraints#

modelnn = MoNeuralTreeClassifier(estimator=model,
                                 nn_temperature=0.0001,
                                 nn_max_epochs=20,
                                 feature_names=ds.feature_names,
                                 mono_increasing_list=("PAY_1",),
                                 mono_sample_size=1000,
                                 reg_mono=10,
                                 verbose=True,
                                 random_state=0)
modelnn.fit(ds.train_x, ds.train_y.ravel())
#### MoNeuralTree Training Stage 1: Use Fitted MoGLMTreeBoost ####
#### MoNeuralTree Training Stage 2: Fine-tuning via Gradient Descent ####
Initial training and validation loss: 0.4342 and 0.4552
Epoch 0: Train loss 14.8085, Validation loss 5.4476, Monotonicity loss 0.0591
Epoch 1: Train loss 4.7582, Validation loss 4.3893, Monotonicity loss 0.0149
Epoch 2: Train loss 4.9457, Validation loss 3.3488, Monotonicity loss 0.0101
Epoch 3: Train loss 5.2688, Validation loss 2.0793, Monotonicity loss 0.0034
Epoch 4: Train loss 5.3769, Validation loss 5.6655, Monotonicity loss 0.0030
Epoch 5: Train loss 3.5262, Validation loss 9.5774, Monotonicity loss 0.0026
Epoch 6: Train loss 6.7964, Validation loss 7.3313, Monotonicity loss 0.0022
Epoch 7: Train loss 4.9357, Validation loss 4.0601, Monotonicity loss 0.0042
Epoch 8: Train loss 5.2890, Validation loss 5.5741, Monotonicity loss 0.0012
Epoch 9: Train loss 6.0833, Validation loss 6.7742, Monotonicity loss 0.0007
Epoch 10: Train loss 4.2674, Validation loss 1.9368, Monotonicity loss 0.0002
Epoch 11: Train loss 4.5836, Validation loss 3.8350, Monotonicity loss 0.0004
Epoch 12: Train loss 4.4162, Validation loss 3.8734, Monotonicity loss 0.0000
Epoch 13: Train loss 4.2800, Validation loss 6.1858, Monotonicity loss 0.0000
Epoch 14: Train loss 4.6165, Validation loss 6.0532, Monotonicity loss 0.0000
Epoch 15: Train loss 4.1156, Validation loss 5.0387, Monotonicity loss 0.0000
Epoch 16: Train loss 4.5790, Validation loss 2.4589, Monotonicity loss 0.0000
Epoch 17: Train loss 5.1868, Validation loss 3.9839, Monotonicity loss 0.0000
Epoch 18: Train loss 4.9574, Validation loss 4.0296, Monotonicity loss 0.0000
Epoch 19: Train loss 5.1222, Validation loss 10.0932, Monotonicity loss 0.0000
Training is terminated as max_epoch is reached.
MoNeuralTreeClassifier(clip_predict=False, device='cpu',
                       estimator=MoGLMTreeBoostClassifier(name='MoGLMTreeBoostClassifier',
                                                          reg_lambda=0.001,
                                                          verbose=True),
                       learning_rate=1.0, max_depth=1, min_impurity_decrease=0,
                       min_samples_leaf=50, n_epoch_no_change=5,
                       n_estimators=100, n_feature_search=5, n_screen_grid=1,
                       n_split_grid=20, name='MoGLMTreeBoostClassifier',
                       nn_max_epochs=20, reg_lambda=0.001, simplified=True,
                       split_custom=None, verbose=True)
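The "Monotonicity loss" in the log penalizes violations of the constraint that predictions be non-decreasing in PAY_1, weighted by reg_mono. One common form of such a penalty: nudge the constrained feature upward on a sample of points and penalize any decrease in the prediction. Sketch only; Modeva's exact penalty may differ:

```python
import numpy as np

def mono_penalty(predict, X, feature, delta=0.1):
    """Mean squared violation of non-decreasing behavior in `feature`."""
    X_up = X.copy()
    X_up[:, feature] += delta             # nudge the feature upward
    diff = predict(X_up) - predict(X)     # should be >= 0 under the constraint
    return np.mean(np.minimum(diff, 0.0) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
increasing = lambda X: 2.0 * X[:, 0] + X[:, 1]   # satisfies the constraint
decreasing = lambda X: -2.0 * X[:, 0] + X[:, 1]  # violates it everywhere
```

A compliant model incurs zero penalty while a violating one is penalized, which is why the logged monotonicity loss is driven to zero over the epochs.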


Basic accuracy analysis

ts = TestSuite(ds, modelnn)
results = ts.diagnose_accuracy_table()
results.table
AUC ACC F1 LogLoss Brier
train 0.6524 0.7793 0.0250 5.3202 0.2169
test 0.6688 0.7872 0.0377 5.0318 0.2082
GAP 0.0164 0.0079 0.0126 -0.2884 -0.0087


Feature importance analysis

results = ts.interpret_fi()
results.plot()


Main effect plot

results = ts.interpret_effects(features="PAY_1")
results.plot()


Total running time of the script: (5 minutes 41.958 seconds)

Gallery generated by Sphinx-Gallery