modeva.models.MoGAMINetRegressor#

class modeva.models.MoGAMINetRegressor(name: str = None, feature_names=None, feature_types=None, interact_num=10, subnet_size_main_effect=(20,), subnet_size_interaction=(20, 20), activation_func='ReLU', max_epochs=(1000, 1000, 1000), learning_rates=(0.001, 0.001, 0.0001), early_stop_thres=('auto', 'auto', 'auto'), batch_size=1000, batch_size_inference=10000, max_iter_per_epoch=100, val_ratio=0.2, warm_start=True, gam_sample_size=5000, mlp_sample_size=1000, heredity=True, reg_clarity=0.1, loss_threshold=0.01, reg_mono=0.1, mono_increasing_list=(), mono_decreasing_list=(), mono_sample_size=1000, include_interaction_list=(), boundary_clip=True, normalize=True, verbose=False, n_jobs=10, device=None, random_state=0)#

Generalized additive model with pairwise interactions (GAMI-Net) regressor.

Parameters:
  • name (str, default=None) – The name of the model.

  • feature_names (list or None, default=None) – The list of feature names. If None, will use “X0”, “X1”, “X2”, etc.

  • feature_types (list or None, default=None) – The list of feature types. Available types include “numerical” and “categorical”. If None, will use numerical for all features.

  • interact_num (int, default=10) – The max number of interactions to be included in the second stage training.

  • subnet_size_main_effect (tuple of int, default=(20, )) – The hidden layer architecture of each subnetwork in the main effect block.

  • subnet_size_interaction (tuple of int, default=(20, 20)) – The hidden layer architecture of each subnetwork in the interaction block.

  • activation_func ({"ReLU", "Sigmoid", "Tanh"}, default="ReLU") – The name of the activation function.

  • max_epochs (tuple of three int, default=(1000, 1000, 1000)) – The max number of epochs in the first (main effect training), second (interaction training), and third (fine-tuning) stages, respectively.

  • learning_rates (tuple of three float, default=(1e-3, 1e-3, 1e-4)) – The initial learning rates of the Adam optimizer in the first (main effect training), second (interaction training), and third (fine-tuning) stages, respectively.

  • early_stop_thres (tuple of three int or "auto", default=("auto", "auto", "auto")) – The early stopping threshold in the first (main effect training), second (interaction training), and third (fine-tuning) stages, respectively. In auto mode, the threshold is set to max(5, min(5000 * n_features / (max_iter_per_epoch * batch_size), 100)).

  • batch_size (int, default=1000) – The batch size. Note that it should not be larger than the training size * (1 - validation ratio).

  • batch_size_inference (int, default=10000) – The batch size used in the inference stage. It is imposed to avoid out-of-memory issues when dealing with very large datasets.

  • max_iter_per_epoch (int, default=100) – The max number of iterations per epoch. In the init stage of model fitting, its value is clipped to min(max_iter_per_epoch, int(sample_size / batch_size)). For each epoch, the data are reshuffled and only the first max_iter_per_epoch batches are used for training. It is imposed to make training scalable for very large datasets.

  • val_ratio (float, default=0.2) – The validation ratio, should be greater than 0 and smaller than 1.

  • warm_start (bool, default=True) – Whether to initialize the network by fitting a rough LGBM model. The initialization proceeds as follows: 1) fit an LGBM model as a teacher model; 2) generate random samples from the teacher model; 3) fit each subnetwork using the generated samples. It is used for both main effect and interaction subnetwork initialization.

  • gam_sample_size (int, default=5000) – The sub-sample size for GAM fitting when warm_start=True.

  • mlp_sample_size (int, default=1000) – The generated sample size for fitting each individual subnetwork when warm_start=True.

  • heredity (bool, default=True) – Whether to perform interaction screening subject to heredity constraint.

  • loss_threshold (float, default=0.01) – The loss tolerance threshold for selecting fewer main effects or interactions, according to the validation performance. For instance, suppose the best validation performance is achieved using 10 main effects; if using only the top 5 main effects gives similar validation performance, the last 5 can be pruned by setting this parameter to a positive value.

  • reg_clarity (float, default=0.1) – The regularization strength of marginal clarity constraint.

  • reg_mono (float, default=0.1) – The regularization strength of monotonicity constraint.

  • mono_sample_size (int, default=1000) – When a monotonicity constraint is used, this many data points are generated uniformly within the feature space per epoch, to impose the monotonicity regularization in addition to the original training samples.

  • mono_increasing_list (tuple of str, default=()) – The feature name tuple subject to monotonic increasing constraint.

  • mono_decreasing_list (tuple of str, default=()) – The feature name tuple subject to monotonic decreasing constraint.

  • include_interaction_list (tuple of (str, str), default=()) – The tuple of interactions to be included in fitting; each interaction is expressed as (feature_name1, feature_name2).

  • boundary_clip (bool, default=True) – In the inference stage, whether to clip the feature values by their min and max values in the training data.

  • normalize (bool, default=True) – Whether to normalize the data before inputting to the network.

  • verbose (bool, default=False) – Whether to output the training logs.

  • n_jobs (int, default=10) – The number of CPU cores used for parallel computing. -1 means all available cores will be used.

  • device (str, default=None) – The hardware device name used for training.

  • random_state (int, default=0) – The random seed.

net_#

The fitted GAMI-Net module.

Type:

torch network object

interaction_list_#

The list of feature index pairs (tuple) for each fitted interaction.

Type:

list of tuples

active_main_effect_index_#

The selected main effect index.

Type:

list of int

active_interaction_index_#

The selected interaction index.

Type:

list of int

main_effect_val_loss_#

The validation loss as the most important main effects are sequentially added.

Type:

list of float

interaction_val_loss_#

The validation loss as the most important interactions are sequentially added.

Type:

list of float

time_cost_#

The time cost of each stage.

Type:

list of tuple

n_interactions_#

The actual number of interactions used in the fitting stage. It is greater than or equal to the number of active interactions.

Type:

int

calibrate_interval(X, y, alpha=0.1, max_depth: int = 5)#

Fit a conformal prediction model to the given data.

This method computes the model’s prediction interval calibrated to the given data.

If the model is a regressor, the data are split 50/50: one half is used to fit lower (alpha / 2) and upper (1 - alpha / 2) gradient boosting trees-based quantile regressions to the model’s residuals; the other half is used for calibration.

If the model is a binary classifier, it computes the calibration quantile based on predicted probabilities for the positive class.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) – Feature matrix for prediction.

  • y (array-like of shape (n_samples, )) – Target values.

  • alpha (float, default=0.1) – Expected miscoverage for the conformal prediction.

  • max_depth (int, default=5) – Maximum depth of the gradient boosting trees for regression tasks. Only used when task_type is REGRESSION.

Raises:

ValueError – If the model is neither a regressor nor a classifier.

fit(X, y, sample_weight=None)#

Fits a GAMINet regression model to the training data.

This method trains the model in three stages: main effects training, interaction training, and fine-tuning. It handles data preprocessing, model initialization, and the complete training pipeline.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) – Training data features.

  • y (np.ndarray of shape (n_samples,)) – Target values for regression.

  • sample_weight (np.ndarray of shape (n_samples,), default=None) – Individual weights for each sample. If None, all samples are weighted equally.

Returns:

self – Returns the fitted estimator.

Return type:

object

get_params(deep=True)#

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

load(file_name: str)#

Load the model into memory from file system.

Parameters:

file_name (str) – The path and name of the file.

Return type:

estimator object

predict(X)#

Generate model predictions by calling the child class’s `_predict` method.

Parameters:

X (np.ndarray of shape (n_samples, n_features)) – Feature matrix for prediction.

Returns:

The (calibrated) final predictions.

Return type:

np.ndarray

predict_interval(X)#

Predict the prediction interval for the given data based on the conformal prediction model.

During calibration, the data were split 50/50: one half is used to fit lower (alpha / 2) and upper (1 - alpha / 2) gradient boosting trees-based quantile regressions to the model’s residuals; the other half is used for calibration.

Parameters:

X (np.ndarray of shape (n_samples, n_features)) – Feature matrix for prediction.

Returns:

The lower and upper bounds of the prediction intervals for each sample, in the format [n_samples, 2] for regressors or a flattened array for classifiers.

Return type:

np.ndarray

Raises:

ValueError – If calibrate_interval has not been called to fit the conformal prediction model before calling this method.

save(file_name: str)#

Save the model into file system.

Parameters:

file_name (str) – The path and name of the file.

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance