modeva.models.ModelTuneOptuna#
- class modeva.models.ModelTuneOptuna(dataset, model)#
Bases: object
A class for performing hyperparameter tuning using the optuna Python package.
- run(param_distributions: Dict, dataset: str = 'train', n_iter: int = 10, sampler: str = 'tpe', sampler_args: dict = None, metric: str | Tuple = None, n_jobs: int = None, cv=None, error_score=nan, random_state: int = 0)#
Runs model tuning using Optuna for hyperparameter optimization.
This method performs hyperparameter optimization for the specified model using the Optuna library. It allows users to define parameter distributions, choose a sampling strategy, and specify evaluation metrics. The results of the optimization process are returned in a structured format, including the best parameters and their corresponding scores.
- Parameters:
param_distributions (dict) – A dictionary where keys are parameter names (str) and values are distributions or lists of parameters to try. The distributions must provide a method for sampling (e.g., from scipy.stats), and if a list is provided, it will be sampled uniformly.
dataset ({"main", "train", "test"}, default="train") – Specifies which dataset to use for model fitting. Options include “main” for the entire dataset, “train” for the training subset, and “test” for the testing subset.
n_iter (int, default=10) – The number of iterations for the optimization process, controlling the trade-off between runtime and the quality of the solution.
sampler ({"grid", "random", "tpe", "gp", "cma-es", "qmc"}, default="tpe") –
The sampling strategy used in optuna.
"grid" : Grid Search implemented in GridSampler
"random" : Random Search implemented in RandomSampler
"tpe" : Tree-structured Parzen Estimator algorithm implemented in TPESampler
"gp" : Gaussian process-based algorithm implemented in GPSampler
"cma-es" : CMA-ES based algorithm implemented in CmaEsSampler
"qmc" : A Quasi-Monte Carlo sampling algorithm implemented in QMCSampler
sampler_args (dict, default=None) – The arguments passed to the sampler.
metric (str or tuple, default=None) – The performance metric(s). If None, MSE, MAE, and R2 are computed for regression, and ACC, AUC, F1, LogLoss, and Brier for classification. Only the first metric is used as the optimization objective.
cv (int, cross-validation generator or an iterable, default=None) –
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 5-fold cross-validation,
integer, to specify the number of folds in a (Stratified)KFold,
CV splitter,
An iterable yielding (train, test) splits as arrays of indices.
n_jobs (int, default=None) – The number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.
error_score ('raise' or numeric, default=np.nan) – Value to assign to the score if an error occurs in estimator fitting. If set to 'raise', the error is raised. If a numeric value is given, a FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error.
random_state (int, default=0) – The seed used for random number generation to ensure reproducibility of results.
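As a sketch of the param_distributions format, a search space can mix scipy.stats distributions (which provide an rvs sampling method) with plain lists; the parameter names below are hypothetical and not tied to any specific modeva model:

```python
from scipy.stats import randint, uniform

# Hypothetical search space; parameter names are illustrative only.
param_distributions = {
    "learning_rate": uniform(loc=0.01, scale=0.29),  # continuous U(0.01, 0.30)
    "max_depth": randint(2, 8),                      # integers 2, 3, ..., 7
    "booster": ["gbtree", "dart"],                   # lists are sampled uniformly
}

# Each scipy distribution exposes .rvs() for sampling; list entries
# would be drawn uniformly by the tuner.
sample = {
    name: dist.rvs(random_state=0) if hasattr(dist, "rvs") else dist[0]
    for name, dist in param_distributions.items()
}
```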
- Returns:
A container object with the following components:
key: “model_tune_optuna”
data: Name of the dataset used
model: Name of the model used
inputs: Input parameters
value: Dictionary containing the optimization history
table: Tabular format of the optimization history
options: Dictionary of visualization configurations. Run results.plot() to show all plots, or results.plot(name=...) to display a single plot. The following names are available:
"parallel" : Parallel plot of the hyperparameter settings and final performance.
"(<parameter>, <metric>)" : Bar plot showing the performance metric against parameter values.
- Return type:
Examples
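A minimal usage sketch, assuming ds and model are a modeva dataset and model already prepared elsewhere (their construction is outside the scope of this page, and the tuned parameter name is hypothetical):

```python
from scipy.stats import uniform
from modeva.models import ModelTuneOptuna

# `ds` and `model` are assumed to exist already (hypothetical setup).
tuner = ModelTuneOptuna(dataset=ds, model=model)
results = tuner.run(
    param_distributions={"learning_rate": uniform(0.01, 0.29)},  # hypothetical parameter
    dataset="train",
    n_iter=20,
    sampler="tpe",
    metric="AUC",
    cv=5,
    random_state=0,
)
results.plot(name="parallel")  # parallel plot of settings vs. performance
```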