modeva.models.ModelTuneOptuna#
- class modeva.models.ModelTuneOptuna(dataset, model)#
Bases: object
A class for performing hyperparameter tuning using the optuna Python package.
- run(param_distributions: Dict, dataset: str = 'train', n_iter: int = 10, sampler: str = 'tpe', sampler_args: dict = None, metric: str | Tuple = None, n_jobs: int = None, cv=None, error_score=nan, random_state: int = 0)#
Runs model tuning using Optuna for hyperparameter optimization.
This method performs hyperparameter optimization for the specified model using the Optuna library. It allows users to define parameter distributions, choose a sampling strategy, and specify evaluation metrics. The results of the optimization process are returned in a structured format, including the best parameters and their corresponding scores.
- Parameters:
param_distributions (dict) – A dictionary where keys are parameter names (str) and values are distributions or lists of parameters to try. The distributions must provide a method for sampling (e.g., from scipy.stats), and if a list is provided, it will be sampled uniformly.
dataset ({"main", "train", "test"}, default="train") – Specifies which dataset to use for model fitting. Options include “main” for the entire dataset, “train” for the training subset, and “test” for the testing subset.
n_iter (int, default=10) – The number of iterations for the optimization process, controlling the trade-off between runtime and the quality of the solution.
sampler ({"grid", "random", "tpe", "gp", "cma-es", "qmc"}, default="tpe") –
The sampling strategy used in optuna.
"grid" : Grid Search implemented in GridSampler
"random" : Random Search implemented in RandomSampler
"tpe" : Tree-structured Parzen Estimator algorithm implemented in TPESampler
"gp" : Gaussian process-based algorithm implemented in GPSampler
"cma-es" : CMA-ES based algorithm implemented in CmaEsSampler
"qmc" : A Quasi-Monte Carlo sampling algorithm implemented in QMCSampler
sampler_args (dict, default=None) – The arguments passed to the sampler.
metric (str or tuple, default=None) – The performance metric(s). If None, MSE, MAE, and R2 are computed for regression, and ACC, AUC, F1, LogLoss, and Brier for classification. Only the first metric is used as the optimization objective.
cv (int, cross-validation generator or an iterable, default=None) –
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 5-fold cross-validation,
integer, to specify the number of folds in a (Stratified)KFold,
CV splitter,
An iterable yielding (train, test) splits as arrays of indices.
n_jobs (int, default=None) – The number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.
error_score ('raise' or numeric, default=np.nan) – Value to assign to the score if an error occurs in estimator fitting. If set to 'raise', the error is raised. If a numeric value is given, a FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error.
random_state (int, default=0) – The seed used for random number generation to ensure reproducibility of results.
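As a sketch of the param_distributions format, a search space can mix scipy.stats distributions (which provide an rvs sampling method) with plain lists; the parameter names below are hypothetical and not tied to any specific modeva model:

```python
from scipy.stats import randint, uniform

# Hypothetical search space; parameter names are illustrative only.
param_distributions = {
    "learning_rate": uniform(loc=0.01, scale=0.29),  # continuous U(0.01, 0.30)
    "max_depth": randint(2, 8),                      # integers 2, 3, ..., 7
    "booster": ["gbtree", "dart"],                   # lists are sampled uniformly
}

# Each scipy distribution exposes .rvs() for sampling; list entries
# would be drawn uniformly by the tuner.
sample = {
    name: dist.rvs(random_state=0) if hasattr(dist, "rvs") else dist[0]
    for name, dist in param_distributions.items()
}
```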
- Returns:
A container object with the following components:
key: “model_tune_optuna”
data: Name of the dataset used
model: Name of the model used
inputs: Input parameters
value: Dictionary containing the optimization history
table: Tabular format of the optimization history
options: Dictionary of visualization configurations. Run results.plot() to show all plots, or results.plot(name=...) to display a single plot. The following names are available:
"parallel" : Parallel plot of the hyperparameter settings and final performance.
"(<parameter>, <metric>)" : Bar plot showing the performance metric against parameter values.
- Return type:
Examples
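A minimal usage sketch, assuming ds and model are a modeva dataset and model already prepared elsewhere (their construction is outside the scope of this page, and the tuned parameter name is hypothetical):

```python
from scipy.stats import uniform
from modeva.models import ModelTuneOptuna

# `ds` and `model` are assumed to exist already (hypothetical setup).
tuner = ModelTuneOptuna(dataset=ds, model=model)
results = tuner.run(
    param_distributions={"learning_rate": uniform(0.01, 0.29)},  # hypothetical parameter
    dataset="train",
    n_iter=20,
    sampler="tpe",
    metric="AUC",
    cv=5,
    random_state=0,
)
results.plot(name="parallel")  # parallel plot of settings vs. performance
```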