modeva.models.ModelTuneOptuna#

class modeva.models.ModelTuneOptuna(dataset, model)#

Bases: object

A class for performing hyperparameter tuning using the optuna Python package.

run(param_distributions: Dict, dataset: str = 'train', n_iter: int = 10, sampler: str = 'tpe', sampler_args: dict = None, metric: str | Tuple = None, n_jobs: int = None, cv=None, error_score=nan, random_state: int = 0)#

Runs model tuning using Optuna for hyperparameter optimization.

This method performs hyperparameter optimization for the specified model using the Optuna library. It allows users to define parameter distributions, choose a sampling strategy, and specify evaluation metrics. The results of the optimization process are returned in a structured format, including the best parameters and their corresponding scores.

Parameters:
  • param_distributions (dict) – A dictionary whose keys are parameter names (str) and whose values are distributions or lists of candidate values. Distributions must provide a method for sampling (e.g., those from scipy.stats); if a list is provided, its values are sampled uniformly.

  • dataset ({"main", "train", "test"}, default="train") – Specifies which dataset to use for model fitting. Options include “main” for the entire dataset, “train” for the training subset, and “test” for the testing subset.

  • n_iter (int, default=10) – The number of iterations for the optimization process, controlling the trade-off between runtime and the quality of the solution.

  • sampler ({"grid", "random", "tpe", "gp", "cma-es", "qmc"}, default="tpe") –

    The sampling strategy used in optuna.

    • "grid" : Grid Search implemented in GridSampler

    • "random" : Random Search implemented in RandomSampler

    • "tpe" : Tree-structured Parzen Estimator algorithm implemented in TPESampler

    • "gp" : Gaussian process-based algorithm implemented in GPSampler

    • "cma-es" : CMA-ES based algorithm implemented in CmaEsSampler

    • "qmc" : A quasi-Monte Carlo sampling algorithm implemented in QMCSampler

  • sampler_args (dict, default=None) – The arguments passed to the sampler.

  • metric (str or tuple, default=None) – The performance metric(s). If None, we calculate the MSE, MAE, and R2 for regression; ACC, AUC, F1, LogLoss, and Brier for classification. Note that only the first one is used as the optimization objective.

  • cv (int, cross-validation generator or an iterable, default=None) –

    Determines the cross-validation splitting strategy. Possible inputs for cv are:

    • None, to use the default 5-fold cross validation,

    • integer, to specify the number of folds in a (Stratified)KFold,

    • CV splitter,

    • An iterable yielding (train, test) splits as arrays of indices.

  • n_jobs (int, default=None) – The number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.

  • error_score ('raise' or numeric, default=np.nan) – Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error.

  • random_state (int, default=0) – The seed used for random number generation to ensure reproducibility of results.
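As an illustration of the parameter forms described above, the sketch below builds a search space and shows the iterable form that cv accepts. The hyperparameter names are hypothetical, and only scipy/numpy are used here, not Modeva itself:

```python
import numpy as np
from scipy.stats import loguniform, randint

# Hypothetical search space: keys are parameter names, values are either
# scipy.stats distributions (sampled via their rvs method) or plain lists
# (sampled uniformly).
param_distributions = {
    "n_estimators": randint(100, 1000),       # integers in [100, 1000)
    "learning_rate": loguniform(1e-3, 0.3),   # log-uniform over [1e-3, 0.3]
    "max_depth": [3, 5, 7],                   # uniform choice from a list
}

def manual_splits(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs -- the iterable form `cv` accepts."""
    idx = np.arange(n_samples)
    folds = np.array_split(idx, n_folds)
    for i, test in enumerate(folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train, test
```

Each distribution in the dictionary can be sampled directly, e.g. param_distributions["n_estimators"].rvs(random_state=0).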

Returns:

A container object with the following components:

  • key: “model_tune_optuna”

  • data: Name of the dataset used

  • model: Name of the model used

  • inputs: Input parameters

  • value: Dictionary containing the optimization history

  • table: Tabular format of the optimization history

  • options: Dictionary of visualization configurations. Run results.plot() to display all plots, or results.plot(name=xxx) to display a single plot; the following names are available:

    • "parallel": Parallel plot of the hyperparameter settings and final performance.

    • "(<parameter>, <metric>)": Bar plot showing the performance metric against parameter values.

Return type:

ValidationResult

Examples

Tuning with optuna (Experimental)

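A minimal inline sketch of the call pattern documented above. The dataset and model objects (ds, model) are placeholders not defined on this page, so the Modeva-specific calls are shown as comments; the search-space definition is runnable as-is:

```python
from scipy.stats import loguniform

# Hypothetical search space for a gradient-boosting model.
param_distributions = {
    "learning_rate": loguniform(1e-3, 0.3),
    "max_depth": [3, 5, 7],
}

# Assuming `ds` is a prepared Modeva dataset and `model` a Modeva model wrapper:
# tuner = ModelTuneOptuna(dataset=ds, model=model)
# results = tuner.run(
#     param_distributions=param_distributions,
#     dataset="train",
#     n_iter=30,
#     sampler="tpe",
#     cv=5,
#     random_state=0,
# )
# results.table                   # optimization history in tabular form
# results.plot(name="parallel")   # parallel plot of settings vs. performance
```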