modeva.DataSet.scale_numerical

DataSet.scale_numerical(features: str | Tuple = None, dataset: str = 'main', method: str = 'minmax', minmax_range: tuple | None = (0, 1), n_quantiles: int = 1000)

Scales numerical features using various scaling methods.

Applies the specified scaling transformation to selected numerical features in the dataset. The scaling parameters are computed based on the specified dataset and can be used to transform new data consistently.

Parameters:
  • features (str or tuple, default=None) – Features to be scaled. If None, all numerical features in the dataset will be scaled.

  • dataset ({"main", "train", "test"}, default="main") – Dataset used to compute scaling parameters (e.g., mean, std, min, max).

  • method ({"standardize", "minmax", "quantile", "log1p", "square"}, default="minmax") –

    Scaling method to apply:

    • "standardize": Centers each feature to zero mean and scales it to unit variance

    • "minmax": Linearly rescales each feature to the range given by minmax_range

    • "quantile": Transforms using quantile information for robust scaling

    • "log1p": Applies the natural logarithm of one plus the value, log(1 + x)

    • "square": Squares each value

  • minmax_range (tuple, default=(0, 1)) – Target range for minmax scaling, specified as (min, max).

  • n_quantiles (int, default=1000) – Number of quantiles used by the quantile transformation. The effective number cannot exceed the number of samples.
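The element-wise methods listed above can be sketched in plain Python. This is only an illustration of the textbook formulas, not modeva's implementation: the helper names (`standardize`, `minmax`, `log1p`, `square`) and the `feature_range` argument are hypothetical, and the use of the population standard deviation is an assumption.

```python
import math
import statistics


def standardize(values):
    """Center to zero mean and scale to unit variance.

    Assumes the population standard deviation; a constant feature
    (std == 0) would need special handling and is not covered here.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def minmax(values, feature_range=(0, 1)):
    """Linearly map the observed [min, max] onto feature_range."""
    lo, hi = min(values), max(values)
    a, b = feature_range
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


def log1p(values):
    """Natural logarithm of (1 + x); requires x > -1."""
    return [math.log1p(v) for v in values]


def square(values):
    """Square each value."""
    return [v * v for v in values]


x = [0.0, 1.0, 3.0]
scaled = minmax(x, feature_range=(-1, 1))  # endpoints map to -1 and 1
```

Note that "log1p" and "square" need no fitted parameters, while "standardize" and "minmax" depend on statistics of the data they are computed from.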

Returns:

A container object with the following components:

  • key: "data_preprocess_scaling"

  • data: Name of the dataset used

  • inputs: Input parameters used for scaling

  • value: Dictionary of scaling configurations, stored as (<feature_name>, item) pairs. Each item is a dictionary containing the following:

    • "fidx": Feature index

    • "scalers": Function for scaling

    • "feature_names_out": List of output feature names

Return type:

ValidationResult

Examples
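The description notes that scaling parameters are computed from one dataset and can then be applied consistently to new data. A minimal sketch of that idea in plain Python follows; it does not call modeva, and `fit_minmax` is a hypothetical helper introduced only for illustration.

```python
def fit_minmax(reference, feature_range=(0, 1)):
    """Compute min-max parameters from a reference sample.

    Returns a function that applies the same fitted parameters to any
    data, so new samples are scaled consistently with the reference.
    """
    lo, hi = min(reference), max(reference)
    a, b = feature_range
    span = hi - lo  # assumed non-zero (non-constant feature)

    def scale(values):
        return [a + (v - lo) * (b - a) / span for v in values]

    return scale


train = [10.0, 20.0, 30.0]
test = [15.0, 40.0]

scaler = fit_minmax(train)       # parameters come from the training split only
train_scaled = scaler(train)     # [0.0, 0.5, 1.0]
test_scaled = scaler(test)       # [0.25, 1.5]
```

Because the parameters come from the reference split, values in new data that lie outside the reference range (such as 40.0 here) fall outside the target range after scaling.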

Data Processing and Feature Engineering
