modeva.DataSet.encode_categorical

DataSet.encode_categorical(features: str | Tuple = None, dataset: str = 'main', method: str = 'ordinal', target: str = None)

Encodes categorical features using ordinal, one-hot, or target encoding.

This method transforms categorical variables into a numerical format suitable for machine learning models. It supports ordinal encoding (mapping each category to an integer), one-hot encoding (mapping each category to its own binary column), and target encoding (replacing each category with the mean of the target variable over the rows in that category). The encoders are fitted on the specified dataset partition and can then be applied to new data using the transform method.
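The fit-then-apply pattern described above can be illustrated with a minimal pure-Python sketch. It is not modeva's implementation: the class `OrdinalEncoderSketch`, its sorted category order, and the choice to map categories unseen during fitting to `-1` are all illustrative assumptions.

```python
from typing import Dict, List, Sequence


class OrdinalEncoderSketch:
    """Hypothetical minimal ordinal encoder: fit on one partition, reuse on new data."""

    def __init__(self, unknown_value: int = -1) -> None:
        # Assumed sentinel for categories not seen during fit.
        self.unknown_value = unknown_value
        self.mapping: Dict[str, int] = {}

    def fit(self, values: Sequence[str]) -> "OrdinalEncoderSketch":
        # Learn the category -> integer mapping from the fitting data only,
        # assigning integers in sorted category order.
        self.mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
        return self

    def transform(self, values: Sequence[str]) -> List[int]:
        # Apply the learned mapping; unseen categories get the sentinel value.
        return [self.mapping.get(v, self.unknown_value) for v in values]


train_colors = ["red", "green", "blue", "green"]
test_colors = ["blue", "purple", "red"]

enc = OrdinalEncoderSketch().fit(train_colors)
print(enc.transform(test_colors))  # [0, -1, 2]
```

Because the mapping is learned once and only looked up afterwards, the same integer codes are reused on the test data, which is the point of fitting on a single partition.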

Parameters:
  • features (str or tuple, default=None) – Name(s) of the feature(s) to encode. If None, all categorical features in the dataset are selected automatically.

  • dataset ({"main", "train", "test"}, default="main") – Specifies which dataset partition is used to fit the encoders.

  • method ({"ordinal", "onehot", "target"}, default="ordinal") –

    Encoding method to use:

    • “ordinal”: Converts each category to an integer value

    • “onehot”: Creates one binary column per category, dropping one reference category

    • “target”: Replaces each category with the mean of the target variable for that category

  • target (str, optional, default=None) – The name of the target variable to use for target encoding. Required if method is “target”.
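As a rough illustration of how the three `method` values differ, the sketch below encodes a single column in plain Python. The function `encode_column`, its `name` argument, the choice of the first sorted category as the one-hot reference level, and the output column names are all hypothetical and need not match what modeva produces in `feature_names_out`.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def encode_column(
    values: Sequence[str],
    method: str = "ordinal",
    target: Optional[Sequence[float]] = None,
    name: str = "x",
) -> Dict[str, list]:
    """Encode one categorical column; returns {output_column_name: encoded_values}."""
    categories = sorted(set(values))
    if method == "ordinal":
        # Each category maps to its position in sorted order.
        codes = {cat: i for i, cat in enumerate(categories)}
        return {name: [codes[v] for v in values]}
    if method == "onehot":
        # One binary column per category; the first sorted category is the
        # dropped reference level and is represented by all zeros.
        return {
            f"{name}_{cat}": [1 if v == cat else 0 for v in values]
            for cat in categories[1:]
        }
    if method == "target":
        if target is None:
            raise ValueError('target is required when method="target"')
        sums: Dict[str, float] = defaultdict(float)
        counts: Dict[str, int] = defaultdict(int)
        for v, y in zip(values, target):
            sums[v] += y
            counts[v] += 1
        # Replace each category with the mean target value observed for it.
        means = {cat: sums[cat] / counts[cat] for cat in categories}
        return {name: [means[v] for v in values]}
    raise ValueError(f"unknown method: {method!r}")


colors = ["red", "green", "blue", "green"]
y = [1.0, 0.0, 1.0, 1.0]
print(encode_column(colors, "ordinal", name="color"))
# {'color': [2, 1, 0, 1]}
print(encode_column(colors, "onehot", name="color"))
# {'color_green': [0, 1, 0, 1], 'color_red': [1, 0, 0, 0]}
print(encode_column(colors, "target", target=y, name="color"))
# {'color': [1.0, 0.5, 1.0, 0.5]}
```

The target branch uses the plain per-category mean described above; production target encoders (for example scikit-learn's `TargetEncoder`) typically add smoothing and cross-fitting to reduce target leakage.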

Returns:

A container object with the following components:

  • key: “data_preprocess_encoding”

  • data: Name of the dataset used

  • inputs: Dictionary of input parameters used

  • value: Dictionary containing encoder configurations for each feature:

    • “fidx”: Feature index

    • “encoder”: Fitted encoder object

    • “feature_names_out”: List of output feature names

Return type:

ValidationResult
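To show how the per-feature entries in `value` might be consumed, the sketch below flattens `feature_names_out` in original feature order. It assumes, without confirmation from this page, that `value` is a dictionary keyed by original feature name; the feature names, indices, and output names are invented, and the encoder objects are replaced by `None` because their concrete type is not documented here.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for result.value, using only the per-feature keys
# listed under Returns ("fidx", "encoder", "feature_names_out").
value: Dict[str, Dict[str, Any]] = {
    "size": {"fidx": 3, "encoder": None, "feature_names_out": ["size"]},
    "color": {"fidx": 0, "encoder": None, "feature_names_out": ["color_green", "color_red"]},
}


def output_columns(value: Dict[str, Dict[str, Any]]) -> List[str]:
    """Flatten the encoded column names, ordered by original feature index."""
    ordered = sorted(value.values(), key=lambda cfg: cfg["fidx"])
    return [col for cfg in ordered for col in cfg["feature_names_out"]]


print(output_columns(value))  # ['color_green', 'color_red', 'size']
```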

Examples

Data Processing and Feature Engineering
