modeva.DataSet.bin_numerical
- DataSet.bin_numerical(features: str | Tuple = None, dataset: str = 'main', method: str = 'uniform', bins: int | Dict = 10)
Performs binning transformation on numerical features by discretizing continuous values into discrete bins.
This function transforms continuous numerical features into discrete bins using one of several binning strategies. The binning configuration is stored internally and is used for both forward and inverse transformations. Only one binning configuration is retained: calling this method again overwrites the previous configuration instead of adding to it.
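As a rough illustration of what "forward" and "inverse" transformations can mean for stored bin boundaries, the sketch below uses plain NumPy rather than modeva. It assumes the boundaries are sorted interior cut points (as in the `{"X0": [0.1, 0.5, 0.9]}` example further down), that the forward step maps a value to its bin index, and that the inverse step maps an index back to the bin midpoint. The helpers `bin_forward` and `bin_inverse` and the `low`/`high` outer bounds are hypothetical and are not part of the modeva API.

```python
import numpy as np

def bin_forward(x, edges):
    """Map continuous values to integer bin indices (hypothetical helper).

    Assumes `edges` are sorted interior cut points, e.g. [0.1, 0.5, 0.9],
    which define len(edges) + 1 bins indexed 0..len(edges).
    """
    return np.digitize(x, edges, right=False)

def bin_inverse(idx, edges, low, high):
    """Map bin indices back to representative values (hypothetical helper).

    Using the bin midpoint, and closing the outer bins with `low`/`high`,
    are illustrative assumptions, not modeva's documented behavior.
    """
    full = np.concatenate(([low], edges, [high]))
    mids = (full[:-1] + full[1:]) / 2.0
    return mids[idx]

# Usage: four values, one per bin, mapped forward and back.
x = np.array([0.05, 0.3, 0.7, 0.95])
edges = [0.1, 0.5, 0.9]
idx = bin_forward(x, edges)
approx = bin_inverse(idx, edges, low=0.0, high=1.0)
```

Binning is lossy in general: the inverse only recovers a representative value per bin, and here it matches `x` exactly only because each sample was chosen at a bin midpoint.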
- Parameters:
features (str or tuple, default=None) – Names of features to be binned. If None, all numerical features in the dataset will be processed.
dataset ({"main", "train", "test"}, default="main") – Specifies which dataset partition to use for generating the binning boundaries.
method ({"uniform", "quantile", "precompute"}, default="uniform") –
Binning strategy to use:
"uniform": Creates bins of equal width
"quantile": Creates bins containing approximately equal numbers of samples
"precompute": Uses manually specified bin boundaries
bins (int or dict, default=10) –
Controls binning granularity:
If int: Number of bins for numerical features. For “quantile”, this is the maximum number of bins. For “auto-xgb1”, this sets XGBoost’s max_bin parameter.
If dict: Manual bin specifications for each feature, used only with method="precompute". Format: {feature_name: array_of_bin_edges}. Example: {"X0": [0.1, 0.5, 0.9]}. Bins cannot be specified for categorical features.
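The three documented strategies can be sketched per feature with NumPy as below. This is an illustration of the general techniques (equal-width cut points, equal-frequency cut points, user-supplied edges), not modeva's implementation; the function name `compute_edges` is hypothetical, and returning only interior cut points is an assumption. The quantile branch drops duplicate cut points, which shows one way an integer `bins` can act as a maximum rather than an exact count for "quantile". The "auto-xgb1" option mentioned above is not covered here.

```python
import numpy as np

def compute_edges(x, method="uniform", bins=10):
    """Return sorted interior cut points for one feature (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if method == "uniform":
        # bins + 1 equally spaced points from min to max; keep the interior ones.
        return np.linspace(x.min(), x.max(), bins + 1)[1:-1]
    if method == "quantile":
        # Cut points at equally spaced quantiles. Ties can produce duplicate
        # cut points; dropping them yields fewer than `bins` bins.
        qs = np.quantile(x, np.linspace(0.0, 1.0, bins + 1)[1:-1])
        return np.unique(qs)
    if method == "precompute":
        # `bins` is the user-supplied edge list for this feature; `x` is unused.
        return np.sort(np.asarray(bins, dtype=float))
    raise ValueError(f"unknown method: {method!r}")

# Usage on a small evenly spaced feature and on a heavily tied one.
x = np.arange(11)
uniform_edges = compute_edges(x, "uniform", bins=5)
quantile_edges = compute_edges(x, "quantile", bins=5)
tied = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 2])
tied_edges = compute_edges(tied, "quantile", bins=4)
manual = {"X0": [0.1, 0.5, 0.9]}
pre_edges = compute_edges(x, "precompute", bins=manual["X0"])
```

For evenly spaced data the uniform and quantile cut points coincide; on skewed or tied data they diverge, which is the practical reason to choose one over the other.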
- Returns:
A container object with the following components:
key: "data_preprocess_binning"
data: Name of the dataset used
inputs: Dictionary of input parameters
value: Dictionary containing binning configuration for each feature:
"fidx": Feature index
"bins": Bin boundaries
"feature_names_out": Output feature names
- Return type:
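To show how a per-feature configuration of the documented shape could drive a forward transform, the sketch below applies stored boundaries to a 2-D NumPy array. It assumes that `value` is keyed by feature name, that "fidx" is the column index, and that "bins" holds sorted interior cut points usable with `np.digitize`; none of this is confirmed by the documentation above. The configuration is hand-written for illustration and is not real output of `bin_numerical`, and `apply_binning` is a hypothetical helper.

```python
import numpy as np

def apply_binning(X, value):
    """Replace each configured column of X with its bin index (hypothetical helper).

    Assumes `value` maps feature name -> {"fidx": int, "bins": edges, ...},
    with "bins" holding sorted interior cut points.
    """
    X_out = np.array(X, dtype=float, copy=True)
    for cfg in value.values():
        col = cfg["fidx"]
        X_out[:, col] = np.digitize(X_out[:, col], cfg["bins"])
    return X_out

# Hand-written configuration in the documented shape (illustrative values only).
value = {
    "X0": {"fidx": 0, "bins": [0.1, 0.5, 0.9], "feature_names_out": ["X0"]},
}
X = np.array([[0.05, 3.0], [0.3, 4.0], [0.95, 5.0]])
binned = apply_binning(X, value)
```

Columns without an entry in `value` (here the second column) pass through unchanged.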
Examples