modeva.DataSet.feature_select_rcit
- DataSet.feature_select_rcit(dataset: str = 'train', threshold: float = 1e-06, n_fourier: int = 25, n_fourier2: int = 5, n_forwards: int = 2, random_state: int = 0)
Performs feature selection using RCIT and FBEDk to identify important features based on conditional independence testing.
This method implements a two-stage feature selection process that combines the Randomized Conditional Independence Test (RCIT) with Forward-Backward Selection with Early Dropping (FBEDk). Forward selection first identifies potentially important features; backward elimination then removes redundant ones. Conditional independence is tested non-parametrically using random Fourier features.
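The forward and backward stages can be sketched in plain Python. This is a minimal illustration of the FBED idea, not Modeva's implementation: the function names (`forward_pass`, `backward_pass`, `fbed`) and the `pvalue` callback are hypothetical, and the toy p-value function stands in for a real conditional independence test such as RCIT. It also assumes that `n_forwards` counts forward passes.

```python
from typing import Callable, List, Sequence

# Hypothetical callback: p-value of `feature` given the conditioning features.
PValueFn = Callable[[str, Sequence[str]], float]


def forward_pass(features: Sequence[str], selected: Sequence[str],
                 pvalue: PValueFn, threshold: float) -> List[str]:
    """One forward run with early dropping; returns the updated selection."""
    selected = list(selected)
    remaining = [f for f in features if f not in selected]
    while remaining:
        pvals = {f: pvalue(f, selected) for f in remaining}
        # Early dropping: discard candidates that are not significant
        # given the current selection.
        remaining = [f for f in remaining if pvals[f] < threshold]
        if not remaining:
            break
        best = min(remaining, key=pvals.get)  # most significant candidate
        selected.append(best)
        remaining.remove(best)
    return selected


def backward_pass(selected: Sequence[str], pvalue: PValueFn,
                  threshold: float) -> List[str]:
    """Repeatedly remove the least significant feature until all are significant."""
    selected = list(selected)
    while selected:
        pvals = {f: pvalue(f, [s for s in selected if s != f]) for f in selected}
        worst = max(selected, key=pvals.get)
        if pvals[worst] < threshold:
            break
        selected.remove(worst)
    return selected


def fbed(features: Sequence[str], pvalue: PValueFn,
         threshold: float = 1e-6, n_forwards: int = 2) -> List[str]:
    """Run up to `n_forwards` forward passes, then one backward elimination."""
    selected: List[str] = []
    for _ in range(n_forwards):
        updated = forward_pass(features, selected, pvalue, threshold)
        if updated == selected:
            break  # nothing new was added, so further passes cannot help
        selected = updated
    return backward_pass(selected, pvalue, threshold)


def toy_pvalue(feature: str, conditioning: Sequence[str]) -> float:
    """Made-up p-values: 'b' is redundant given 'a'; 'c' matters only given 'a'."""
    if feature == "a":
        return 1e-12
    if feature == "b":
        return 0.5 if "a" in conditioning else 1e-10
    if feature == "c":
        return 1e-9 if "a" in conditioning else 0.3
    return 0.8  # 'd' is pure noise


selected_two_passes = fbed(["a", "b", "c", "d"], toy_pvalue, n_forwards=2)
selected_one_pass = fbed(["a", "b", "c", "d"], toy_pvalue, n_forwards=1)
```

In this toy setup, feature `c` is dropped early in the first pass because it is not significant on its own, and it is only recovered in a second pass once `a` is in the selection. That is the motivation for allowing more than one forward pass.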
- Parameters:
dataset ({"main", "train", "test"}, default="train") – Dataset partition to use for feature selection.
threshold (float, default=1e-6) – P-value threshold for feature significance; features with p-values below this threshold are retained.
n_fourier (int, default=25) – Number of random Fourier features used for the conditioning set in RCIT.
n_fourier2 (int, default=5) – Number of random Fourier features used for the non-conditioning set in RCIT.
n_forwards (int, default=2) – Number of forward selection iterations in the FBEDk algorithm.
random_state (int, default=0) – Seed for random number generation to ensure reproducibility.
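To give a feel for what n_fourier and n_fourier2 control, the following sketch builds a random Fourier feature map for a Gaussian kernel using only the standard library. It is an illustration under stated assumptions, not the code Modeva runs: the function name `random_fourier_features`, the `bandwidth` argument, and the sample data are all hypothetical, and the way Modeva chooses the kernel bandwidth is not specified here.

```python
import math
import random
from typing import List, Sequence


def random_fourier_features(rows: Sequence[Sequence[float]], n_features: int,
                            bandwidth: float = 1.0,
                            random_state: int = 0) -> List[List[float]]:
    """Map each row to `n_features` random cosine features (Gaussian kernel)."""
    rng = random.Random(random_state)
    dim = len(rows[0])
    # Frequencies drawn from N(0, 1/bandwidth^2), phases from U(0, 2*pi).
    weights = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)
    return [
        [scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, row)) + b)
         for w, b in zip(weights, offsets)]
        for row in rows
    ]


# Made-up data: a two-column conditioning set and a single tested variable.
conditioning = [[0.1, 0.4], [0.3, -0.2], [1.2, 0.7]]
tested = [[0.5], [0.1], [-0.9]]

z_map = random_fourier_features(conditioning, n_features=25, random_state=0)
x_map = random_fourier_features(tested, n_features=5, random_state=0)
```

The inner product of two mapped rows approximates the Gaussian kernel between the original rows, and the approximation tightens as the number of features grows. A larger n_fourier or n_fourier2 therefore trades computation for a more faithful kernel approximation.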
- Returns:
A container object with the following components:
key: "data_fs_rcit"
data: Name of the dataset used
inputs: Dictionary of input parameters
value: Dictionary containing:
"step_logs": List of selection step information
"selected": List of selected feature names
table: DataFrame showing step-by-step feature selection status
options: Dictionary of visualization settings for a feature selection heatmap of feature importance. Run results.plot() to show this plot.
- Return type:
Examples
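A minimal usage sketch, based only on the signature and return components documented above. The helper name `select_features_rcit` is hypothetical, `ds` is assumed to be a modeva DataSet that has already been loaded and split, and reading the result through a `value` attribute is an assumption about how the container exposes its components.

```python
def select_features_rcit(ds, dataset="train", threshold=1e-6):
    """Run RCIT/FBEDk selection on `ds` and return the selected feature names.

    `ds` is assumed to be a prepared modeva DataSet; the keyword arguments
    mirror the documented signature and its defaults.
    """
    result = ds.feature_select_rcit(
        dataset=dataset,
        threshold=threshold,
        n_fourier=25,
        n_fourier2=5,
        n_forwards=2,
        random_state=0,
    )
    # Assumption: the returned container exposes its "value" dictionary
    # as an attribute, with the documented "selected" key.
    return result.value["selected"]
```

Because random_state is fixed, repeated calls on the same partition should give the same selection. Lowering threshold makes the test stricter and typically yields fewer features.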