modeva.DataSet.feature_select_rcit

DataSet.feature_select_rcit(dataset: str = 'train', threshold: float = 1e-06, n_fourier: int = 25, n_fourier2: int = 5, n_forwards: int = 2, random_state: int = 0)

Performs feature selection using RCIT and FBEDk to identify important features based on conditional independence testing.

This method implements a two-stage feature selection process that combines the Randomized Conditional Independence Test (RCIT) with Forward-Backward selection with Early Dropping (FBEDk). Forward selection first identifies potentially important features, and backward elimination then removes redundant ones. Conditional independence is tested non-parametrically using random Fourier features.
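The forward and backward stages described above can be sketched in plain Python. This is a minimal illustration of FBEDk-style control flow, not Modeva's implementation: the names `forward_pass`, `backward_pass`, `fbed_k`, and `toy_p_value` are hypothetical, the p-value function is a hard-coded stand-in for a real conditional independence test such as RCIT, and `n_forwards` is assumed to mean the maximum number of forward passes.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: p-value for "feature is independent of the target
# given the features in `given`". A real test (e.g. RCIT) would go here.
PValueFn = Callable[[str, Sequence[str]], float]


def forward_pass(candidates: Sequence[str], selected: List[str],
                 p_value: PValueFn, threshold: float) -> None:
    """One forward run with early dropping; appends to `selected` in place."""
    remaining = list(candidates)
    while remaining:
        pvals = {f: p_value(f, selected) for f in remaining}
        # Early dropping: discard every candidate that is not significant now.
        remaining = [f for f in remaining if pvals[f] < threshold]
        if not remaining:
            break
        best = min(remaining, key=pvals.get)  # most significant candidate
        selected.append(best)
        remaining.remove(best)


def backward_pass(selected: Sequence[str], p_value: PValueFn,
                  threshold: float) -> List[str]:
    """Repeatedly remove the least significant feature until all are significant."""
    kept = list(selected)
    while kept:
        pvals = {f: p_value(f, [g for g in kept if g != f]) for f in kept}
        worst = max(kept, key=pvals.get)
        if pvals[worst] < threshold:
            break  # every remaining feature is still significant
        kept.remove(worst)
    return kept


def fbed_k(features: Sequence[str], p_value: PValueFn,
           threshold: float = 1e-6, n_forwards: int = 2) -> List[str]:
    selected: List[str] = []
    for _ in range(n_forwards):
        before = len(selected)
        # Each pass reconsiders all features that are not yet selected.
        candidates = [f for f in features if f not in selected]
        forward_pass(candidates, selected, p_value, threshold)
        if len(selected) == before:
            break  # nothing new was added, further passes cannot help
    return backward_pass(selected, p_value, threshold)


def toy_p_value(feature: str, given: Sequence[str]) -> float:
    """Made-up p-values: 'a_copy' duplicates 'a', and 'noise' is irrelevant."""
    if feature == "noise":
        return 0.5
    if feature == "a":
        return 0.9 if "a_copy" in given else 1e-9
    if feature == "a_copy":
        return 0.9 if "a" in given else 1e-8
    return 1e-7  # feature "b" is always informative


result = fbed_k(["a", "b", "a_copy", "noise"], toy_p_value)
print(result)
```

In this toy setup, `noise` is dropped in the first forward step and `a_copy` is dropped as soon as `a` has been selected, which is the effect early dropping is meant to have on redundant features.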

Parameters:
  • dataset ({"main", "train", "test"}, default="train") – Dataset partition to use for feature selection.

  • threshold (float, default=1e-6) – P-value threshold for feature significance; features with p-values below this are retained.

  • n_fourier (int, default=25) – Number of random Fourier features used for the conditioning set in the RCIT test.

  • n_fourier2 (int, default=5) – Number of random Fourier features used for the non-conditioning set in the RCIT test.

  • n_forwards (int, default=2) – Number of forward selection iterations in the FBEDk algorithm.

  • random_state (int, default=0) – Seed for random number generation to ensure reproducibility.
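To make the roles of the feature-count and seed parameters above more concrete, here is a minimal standard-library sketch of a random Fourier feature map for a one-dimensional variable. It illustrates the general technique only and is not Modeva's code: the function name `random_fourier_features` and the `bandwidth` argument are assumptions introduced for this example.

```python
import math
import random
from typing import List, Sequence


def random_fourier_features(x: Sequence[float], n_features: int = 5,
                            bandwidth: float = 1.0,
                            random_state: int = 0) -> List[List[float]]:
    """Map each scalar in `x` to `n_features` random cosine features.

    The inner product of two mapped points approximates a Gaussian kernel
    with the given bandwidth; more features give a closer approximation.
    """
    rng = random.Random(random_state)  # fixed seed gives a reproducible map
    weights = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)
    return [[scale * math.cos(w * xi + b) for w, b in zip(weights, offsets)]
            for xi in x]


values = [0.1, 0.5, 2.0]
features = random_fourier_features(values, n_features=5, random_state=0)
print(len(features), len(features[0]))
```

Calling the function twice with the same `random_state` yields the same features, which is why a fixed seed makes the resulting test statistics reproducible.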

Returns:

A container object with the following components:

  • key: "data_fs_rcit"

  • data: Name of the dataset used

  • inputs: Dictionary of input parameters

  • value: Dictionary containing:

    • "step_logs": List of selection step information

    • "selected": List of selected feature names

  • table: DataFrame showing step-by-step feature selection status

  • options: Dictionary of visualization settings for a feature selection heatmap of feature importance. Run results.plot() to show this plot.

Return type:

ValidationResult

Examples

Feature Selection
