modeva.DataSet.feature_select_xgbpfi#

DataSet.feature_select_xgbpfi(dataset: str = 'train', n_repeats: int = 10, threshold: float = 0.1, random_state: int = 0)#

Selects important features using an XGBoost model and permutation importance analysis.

This function trains an XGBoost model and evaluates feature importance through permutation analysis. A feature is selected when its normalized importance score exceeds the specified threshold. The analysis includes a visualization of feature importance scores with a threshold line.

Parameters:
  • dataset ({"main", "train", "test"}, default="train") – Dataset partition to analyze. "main" uses the full dataset; "train"/"test" use the respective splits.

  • n_repeats (int, default=10) – Number of permutation iterations per feature. Higher values give more stable importance scores but increase computation time.

  • threshold (float, default=0.1) – Minimum normalized importance score for feature selection. Features with scores above this threshold are selected.

  • random_state (int, default=0) – Random seed for reproducible permutation results.

Returns:

A container object with the following components:

  • key: "data_fs_xgbpfi"

  • data: Name of the dataset used

  • inputs: Input parameters used for the analysis

  • value: Dictionary containing:

    • "selected": List of selected feature names

  • table: DataFrame with feature names, importance scores, and selection status

  • options: Dictionary of visualization configurations for a horizontal bar plot of feature importances. Call results.plot() to display it.

Return type:

ValidationResult

Examples

Feature Selection
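The selection rule described above can be sketched in plain NumPy: permute each feature, measure the resulting increase in model error, normalize the scores, and keep features whose normalized importance exceeds the threshold. This is a minimal illustration only, not modeva's implementation; it uses an ordinary least-squares model as a stand-in for XGBoost, and all variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends strongly on x0, weakly on x1, and not at all on x2.
n = 500
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Fit an ordinary least-squares model as a stand-in for XGBoost.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def mse(X_, y_):
    return np.mean((X_ @ coef - y_) ** 2)

baseline = mse(X, y)

# Permutation importance: average increase in error when one column is shuffled.
n_repeats = 10
importances = np.zeros(X.shape[1])
for j in range(X.shape[1]):
    for _ in range(n_repeats):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        importances[j] += mse(X_perm, y) - baseline
importances /= n_repeats

# Normalize scores and select features above the threshold.
normalized = importances / importances.max()
threshold = 0.1
selected = [f"x{j}" for j in range(X.shape[1]) if normalized[j] > threshold]
print(selected)
```

With this toy data only x0 clears the 0.1 threshold: the weak x1 signal yields a normalized score near 0.01, mirroring how the threshold parameter filters out marginal features.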