modeva.DataSet.eda_pca

DataSet.eda_pca(features: Tuple | List = None, n_components: int = None, dataset: str = 'main', sample_size: int = None, categorical_encoding: str = 'ordinal', random_state: int = 0)

Performs Principal Component Analysis (PCA) on specified features with preprocessing for both numerical and categorical variables.

This function handles the complete PCA workflow, including data preprocessing (standardization for numerical features and encoding for categorical features), PCA computation, and visualization of results through loadings and explained variance.
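The workflow described above can be sketched with plain NumPy. This is an illustrative approximation, not Modeva's implementation: the helper name `pca_sketch`, the use of SVD, and the choice to subsample before standardizing are all assumptions made for the example.

```python
import numpy as np


def pca_sketch(X, n_components=None, sample_size=None, random_state=0):
    """Hypothetical helper: subsample rows, standardize columns, run PCA via SVD."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(random_state)

    # Subsample only when a size is given and the data has more rows than that.
    if sample_size is not None and sample_size < X.shape[0]:
        idx = rng.choice(X.shape[0], size=sample_size, replace=False)
        X = X[idx]

    # Standardize each column to zero mean and unit variance.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled to avoid dividing by zero
    Z = (X - X.mean(axis=0)) / std

    # SVD of the standardized data; rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    k = Vt.shape[0] if n_components is None else min(n_components, Vt.shape[0])

    loadings = Vt[:k].T                        # shape (n_features, k)
    scores = Z @ loadings                      # PCA-transformed data
    explained_ratio = (S ** 2 / np.sum(S ** 2))[:k]
    return scores, loadings, explained_ratio


# Small synthetic demo: the third column is nearly a copy of the first.
demo_rng = np.random.default_rng(0)
X_demo = demo_rng.normal(size=(200, 3))
X_demo[:, 2] = X_demo[:, 0] + 0.1 * X_demo[:, 2]

scores, loadings, explained_ratio = pca_sketch(X_demo, n_components=2, sample_size=100)
```

Here `scores` plays the role of the transformed data, while `loadings` and `explained_ratio` correspond to the quantities shown in the loadings and explained variance plot.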

Parameters:

features (tuple or list, default=None) – Features to include in PCA analysis. If None, all available features are used.
n_components (int, default=None) – Number of principal components to compute. If None, equals the number of features.
dataset ({"main", "train", "test"}, default="main") – Specifies which dataset partition to use for the analysis.
sample_size (int, default=None) – Number of rows in the random subsample used for computation. If None, or if the dataset has fewer rows than sample_size, the full dataset is used.
categorical_encoding ({"ordinal", "onehot"}, default="ordinal") –
Method for encoding categorical variables:
- "ordinal": Converts categories to integer values
- "onehot": Creates binary columns for each category (minus one reference category)
random_state (int, default=0) – Seed for random operations, ensuring reproducibility.
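To make the two categorical_encoding options concrete, the following pure-Python sketch encodes a small column both ways. The helper names are hypothetical, and the alphabetical category order and the choice of the first category as the reference are assumptions; the library may order categories or pick the reference differently.

```python
def ordinal_encode(values):
    """Hypothetical helper: map each category to an integer (alphabetical order assumed)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values]


def onehot_encode_drop_first(values):
    """Hypothetical helper: one binary column per category except a reference category."""
    categories = sorted(set(values))
    kept = categories[1:]  # assumption: the first category serves as the reference
    rows = [[1 if v == category else 0 for category in kept] for v in values]
    return rows, kept


colors = ["red", "green", "blue", "green"]

ordinal = ordinal_encode(colors)
onehot, onehot_columns = onehot_encode_drop_first(colors)

print(ordinal)         # [2, 1, 0, 1]
print(onehot_columns)  # ['green', 'red']
print(onehot)          # [[0, 1], [1, 0], [0, 0], [1, 0]]
```

Ordinal encoding keeps one column per feature but imposes an arbitrary numeric order on the categories, whereas one-hot encoding avoids that order at the cost of adding columns, which increases the number of inputs to PCA.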

Returns:

A container object with the following components:

key: “data_eda_pca”
data: Name of the dataset used
inputs: Dictionary of input parameters
table: DataFrame containing the PCA-transformed data
options: Dictionary of visualization configuration for the PCA loadings and explained variance plot. Call plot() on the returned object (for example, results.plot()) to show this plot.

Return type:

ValidationResult

Examples

Exploratory Data Analysis