EDA Multivariate#

The EDA Multivariate panel enables users to perform exploratory data analysis (EDA) on multiple features simultaneously. It provides correlation analysis, PCA (Principal Component Analysis), and UMAP (Uniform Manifold Approximation and Projection) for dimensionality reduction and pattern discovery.

Initialize the Panel#

To create and initialize the EDA Multivariate panel, use:

# Load the Experiment and view the multivariate analysis
from modeva import Experiment
exp = Experiment(name='Demo-SimuCredit')
exp.eda_multivariate()

Workflow#

Step 1: Load and Select Dataset#

  1. Select a dataset from the Dataset Selection dropdown.

  2. Choose a data split (e.g., main, train, test) to analyze specific subsets.

Step 2: Perform Correlation Analysis#

  1. Click the Correlation tab.

  2. Review the correlation heatmap, which shows pairwise relationships between numerical variables and can guide feature selection or dimensionality reduction.

../../../_images/lowcode_edamulti_corr.png
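
For readers who want to see the kind of computation behind a correlation heatmap, the following is a minimal sketch using NumPy on synthetic data. It is not the panel's implementation: the column names, the Pearson method, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic stand-in data; the column names are hypothetical.
rng = np.random.default_rng(0)
income = rng.normal(50_000, 10_000, size=500)
balance = 0.4 * income + rng.normal(0, 2_000, size=500)  # built to track income
utilization = rng.uniform(0, 1, size=500)                 # independent of the others

names = ["income", "balance", "utilization"]
X = np.column_stack([income, balance, utilization])

# Pearson correlation matrix; rowvar=False treats each column as a variable.
corr = np.corrcoef(X, rowvar=False)

# List feature pairs whose absolute correlation exceeds an illustrative threshold.
threshold = 0.8
strong_pairs = [
    (names[i], names[j], round(float(corr[i, j]), 2))
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]
print(strong_pairs)
```

Pairs flagged this way are candidates for dropping one of the two features or for combining them through dimensionality reduction.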

Step 3: Apply Principal Component Analysis (PCA)#

The PCA tab displays a loading plot and a 3D scatter plot of the principal components. PCA reduces the dimensionality of the dataset while retaining as much of its variance as possible, which helps reveal clusters and patterns in the data.

  1. Click the PCA tab.

  2. Set Number of Components (``n_components``, default = 5) and select the X, Y, and Z coordinates from the principal components (e.g., PC1, PC2, PC3).

../../../_images/lowcode_edamulti_pca.png

  3. (Optional) Add a color feature for subgroup highlighting.

  4. (Optional) Adjust Sampling Method and Sampling Proportion for performance optimization.

../../../_images/lowcode_edamulti_pca_color.png
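
The quantities shown in a PCA loading plot and 3D scatter can be sketched outside the panel as follows. This is a rough illustration using NumPy's SVD on synthetic data, not the panel's own code; standardizing the features first and reporting unscaled component weights as "loadings" are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))   # synthetic data: 300 rows, 8 features
X[:, 1] += 2.0 * X[:, 0]        # inject correlation so PC1 carries extra variance

n_components = 5                # mirrors the default stated above

# Standardize each feature (an assumption), then decompose with SVD.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

loadings = Vt[:n_components].T  # component weights, shape (n_features, n_components)
scores = Z @ loadings           # projected data, shape (n_samples, n_components)
explained_ratio = (S**2 / np.sum(S**2))[:n_components]

# Three score columns would play the role of the X, Y, and Z axes (PC1, PC2, PC3).
pc1, pc2, pc3 = scores[:, 0], scores[:, 1], scores[:, 2]
print(explained_ratio.round(3))
```

The explained-variance ratios indicate how much of the total variance each retained component accounts for, which is one way to judge whether the chosen number of components is sufficient.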

Step 4: Apply UMAP for Advanced Visualization#

The UMAP tab displays a scatter plot of the UMAP embedding. UMAP is a nonlinear dimensionality reduction technique that can reveal structure in the dataset that linear methods such as PCA may miss.

  1. Click the UMAP tab.

  2. Set Number of Components (``n_components``, default = 5), adjust Number of Neighbors (``n_neighbors``), which controls how much local versus global structure the embedding preserves, and select the X, Y, and Z coordinates (e.g., C1, C2, C3).

../../../_images/lowcode_edamulti_umap.png

  3. (Optional) Add a color feature to highlight subgroups.

  4. (Optional) Adjust Sampling Method and Sampling Proportion to improve performance.

../../../_images/lowcode_edamulti_umap_color.png
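
To experiment with the same two parameters outside the panel, a sketch along the following lines could be used. It assumes the open-source umap-learn package, which may or may not be what the panel uses internally; the helper name `umap_embedding`, the synthetic data, and the value `n_neighbors=15` (umap-learn's own default, not necessarily the panel's) are illustrative assumptions.

```python
import numpy as np

def umap_embedding(X, n_components=5, n_neighbors=15, seed=0):
    """Embed X with umap-learn; return None when the package is not installed."""
    try:
        import umap  # optional dependency: pip install umap-learn
    except ImportError:
        return None
    reducer = umap.UMAP(
        n_components=n_components,  # number of embedding dimensions
        n_neighbors=n_neighbors,    # smaller values emphasize local structure
        random_state=seed,          # fixed seed for a repeatable layout
    )
    return reducer.fit_transform(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))       # synthetic stand-in data
embedding = umap_embedding(X)

if embedding is not None:
    # Any three columns could serve as the X, Y, and Z axes (C1, C2, C3).
    c1, c2, c3 = embedding[:, 0], embedding[:, 1], embedding[:, 2]
    print(embedding.shape)
```

Rerunning with a smaller or larger `n_neighbors` and comparing the resulting scatter plots is a quick way to see how the setting shifts the balance between local detail and global layout.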

Step 5: Save and Export Results#

  • Click the register icon to save plots for future reference.

../../../_images/lowcode_test_registry.png

Troubleshooting#

  • Slow Rendering: Usually caused by plotting too many data points. Reduce the sampling proportion.

  • UMAP Takes Too Long: Usually caused by a high ``n_neighbors`` value or a large dataset. Reduce the number of neighbors or apply sampling.
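
The effect of lowering the sampling proportion can be sketched as follows. This assumes simple random sampling without replacement; the panel's actual Sampling Method options are not described here, and the helper name `sample_rows` is hypothetical.

```python
import numpy as np

def sample_rows(X, proportion, seed=0):
    """Return a random subset of rows, drawn without replacement."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(round(len(X) * proportion)))
    idx = rng.choice(len(X), size=n_keep, replace=False)
    return X[np.sort(idx)]  # keep the original row order

# Synthetic stand-in for a large dataset: 100,000 rows, 3 features.
X = np.arange(100_000 * 3, dtype=float).reshape(100_000, 3)
X_small = sample_rows(X, proportion=0.1)
print(X_small.shape)
```

Plotting or embedding the reduced array instead of the full one cuts both rendering and computation time, at the cost of showing only a subset of the points.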

The EDA Multivariate panel streamlines multivariate analysis with interactive visualizations, making it easier to detect patterns, relationships, and clusters within your dataset. For more information, refer to the Exploratory Data Analysis documentation.