modeva.DataSet.eda_2d

DataSet.eda_2d(feature_x: str, feature_y: str, feature_color: str = None, dataset: str = 'main', sample_size: int = None, smoother_order: int = None, random_state: int = 0)

Creates a bivariate visualization between two features with optional color encoding.

Generates a scatter, box, or bar plot, chosen according to whether each of the two features is numerical or categorical, to visualize the relationship between them. A third feature can be added through color coding, and options are available for sampling the data and drawing a smoothing curve.
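The type-based choice of plot described above can be pictured with a small sketch. This is not Modeva's implementation: the helper names `is_numerical` and `choose_plot_kind` are hypothetical, and the mapping (two numerical features give a scatter plot, one numerical and one categorical give a box plot, two categorical give a bar plot) is an assumption inferred from the plot types listed here.

```python
def is_numerical(values):
    """Treat a column as numerical when every value is an int or float (bools excluded)."""
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


def choose_plot_kind(x_values, y_values):
    """Hypothetical dispatch: pick a plot kind from the two feature types.

    Assumed mapping, not taken from the Modeva source code.
    """
    x_num = is_numerical(x_values)
    y_num = is_numerical(y_values)
    if x_num and y_num:
        return "scatter"  # numerical vs. numerical
    if x_num != y_num:
        return "box"      # one numerical, one categorical
    return "bar"          # categorical vs. categorical


print(choose_plot_kind([1.0, 2.5, 3.1], [10, 20, 30]))     # scatter
print(choose_plot_kind(["a", "b", "a"], [10, 20, 30]))     # box
print(choose_plot_kind(["a", "b", "a"], ["x", "y", "x"]))  # bar
```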

Parameters:

feature_x (str) – Name of the feature to be plotted on the x-axis.
feature_y (str) – Name of the feature to be plotted on the y-axis.
feature_color (str, default=None) – Name of the feature used for color encoding in scatter plots. If None, no color encoding is applied.
dataset ({"main", "train", "test"}, default="main") – Specifies which dataset partition to use for visualization.
sample_method ({"random"}, default='random') – Method used for data sampling. Currently, only random sampling is supported.
sample_size (int, default=None) – Maximum number of samples to use in visualization. If None, uses all available data.
smoother_order (int, default=None) – Order of polynomial for trend line smoothing. If None, no smoothing curve is drawn.
random_state (int, default=0) – Random seed for reproducible sampling.
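To illustrate how sample_size and random_state might interact, here is a minimal standard-library sketch. The function `sample_rows` is hypothetical and only mirrors the documented behavior (use all data when sample_size is None, otherwise draw a reproducible random subset); the sampling routine Modeva actually uses is not specified on this page.

```python
import random


def sample_rows(rows, sample_size=None, random_state=0):
    """Return at most sample_size rows, drawn without replacement using a fixed seed."""
    rows = list(rows)
    if sample_size is None or sample_size >= len(rows):
        # No cap requested, or the cap exceeds the data: keep everything.
        return rows
    rng = random.Random(random_state)  # a local generator keeps the draw reproducible
    return rng.sample(rows, sample_size)


rows = list(range(100))
first = sample_rows(rows, sample_size=10, random_state=0)
second = sample_rows(rows, sample_size=10, random_state=0)
print(first == second)         # True: the same seed gives the same subset
print(len(sample_rows(rows)))  # 100: sample_size=None keeps all rows
```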

Returns:

A container object with the following components:

key: "data_eda_2d"
data: Name of the dataset used
inputs: Dictionary of input parameters
options: Dictionary of visualization configuration for a scatter, box, or stacked bar plot. Call plot() on the returned object (for example, results.plot()) to display the plot.

Return type:

ValidationResult
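A call might look like the following sketch. It uses only the eda_2d signature and the plot() method documented on this page; the wrapper name `plot_eda_2d` is hypothetical, and `ds` is assumed to be a DataSet that has already been loaded (and split, since dataset="train" is requested).

```python
def plot_eda_2d(ds, feature_x, feature_y, feature_color=None):
    """Call eda_2d on an already-loaded DataSet, display the plot, and return the result."""
    results = ds.eda_2d(
        feature_x=feature_x,
        feature_y=feature_y,
        feature_color=feature_color,  # optional third feature for color encoding
        dataset="train",              # one of "main", "train", "test"
        sample_size=500,              # cap the number of plotted samples
        smoother_order=2,             # quadratic trend line
        random_state=0,               # reproducible sampling
    )
    results.plot()  # display the plot
    return results
```

For instance, `plot_eda_2d(ds, "hr", "cnt", feature_color="season")` would plot two features colored by a third. The feature names here are placeholders and must match columns in your own data.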

Examples

Exploratory Data Analysis