modeva.DataSet.detect_outlier_isolation_forest

DataSet.detect_outlier_isolation_forest(dataset: str = 'main', threshold: float = 0.99, n_estimators: int = 100)

Performs outlier detection using the Isolation Forest algorithm.

This method trains an Isolation Forest model on the specified dataset partition. It preprocesses the data, fits the model, computes an outlier score for every sample, and returns both the detection results and the configuration for a score plot. Outliers are identified by quantile thresholding: samples whose outlier score lies above the threshold quantile of the score distribution are classified as outliers.
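The scoring step can be illustrated with a minimal plain-Python sketch of the Isolation Forest algorithm (Liu, Ting and Zhou, 2008). It is not MoDeVa's implementation: the names isolation_forest_scores, sample_size, and seed are hypothetical, preprocessing is omitted, and it assumes (consistent with the threshold description below) that a higher score means a more anomalous sample. Each tree recursively splits a random subsample on a random feature at a random value; points that are isolated after few splits receive short average path lengths and therefore high scores.

```python
import math
import random
from typing import List, Sequence

Row = Sequence[float]
EULER_GAMMA = 0.5772156649015329


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows: List[Row], depth: int, max_depth: int, rng: random.Random) -> dict:
    """Recursively split rows on a random feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:  # feature is constant in this node, so it cannot be split
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([r for r in rows if r[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x: Row, node: dict, depth: int = 0) -> float:
    """Depth at which x reaches a leaf, plus c(size) for leaves that were not fully split."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest_scores(data: List[Row], n_estimators: int = 100,
                            sample_size: int = 256, seed: int = 0) -> List[float]:
    """Score in (0, 1] per row; higher means easier to isolate (more anomalous)."""
    if len(data) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))  # height limit used in the original paper
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_estimators)]
    norm = c_factor(psi)
    scores = []
    for x in data:
        mean_depth = sum(path_length(x, t) for t in trees) / n_estimators
        scores.append(2.0 ** (-mean_depth / norm))
    return scores
```

Averaging over n_estimators trees is what stabilizes each score: a single random tree may isolate a normal point early by chance, but that becomes unlikely across many trees.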

Parameters:
  • dataset ({"main", "train", "test"}, default="main") – Specifies which dataset partition to analyze for outliers.

  • threshold (float, default=0.99) – Quantile of the outlier-score distribution used as the classification cutoff. Must lie between 0 and 1; higher values flag fewer samples as outliers (the default of 0.99 flags roughly the top 1% of scores).

  • n_estimators (int, default=100) – Number of trees in the Isolation Forest ensemble. Higher values generally provide better stability but increase computation time.
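The effect of threshold can be illustrated with a small sketch of quantile thresholding. It assumes linear interpolation between sorted scores (the default rule of numpy.quantile) and a strict "score above cutoff" rule for outliers; MoDeVa's exact interpolation and tie handling are not stated in this reference, and quantile and split_by_threshold are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def quantile(values: Sequence[float], q: float) -> float:
    """q-quantile with linear interpolation between sorted values (numpy's default rule)."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def split_by_threshold(rows: Sequence[T], scores: Sequence[float],
                       threshold: float = 0.99) -> Tuple[List[T], List[T], float]:
    """Return (outliers, non_outliers, cutoff); rows scoring above the cutoff are outliers."""
    if len(rows) != len(scores):
        raise ValueError("rows and scores must have the same length")
    cutoff = quantile(scores, threshold)
    outliers = [r for r, s in zip(rows, scores) if s > cutoff]
    non_outliers = [r for r, s in zip(rows, scores) if s <= cutoff]
    return outliers, non_outliers, cutoff


# Evenly spaced toy scores 0..99, one per row.
rows = list(range(100))
scores = [float(i) for i in range(100)]
for t in (0.5, 0.9, 0.99):
    out, _, cut = split_by_threshold(rows, scores, t)
    print(f"threshold={t}: cutoff={cut:.2f}, {len(out)} outliers")
```

On these evenly spaced scores, the three thresholds flag 50, 10, and 1 samples respectively, which shows why raising threshold shrinks the outlier set.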

Returns:

A container object with the following components:

  • key: “data_outlier_isolationforest”

  • data: Name of the dataset used

  • inputs: Dictionary of input parameters

  • table: Dictionary containing:

    • outliers: DataFrame of identified outlier samples

    • non-outliers: DataFrame of normal samples

  • func: Callable function that computes outlier scores for new data

  • options: Dictionary of visualization settings for a histogram with outlier scores on the x-axis and density on the y-axis. Call results.plot() to display it.

Return type:

ValidationResult
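Because func computes outlier scores for new data, it can be paired with a cutoff taken from the training scores to flag unseen samples. The sketch below assumes that func accepts a batch of rows and returns one score per row, with higher scores more anomalous; that call signature is an assumption, and score_fn, flag_new_samples, cutoff, and toy_score are hypothetical names, not part of the MoDeVa API.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def flag_new_samples(
    score_fn: Callable[[List[Row]], List[float]],  # hypothetical stand-in for results.func
    new_rows: List[Row],
    cutoff: float,  # score at the threshold quantile of the training scores
) -> List[Tuple[Row, float, bool]]:
    """Score unseen rows and mark those scoring above the training cutoff."""
    scores = list(score_fn(new_rows))
    if len(scores) != len(new_rows):
        raise ValueError("score_fn must return exactly one score per row")
    return [(row, score, score > cutoff) for row, score in zip(new_rows, scores)]


def toy_score(rows: List[Row]) -> List[float]:
    """Toy scorer for illustration only: Euclidean distance of each row from the origin."""
    return [sum(v * v for v in row) ** 0.5 for row in rows]


for row, score, is_outlier in flag_new_samples(toy_score, [(0.1, 0.2), (3.0, 4.0)], cutoff=2.0):
    print(row, round(score, 3), "outlier" if is_outlier else "normal")
```

Reusing the training cutoff, rather than recomputing a quantile on each new batch, keeps the decision rule fixed, so a small or clean batch is not forced to contain outliers.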

Examples

Outlier Detection
