modeva.DataSet.subsample_random

DataSet.subsample_random(dataset: str = 'main', sample_size: int | float = 0.2, shuffle: bool = True, stratify: str = None, random_state: int = 0)

Subsample data randomly.

This function performs random subsampling on the specified dataset, allowing for options such as shuffling and stratification.

Parameters:
  • dataset ({"main", "train", "test"}, default="main") – Specifies which dataset to use for subsampling. Options include “main”, “train”, or “test”.

  • sample_size (int or float, default=0.2) – Size of the subsample. If a float, it should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the subsample. If an int, it represents the absolute number of samples to draw.

  • shuffle (bool, default=True) – Indicates whether to shuffle the data before subsampling. If set to False, stratification cannot be applied.

  • stratify (str, default=None) – The name of the feature to use for stratified sampling. If provided, the data will be split in a stratified manner based on this feature.

  • random_state (int, default=0) – The seed used by the random number generator, ensuring reproducibility of the results.
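
The interaction between sample_size, shuffle, stratify and random_state can be illustrated with a small standalone sketch. This is not Modeva's implementation: subsample_indices is a hypothetical helper written only to show the semantics described above, and its behavior for shuffle=False (taking the first rows in order) and its per-group rounding are assumptions.

```python
import random
from collections import defaultdict


def subsample_indices(n_rows, sample_size=0.2, shuffle=True,
                      strata=None, random_state=0):
    """Return sorted row positions of a random subsample.

    Hypothetical helper for illustration only, not Modeva code.
    `strata` is a list of group labels, one per row (or None).
    """
    # A float is read as a proportion of rows, an int as an absolute count.
    if isinstance(sample_size, float):
        if not 0.0 < sample_size <= 1.0:
            raise ValueError("float sample_size must be in (0.0, 1.0]")
        n_sample = round(n_rows * sample_size)
    else:
        n_sample = int(sample_size)
    if n_sample > n_rows:
        raise ValueError("sample_size exceeds the number of rows")

    # As documented above, stratification needs shuffling.
    if strata is not None and not shuffle:
        raise ValueError("stratification requires shuffle=True")

    if not shuffle:
        # Assumption: without shuffling, keep the first rows in order.
        return list(range(n_sample))

    # A seeded generator makes the draw reproducible.
    rng = random.Random(random_state)
    if strata is None:
        return sorted(rng.sample(range(n_rows), n_sample))

    # Stratified draw: sample the same fraction from every group.
    groups = defaultdict(list)
    for position, label in enumerate(strata):
        groups[label].append(position)
    fraction = n_sample / n_rows
    picked = []
    for members in groups.values():
        # Per-group rounding means the total can differ slightly from n_sample.
        picked.extend(rng.sample(members, round(len(members) * fraction)))
    return sorted(picked)


labels = ["a"] * 80 + ["b"] * 20
idx = subsample_indices(len(labels), sample_size=0.2, strata=labels)
```

With 80 rows labeled "a" and 20 labeled "b", a 20% stratified draw keeps the 4:1 ratio between the groups, and repeating the call with the same random_state returns the same positions.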

Returns:

A container object with the following components:

  • key: “subsample_random”

  • data: Name of the dataset used

  • inputs: Input parameters

  • value: Dictionary containing:

    • “sample_idx”: Indices of the sampled data points
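
As a rough sketch of how the returned indices might be consumed, the snippet below selects rows from an in-memory table. The value dictionary is a hand-built stand-in that mirrors the documented structure, not real Modeva output, and treating the indices as zero-based row positions is an assumption.

```python
# Stand-in for the documented `value` dictionary (hypothetical contents).
value = {"sample_idx": [0, 3, 4]}

# A toy table: one tuple per row.
rows = [
    ("r0", 1.5),
    ("r1", 2.0),
    ("r2", 0.7),
    ("r3", 3.1),
    ("r4", 2.4),
]

# Assumption: the indices are zero-based row positions.
subset = [rows[i] for i in value["sample_idx"]]
```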

Return type:

ValidationResult

Examples

Subsampling
