generate_stratified_folds

brainsets.utils.split.generate_stratified_folds(intervals, stratify_by, n_folds=5, val_ratio=0.2, seed=42)

Generates stratified train/valid/test splits using a two-stage splitting process.

The splitting is performed in two stages:
  1. Outer split (StratifiedKFold): The intervals are divided into n_folds partitions. Each fold uses one partition as the test set and the remaining partitions as train+valid. Stratification ensures each fold maintains the class distribution of the original data.

  2. Inner split (StratifiedShuffleSplit): The train+valid portion of each fold is further split into train and valid sets using val_ratio, while preserving the class distribution.
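The two stages above can be sketched directly with scikit-learn on a plain label array. This is an illustration of the general technique, not the function's actual implementation: the helper name `two_stage_split`, the use of `shuffle=True` in the outer split, and the reuse of the same seed for the inner split are all assumptions, and it works on integer indices rather than on Interval objects.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit


def two_stage_split(labels, n_folds=5, val_ratio=0.2, seed=42):
    """Hypothetical helper: return (train_idx, valid_idx, test_idx) per fold."""
    labels = np.asarray(labels)
    indices = np.arange(len(labels))

    # Stage 1: the outer stratified K-fold separates test from train+valid.
    # shuffle=True is an assumption; random_state requires it in scikit-learn.
    outer = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)

    folds = []
    for train_valid_idx, test_idx in outer.split(indices, labels):
        # Stage 2: a single stratified shuffle split carves the validation
        # set out of the train+valid portion of this fold.
        inner = StratifiedShuffleSplit(
            n_splits=1, test_size=val_ratio, random_state=seed
        )
        train_rel, valid_rel = next(
            inner.split(train_valid_idx, labels[train_valid_idx])
        )
        # The inner split returns positions relative to train_valid_idx,
        # so map them back to positions in the original array.
        folds.append(
            (train_valid_idx[train_rel], train_valid_idx[valid_rel], test_idx)
        )
    return folds


# Toy data: 100 samples, two balanced classes.
labels = np.array(["a"] * 50 + ["b"] * 50)
folds = two_stage_split(labels)
```

Each tuple in `folds` holds three disjoint index arrays that together cover every sample, and the test arrays of the different folds do not overlap.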

Parameters:
  • intervals (Interval) – The intervals to split.

  • n_folds (int) – Number of folds for the outer cross-validation split. Defaults to 5.

  • val_ratio (float) – Fraction of the train+valid portion of each fold that is assigned to the validation set. Defaults to 0.2.

  • seed (int) – Random seed for the splits. Defaults to 42.

  • stratify_by (str) – The attribute name to use for stratification (e.g., “id”, “label”, “class”). The intervals must have this attribute.
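Because val_ratio is relative to the train+valid portion rather than to the full dataset, the effective share of each subset depends on n_folds as well. The short calculation below works this out; `split_fractions` is a hypothetical helper written for illustration, and the results are nominal proportions, since real splits are rounded to whole samples.

```python
def split_fractions(n_folds=5, val_ratio=0.2):
    """Hypothetical helper: nominal share of the data in each subset of a fold."""
    test = 1.0 / n_folds               # one of n_folds partitions is the test set
    train_valid = 1.0 - test           # the remaining partitions
    valid = val_ratio * train_valid    # val_ratio applies to train+valid only
    train = train_valid - valid
    return {"train": train, "valid": valid, "test": test}


fractions = split_fractions()
```

With the default arguments this gives roughly 64% train, 16% valid, and 20% test per fold, so the validation set is smaller than val_ratio alone might suggest.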

Return type:

List[Data]

Returns:

List of Data objects, one for each fold.

Raises:
  • ValueError – If the intervals don’t have the specified stratify_by attribute.

  • ValueError – If there are fewer samples than n_folds.