generate_stratified_folds
- brainsets.utils.split.generate_stratified_folds(intervals, stratify_by, n_folds=5, val_ratio=0.2, seed=42)
Generates stratified train/valid/test splits using a two-stage splitting process.
- The splitting is performed in two stages:
Outer split (StratifiedKFold): The intervals are divided into n_folds partitions. Each fold uses one partition as the test set and the remaining n_folds - 1 partitions as train+valid. Stratification keeps the class distribution of each fold as close as possible to that of the original data.
Inner split (StratifiedShuffleSplit): Within each fold, the train+valid portion is further split into train and valid sets, with val_ratio giving the fraction assigned to valid. This split also preserves the class distribution.
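The two stages above can be sketched with scikit-learn's StratifiedKFold and StratifiedShuffleSplit. This is a minimal illustration, not the brainsets implementation: it assumes a plain array of class labels (standing in for the stratify_by attribute of an Interval), returns index arrays instead of Data objects, and assumes shuffle=True for the outer split. The helper name two_stage_stratified_splits is hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit


def two_stage_stratified_splits(labels, n_folds=5, val_ratio=0.2, seed=42):
    """Hypothetical helper: return one (train, valid, test) index tuple per fold.

    Sketch only: works on a plain label array rather than an Interval.
    """
    labels = np.asarray(labels)
    # sklearn only needs X for its length; the labels drive stratification.
    X = np.zeros((len(labels), 1))

    # Outer split: each fold holds out one stratified partition as the test set.
    # shuffle=True with random_state=seed is an assumption of this sketch.
    outer = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)

    folds = []
    for trainval_idx, test_idx in outer.split(X, labels):
        # Inner split: carve a stratified validation set out of train+valid.
        inner = StratifiedShuffleSplit(
            n_splits=1, test_size=val_ratio, random_state=seed
        )
        train_pos, valid_pos = next(
            inner.split(X[trainval_idx], labels[trainval_idx])
        )
        # inner.split yields positions within trainval_idx; map back to sample indices.
        folds.append((trainval_idx[train_pos], trainval_idx[valid_pos], test_idx))
    return folds


labels = [0] * 10 + [1] * 10
for fold, (train, valid, test) in enumerate(two_stage_stratified_splits(labels)):
    print(fold, len(train), len(valid), len(test))  # prints "<fold> 12 4 4" for each of the 5 folds
```

With n_folds=5 and val_ratio=0.2, each fold holds out 1/5 = 20% of the samples for test and assigns 0.2 × 80% = 16% to valid, leaving about 64% for train. Because val_ratio is a fraction of train+valid rather than of the whole dataset, and scikit-learn rounds the validation count up, the 20-sample toy input gives 12 train, 4 valid, and 4 test samples per fold.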
- Parameters:
intervals (Interval) – The intervals to split.
stratify_by (str) – The attribute name to use for stratification (e.g., “id”, “label”, “class”). The intervals must have this attribute.
n_folds (int) – Number of cross-validation folds (default: 5).
val_ratio (float) – Fraction of each fold's train+valid portion assigned to the validation set (default: 0.2).
seed (int) – Random seed for reproducible splits (default: 42).
- Return type:
List[Data]
- Returns:
List of Data objects, one for each fold.
- Raises:
ValueError – If the intervals don’t have the specified stratify_by attribute.
ValueError – If there are fewer samples than n_folds.
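The two documented ValueError conditions can be illustrated with a small pre-flight check. The helper check_stratify_inputs is hypothetical and not part of brainsets; a SimpleNamespace stands in for an Interval, and treating len() of the stratify_by attribute as the sample count is an assumption of this sketch.

```python
from types import SimpleNamespace


def check_stratify_inputs(intervals, stratify_by, n_folds):
    """Hypothetical pre-flight check mirroring the documented ValueErrors."""
    # Stratification needs a per-interval label stored under stratify_by.
    if not hasattr(intervals, stratify_by):
        raise ValueError(f"intervals has no attribute {stratify_by!r}")
    values = getattr(intervals, stratify_by)
    # StratifiedKFold cannot build n_folds test partitions from fewer samples.
    if len(values) < n_folds:
        raise ValueError(
            f"need at least n_folds={n_folds} samples, got {len(values)}"
        )
    return values


# SimpleNamespace stands in for an Interval carrying a "label" attribute.
toy = SimpleNamespace(label=["a", "b", "a", "b", "a", "b"])
check_stratify_inputs(toy, "label", n_folds=3)    # passes
# check_stratify_inputs(toy, "id", n_folds=3)     # ValueError: missing attribute
# check_stratify_inputs(toy, "label", n_folds=10) # ValueError: too few samples
```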