generate_string_kfold_assignment
- brainsets.utils.split.generate_string_kfold_assignment(string_id, n_folds=3, val_ratio=0.2, seed=42)
Generate deterministic per-fold train/valid/test assignments for one ID.
The assignment is independent for each fold index k, but follows a deterministic two-step rule:
1. Compute a global bucket from md5(f"{string_id}_{seed}") % n_folds. The fold whose index equals this bucket is labeled "test".
2. For every other fold, compute a fold-specific hash md5(f"{string_id}_{seed}_{k}") and map it to [0, 1). If that value is below val_ratio, the fold is "valid"; otherwise it is "train".
As a result, each string_id appears in the test split of exactly one fold. Because every label depends only on the hashed inputs, the output is reproducible across runs and safe for parallel processing.
- Parameters:
string_id (str) – String identifier (e.g., “S001”, “sub-01”, or “sub-01_ses-01”).
n_folds (int) – Number of folds for cross-validation. Default is 3.
val_ratio (float) – Ratio of validation set relative to train+valid combined, applied as a per-ID hash threshold, so the realized fraction over a set of IDs is approximate. Default is 0.2.
seed (int) – Random seed for reproducibility. Default is 42.
- Returns:
List of fold assignments where index k corresponds to fold k and each value is one of "train", "valid", or "test". Exactly one entry is "test".
- Return type:
List[str]
Examples
>>> assignments = generate_string_kfold_assignment("sub-01", n_folds=3)
>>> assignments
['train', 'test', 'train']
>>> generate_string_kfold_assignment("sub-01_ses-01", n_folds=3)
['valid', 'train', 'test']
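In practice the per-ID lists are usually regrouped into per-fold splits. The sketch below shows one way this might be done. `group_ids_by_fold` is a hypothetical helper, not part of brainsets, and it takes the assignment function as an argument so that it can be run with any callable that has the same signature.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_ids_by_fold(
    ids: Iterable[str],
    assign: Callable[..., List[str]],
    n_folds: int = 3,
) -> List[Dict[str, List[str]]]:
    # One {"train": [...], "valid": [...], "test": [...]} mapping per fold.
    folds = [defaultdict(list) for _ in range(n_folds)]
    for string_id in ids:
        labels = assign(string_id, n_folds=n_folds)
        for k, label in enumerate(labels):
            folds[k][label].append(string_id)
    # Convert to plain dicts; a label with no IDs is simply absent.
    return [dict(fold) for fold in folds]


# Intended use with the documented function (requires brainsets):
# from brainsets.utils.split import generate_string_kfold_assignment
# folds = group_ids_by_fold(["sub-01", "sub-02"], generate_string_kfold_assignment)
# folds[0].get("test", [])  # IDs held out in fold 0
```

Since each ID is labeled "test" in exactly one fold, the test lists of the resulting folds are disjoint and together cover every ID.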