Neuroprobe2025
- class brainsets.datasets.Neuroprobe2025(root=None, recording_ids=None, transform=None, *, subset_tier=None, test_subject=None, test_session=None, split=None, label_mode=None, task=None, regime=None, fold=None, uniquify_channel_ids_with_subject=True, uniquify_channel_ids_with_session=False, dirname='neuroprobe_2025', **kwargs)
Bases: MultiChannelDatasetMixin, Dataset
Neuroprobe 2025 iEEG benchmark dataset.
Preprocessing
To download and prepare this dataset, run:
    brainsets prepare neuroprobe_2025
Each instance operates in exactly one of two mutually exclusive modes:
- Neuroprobe benchmark mode (recording_ids=None): splits are resolved by the Neuroprobe benchmark split generators. The cross-session and cross-subject regimes are both condensed into 'cross-x' splits, from which the train and test sets are selected.
- Recording id mode (recording_ids provided): no splits are resolved; only the specified recording_ids are preprocessed, for use as continuous data.
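The mode rules can be sketched as a small standalone check. This is only an illustration of the rules stated in the parameter list, not the library's own validation code: `resolve_mode`, the two tuples, and the placeholder recording id are hypothetical names introduced here.

```python
# Hypothetical sketch of the two-mode rule; not part of brainsets.
# Arguments documented as "required in benchmark mode".
REQUIRED_IN_BENCHMARK = ("subset_tier", "test_subject", "test_session", "split")
# Arguments documented as "must be omitted in explicit-recording mode".
OMIT_IN_RECORDING_MODE = REQUIRED_IN_BENCHMARK + ("fold",)


def resolve_mode(recording_ids=None, **benchmark_args):
    """Return 'benchmark' or 'recording_ids', raising on an invalid mix."""
    given = {k for k, v in benchmark_args.items() if v is not None}
    if recording_ids is None:
        missing = [k for k in REQUIRED_IN_BENCHMARK if k not in given]
        if missing:
            raise ValueError(f"benchmark mode requires: {missing}")
        return "benchmark"
    forbidden = [k for k in OMIT_IN_RECORDING_MODE if k in given]
    if forbidden:
        raise ValueError(f"recording id mode forbids: {forbidden}")
    return "recording_ids"


# Benchmark mode: recording_ids is omitted, split arguments are supplied.
# The subject and session ids below are placeholders.
mode_a = resolve_mode(subset_tier="lite", test_subject=1, test_session=0, split="train")

# Recording id mode: only recording ids are given (placeholder id).
mode_b = resolve_mode(recording_ids=["example_recording_id"])
```

Mixing the two, for example passing both `recording_ids` and `split`, raises `ValueError` in this sketch, which mirrors the "must be omitted" wording of the parameter descriptions.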
References
Zahorodnii, A., Wang, C., Stankovits, B., Moraitaki, C., Chau, G., Barbu, A., Katz, B., & Fiete, I. R. Neuroprobe: Evaluating Intracranial Brain Responses to Naturalistic Stimuli. arXiv:2509.21671.
Data sources: BrainTreeBank and Neuroprobe Benchmark
- Parameters:
  - root (Optional[str]) – Root directory containing processed Neuroprobe artifacts. Defaults to processed_dir from brainsets config.
  - recording_ids (Optional[list[str]]) – Optional explicit recording-id subset to expose from disk. If omitted, the dataset uses benchmark-required recording ids inferred from subset_tier / test_subject / test_session / split / label_mode / task / regime / fold.
  - subset_tier (Optional[Literal['full', 'lite', 'nano']]) – One of "full", "lite", "nano". Required in benchmark mode; must be omitted in explicit-recording mode.
  - test_subject (int | None) – Target test subject id (Neuroprobe semantics). Required in benchmark mode; must be omitted in explicit-recording mode.
  - test_session (int | None) – Target test trial/session id (Neuroprobe semantics). Required in benchmark mode; must be omitted in explicit-recording mode.
  - split (Optional[Literal['train', 'val', 'test']]) – One of "train", "val", "test". Required in benchmark mode; must be omitted in explicit-recording mode.
  - label_mode (Optional[Literal['binary', 'multiclass']]) – One of "binary", "multiclass". Defaults to "binary" in benchmark mode.
  - task (str | None) – Neuroprobe task name. Defaults to "speech" in benchmark mode. Supported values are: "delta_volume", "face_num", "frame_brightness", "global_flow", "gpt2_surprisal", "local_flow", "onset", "pitch", "speech", "volume", "word_gap", "word_head_pos", "word_index", "word_length", "word_part_speech".
  - regime (Optional[Literal['SS-SM', 'SS-DM', 'DS-DM']]) – One of "SS-SM", "SS-DM", "DS-DM". Defaults to "SS-SM" in benchmark mode. Neuroprobe regime semantics:
    - "SS-SM": single-subject, single-session (within-session split)
    - "SS-DM": single-subject, different-session (cross-x split)
    - "DS-DM": different-subject, different-session (cross-x split)
  - fold (int | None) – Fold index used only in benchmark mode. Defaults to 0 in benchmark mode and must be omitted in explicit-recording mode. Valid values depend on regime:
    - within_session: valid {0, 1}
    - cross_x: forced to 0
  - uniquify_channel_ids_with_subject (bool) – Whether to prefix channel IDs with subject.id via MultiChannelDatasetMixin. Defaults to True.
  - uniquify_channel_ids_with_session (bool) – Whether to prefix channel IDs with session.id via MultiChannelDatasetMixin. Defaults to False.
  - dirname (str) – Subdirectory under root containing recording H5 files.
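The interaction between regime and fold can be illustrated with a short standalone sketch that assembles benchmark-mode keyword arguments using the documented defaults. `benchmark_kwargs` and the two lookup tables are hypothetical helpers, not part of brainsets, and the sketch assumes that "forced to 0" means the fold is coerced to 0 for cross-x regimes; the library may behave differently.

```python
# Hypothetical helper tables; not part of brainsets.
# Regime -> split type, as described under the regime parameter.
REGIME_TO_SPLIT_TYPE = {
    "SS-SM": "within_session",
    "SS-DM": "cross_x",
    "DS-DM": "cross_x",
}
# Valid fold indices per split type, as described under the fold parameter.
VALID_FOLDS = {"within_session": (0, 1), "cross_x": (0,)}


def benchmark_kwargs(subset_tier, test_subject, test_session, split, *,
                     label_mode="binary", task="speech", regime="SS-SM", fold=0):
    """Build benchmark-mode keyword arguments using the documented defaults."""
    split_type = REGIME_TO_SPLIT_TYPE[regime]  # KeyError on an unknown regime
    if split_type == "cross_x":
        fold = 0  # assumption: "forced to 0" is read as coercion
    elif fold not in VALID_FOLDS[split_type]:
        raise ValueError(f"fold={fold} is invalid for regime {regime!r}")
    return {
        "subset_tier": subset_tier,
        "test_subject": test_subject,
        "test_session": test_session,
        "split": split,
        "label_mode": label_mode,
        "task": task,
        "regime": regime,
        "fold": fold,
    }


# Placeholder subject and session ids, chosen only for illustration.
within = benchmark_kwargs("lite", test_subject=1, test_session=0, split="train", fold=1)
cross = benchmark_kwargs("nano", test_subject=1, test_session=0, split="test",
                         regime="DS-DM", fold=1)
```

With brainsets installed and the dataset prepared, a dictionary like `within` could presumably be passed as `brainsets.datasets.Neuroprobe2025(**within)`, since every key matches a keyword argument in the class signature.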