Neuroprobe2025

class brainsets.datasets.Neuroprobe2025(root=None, recording_ids=None, transform=None, *, subset_tier=None, test_subject=None, test_session=None, split=None, label_mode=None, task=None, regime=None, fold=None, uniquify_channel_ids_with_subject=True, uniquify_channel_ids_with_session=False, dirname='neuroprobe_2025', **kwargs)

Bases: MultiChannelDatasetMixin, Dataset

Neuroprobe 2025 iEEG benchmark dataset.

Preprocessing

To download and prepare this dataset, run

brainsets prepare neuroprobe_2025

Each instance operates in exactly one of two mutually-exclusive modes:

Neuroprobe benchmark mode (recording_ids=None): splits are resolved by the Neuroprobe benchmark split generators. Cross-session and cross-subject splits are both treated as a single "cross-x" split type, from which the train and test data are selected.
Recording id mode (recording_ids provided; called explicit-recording mode in the parameter descriptions): no splits are resolved. Only the specified recording_ids are preprocessed, and they are exposed as continuous data.
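The mutual exclusion between the two modes can be sketched as a small pre-flight check. This is an illustrative helper only, not part of brainsets: the name infer_mode, the returned labels, and the error messages are assumptions, and the class itself may validate its arguments differently. The rules encoded are the ones stated in the parameter descriptions (subset_tier, test_subject, test_session and split are required in benchmark mode; those four plus fold must be omitted when recording_ids is given).

```python
from typing import Any, Optional, Sequence

# Parameters documented as "Required in benchmark mode".
REQUIRED_IN_BENCHMARK = ("subset_tier", "test_subject", "test_session", "split")
# Parameters documented as "must be omitted in explicit-recording mode".
OMIT_WITH_RECORDING_IDS = REQUIRED_IN_BENCHMARK + ("fold",)


def infer_mode(recording_ids: Optional[Sequence[str]] = None, **kwargs: Any) -> str:
    """Hypothetical helper: report which documented mode a set of arguments selects."""
    if recording_ids is None:
        # Benchmark mode: every required benchmark parameter must be supplied.
        missing = [name for name in REQUIRED_IN_BENCHMARK if kwargs.get(name) is None]
        if missing:
            raise ValueError("benchmark mode requires: " + ", ".join(missing))
        return "benchmark"
    # Recording id mode: benchmark-only parameters must be left unset.
    present = [name for name in OMIT_WITH_RECORDING_IDS if kwargs.get(name) is not None]
    if present:
        raise ValueError("omit when recording_ids is given: " + ", ".join(present))
    return "recording_ids"


# Illustrative values only; valid subject/session ids depend on the benchmark tier.
print(infer_mode(subset_tier="lite", test_subject=1, test_session=0, split="train"))
# "example_recording_id" is a placeholder, not a real recording id.
print(infer_mode(recording_ids=["example_recording_id"]))
```

Running a check like this before constructing the dataset surfaces a mixed-mode call early, without touching the files on disk.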

References

Zahorodnii, A., Wang, C., Stankovits, B., Moraitaki, C., Chau, G., Barbu, A., Katz, B., & Fiete, I. R. Neuroprobe: Evaluating Intracranial Brain Responses to Naturalistic Stimuli. arXiv:2509.21671.

Data sources: BrainTreeBank and Neuroprobe Benchmark

Parameters:

root (Optional[str]) – Root directory containing processed Neuroprobe artifacts. Defaults to processed_dir from brainsets config.
recording_ids (Optional[list[str]]) – Optional explicit recording-id subset to expose from disk. If omitted, the dataset uses benchmark-required recording ids inferred from subset_tier/test_subject/test_session/split/label_mode/task/regime/fold.
transform (Optional[Callable]) – Optional sample transform.
subset_tier (Optional[Literal['full', 'lite', 'nano']]) – One of "full", "lite", "nano". Required in benchmark mode; must be omitted in explicit-recording mode.
test_subject (int | None) – Target test subject id (Neuroprobe semantics). Required in benchmark mode; must be omitted in explicit-recording mode.
test_session (int | None) – Target test trial/session id (Neuroprobe semantics). Required in benchmark mode; must be omitted in explicit-recording mode.
split (Optional[Literal['train', 'val', 'test']]) – One of "train", "val", "test". Required in benchmark mode; must be omitted in explicit-recording mode.
label_mode (Optional[Literal['binary', 'multiclass']]) – One of "binary", "multiclass". Defaults to "binary" in benchmark mode.
task (str | None) – Neuroprobe task name. Defaults to "speech" in benchmark mode. Supported values are: "delta_volume", "face_num", "frame_brightness", "global_flow", "gpt2_surprisal", "local_flow", "onset", "pitch", "speech", "volume", "word_gap", "word_head_pos", "word_index", "word_length", "word_part_speech".
regime (Optional[Literal['SS-SM', 'SS-DM', 'DS-DM']]) – One of "SS-SM", "SS-DM", "DS-DM". Defaults to "SS-SM" in benchmark mode. Neuroprobe regime semantics:
- "SS-SM": single-subject, single-session (within-session split)
- "SS-DM": single-subject, different-session (cross-x split)
- "DS-DM": different-subject, different-session (cross-x split)
fold (int | None) – Fold index, used only in benchmark mode. Defaults to 0 in benchmark mode; must be omitted in explicit-recording mode. Valid values depend on the split type implied by regime:
- within_session ("SS-SM"): fold must be 0 or 1
- cross_x ("SS-DM", "DS-DM"): fold is forced to 0
uniquify_channel_ids_with_subject (bool) – Whether to prefix channel IDs with subject.id via MultiChannelDatasetMixin. Defaults to True.
uniquify_channel_ids_with_session (bool) – Whether to prefix channel IDs with session.id via MultiChannelDatasetMixin. Defaults to False.
dirname (str) – Subdirectory under root containing recording H5 files.
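As a usage sketch for benchmark mode, the snippet below assembles keyword arguments for the train, val and test splits of one configuration and applies the documented defaults and fold rules. The helpers benchmark_kwargs, resolve_fold and make_dataset are hypothetical names introduced here, and the subject and session values are placeholders. How the class reacts to a non-zero fold in a cross-x regime is not specified above, so this sketch simply returns 0 in that case. Only make_dataset touches brainsets, and it assumes the dataset has already been prepared with the command shown under Preprocessing.

```python
from typing import Any, Dict, Optional

# Task names copied from the `task` parameter description.
SUPPORTED_TASKS = frozenset({
    "delta_volume", "face_num", "frame_brightness", "global_flow",
    "gpt2_surprisal", "local_flow", "onset", "pitch", "speech", "volume",
    "word_gap", "word_head_pos", "word_index", "word_length", "word_part_speech",
})
WITHIN_SESSION_REGIMES = {"SS-SM"}
CROSS_X_REGIMES = {"SS-DM", "DS-DM"}


def resolve_fold(regime: str, fold: Optional[int]) -> int:
    """Hypothetical helper mirroring the documented fold rules."""
    if regime in WITHIN_SESSION_REGIMES:
        fold = 0 if fold is None else fold  # documented default is 0
        if fold not in (0, 1):
            raise ValueError("within-session folds must be 0 or 1")
        return fold
    if regime in CROSS_X_REGIMES:
        return 0  # documented as "forced to 0"; this sketch ignores other values
    raise ValueError(f"unknown regime: {regime!r}")


def benchmark_kwargs(subset_tier: str, test_subject: int, test_session: int,
                     split: str, *, label_mode: str = "binary",
                     task: str = "speech", regime: str = "SS-SM",
                     fold: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical helper: build benchmark-mode keyword arguments."""
    if subset_tier not in ("full", "lite", "nano"):
        raise ValueError(f"unknown subset_tier: {subset_tier!r}")
    if split not in ("train", "val", "test"):
        raise ValueError(f"unknown split: {split!r}")
    if label_mode not in ("binary", "multiclass"):
        raise ValueError(f"unknown label_mode: {label_mode!r}")
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    return {
        "subset_tier": subset_tier, "test_subject": test_subject,
        "test_session": test_session, "split": split,
        "label_mode": label_mode, "task": task, "regime": regime,
        "fold": resolve_fold(regime, fold),
    }


def make_dataset(**kwargs: Any):
    """Construct the dataset; needs brainsets installed and the data prepared."""
    from brainsets.datasets import Neuroprobe2025  # imported lazily on purpose
    return Neuroprobe2025(**kwargs)


# Placeholder subject/session ids; one kwargs dict per split of the same setup.
split_kwargs = {
    split: benchmark_kwargs("lite", 1, 0, split, task="onset", regime="SS-DM")
    for split in ("train", "val", "test")
}
print(split_kwargs["train"])
```

With the data in place, `make_dataset(**split_kwargs["train"])` would build the training split; root and dirname can be passed through the same call when the processed files live outside the configured processed_dir.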