Neuroprobe2025

class brainsets.datasets.Neuroprobe2025(root=None, recording_ids=None, transform=None, *, subset_tier=None, test_subject=None, test_session=None, split=None, label_mode=None, task=None, regime=None, fold=None, uniquify_channel_ids_with_subject=True, uniquify_channel_ids_with_session=False, dirname='neuroprobe_2025', **kwargs)

Bases: MultiChannelDatasetMixin, Dataset

Neuroprobe 2025 iEEG benchmark dataset.

Preprocessing

To download and prepare this dataset, run:

brainsets prepare neuroprobe_2025

Each instance operates in exactly one of two mutually exclusive modes:

  • Neuroprobe benchmark mode (recording_ids=None): splits are resolved using the Neuroprobe benchmark split generators. Cross-session and cross-subject splits are condensed into ‘cross-x’ splits, which are then selected for train and test.

  • Recording-ID (explicit-recording) mode (recording_ids provided): no splits are resolved; only the recordings listed in recording_ids are preprocessed and exposed as continuous data.
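The mode rules above can be expressed as a small validation step. The sketch below is a hypothetical helper (`resolve_mode` is not part of brainsets) that encodes only what the parameter descriptions state: subset_tier, test_subject, test_session, and split are required in benchmark mode, and those four plus fold must be omitted when recording_ids is given. The actual class may validate differently or report errors in another form.

```python
from typing import Optional, Sequence

# Parameters documented as required in benchmark mode.
REQUIRED_IN_BENCHMARK = ("subset_tier", "test_subject", "test_session", "split")
# Parameters documented as "must be omitted" in explicit-recording mode.
FORBIDDEN_WITH_RECORDING_IDS = REQUIRED_IN_BENCHMARK + ("fold",)


def resolve_mode(
    recording_ids: Optional[Sequence[str]] = None,
    **params: object,
) -> str:
    """Hypothetical helper: classify constructor arguments into one of the two modes."""
    # None means "not provided"; 0 is a legitimate subject/session/fold value.
    given = {name for name, value in params.items() if value is not None}

    if recording_ids is None:
        # Benchmark mode: every required benchmark parameter must be present.
        missing = [name for name in REQUIRED_IN_BENCHMARK if name not in given]
        if missing:
            raise ValueError(f"benchmark mode requires: {', '.join(missing)}")
        return "benchmark"

    # Recording-ID mode: benchmark-only parameters are rejected.
    conflicting = sorted(given.intersection(FORBIDDEN_WITH_RECORDING_IDS))
    if conflicting:
        raise ValueError(
            f"recording-ID mode does not accept: {', '.join(conflicting)}"
        )
    return "recording_ids"
```

For example, `resolve_mode(recording_ids=["rec_a"], split="train")` raises a `ValueError`, because split is a benchmark-only parameter ("rec_a" is a placeholder ID).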

References

Zahorodnii, A., Wang, C., Stankovits, B., Moraitaki, C., Chau, G., Barbu, A., Katz, B., & Fiete, I. R. Neuroprobe: Evaluating Intracranial Brain Responses to Naturalistic Stimuli. arXiv:2509.21671.

Data sources: BrainTreeBank and Neuroprobe Benchmark

Parameters:
  • root (Optional[str]) – Root directory containing the processed Neuroprobe artifacts. Defaults to processed_dir from the brainsets configuration.

  • recording_ids (Optional[list[str]]) – Explicit list of recording IDs to load from disk; providing it selects explicit-recording mode. If omitted, the dataset runs in benchmark mode and infers the required recording IDs from subset_tier, test_subject, test_session, split, label_mode, task, regime, and fold.

  • transform (Optional[Callable]) – Optional transform applied to each sample.

  • subset_tier (Optional[Literal['full', 'lite', 'nano']]) – One of "full", "lite", "nano". Required in benchmark mode; must be omitted in explicit-recording mode.

  • test_subject (int | None) – Target test subject id (Neuroprobe semantics). Required in benchmark mode; must be omitted in explicit-recording mode.

  • test_session (int | None) – Target test trial/session id (Neuroprobe semantics). Required in benchmark mode; must be omitted in explicit-recording mode.

  • split (Optional[Literal['train', 'val', 'test']]) – One of "train", "val", "test". Required in benchmark mode; must be omitted in explicit-recording mode.

  • label_mode (Optional[Literal['binary', 'multiclass']]) – One of "binary", "multiclass". Defaults to "binary" in benchmark mode.

  • task (str | None) – Neuroprobe task name. Defaults to "speech" in benchmark mode. Supported values are: "delta_volume", "face_num", "frame_brightness", "global_flow", "gpt2_surprisal", "local_flow", "onset", "pitch", "speech", "volume", "word_gap", "word_head_pos", "word_index", "word_length", "word_part_speech".

  • regime (Optional[Literal['SS-SM', 'SS-DM', 'DS-DM']]) – One of "SS-SM", "SS-DM", "DS-DM". Defaults to "SS-SM" in benchmark mode. Neuroprobe regime semantics:
      - "SS-SM": single-subject, single-session (within-session split)
      - "SS-DM": single-subject, different-session (cross-x split)
      - "DS-DM": different-subject, different-session (cross-x split)

  • fold (int | None) – Fold index, used only in benchmark mode. Defaults to 0 in benchmark mode; must be omitted in explicit-recording mode. Valid values depend on the regime's split type:
      - within_session ("SS-SM"): 0 or 1
      - cross_x ("SS-DM", "DS-DM"): forced to 0

  • uniquify_channel_ids_with_subject (bool) – Whether to prefix channel IDs with subject.id via MultiChannelDatasetMixin. Defaults to True.

  • uniquify_channel_ids_with_session (bool) – Whether to prefix channel IDs with session.id via MultiChannelDatasetMixin. Defaults to False.

  • dirname (str) – Subdirectory under root containing the recording H5 files. Defaults to 'neuroprobe_2025'.
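The benchmark-mode defaults and the regime/fold constraints from the parameter list can be summarized in a short sketch. The helper `resolve_benchmark_options` and its returned dictionary are hypothetical, not brainsets API. The sketch assumes that "forced to 0" means any fold passed with a cross-x regime is overridden to 0; the actual class might raise an error instead.

```python
from typing import Optional

# Benchmark-mode defaults as listed in the parameter docs.
DEFAULT_LABEL_MODE = "binary"
DEFAULT_TASK = "speech"
DEFAULT_REGIME = "SS-SM"
DEFAULT_FOLD = 0

SUPPORTED_TASKS = frozenset({
    "delta_volume", "face_num", "frame_brightness", "global_flow",
    "gpt2_surprisal", "local_flow", "onset", "pitch", "speech", "volume",
    "word_gap", "word_head_pos", "word_index", "word_length",
    "word_part_speech",
})

# Regime -> split type: SS-SM splits within a session; SS-DM and DS-DM use cross-x splits.
REGIME_TO_SPLIT_TYPE = {
    "SS-SM": "within_session",
    "SS-DM": "cross_x",
    "DS-DM": "cross_x",
}


def resolve_benchmark_options(
    label_mode: Optional[str] = None,
    task: Optional[str] = None,
    regime: Optional[str] = None,
    fold: Optional[int] = None,
) -> dict:
    """Hypothetical helper: fill benchmark defaults and check regime/fold consistency."""
    label_mode = DEFAULT_LABEL_MODE if label_mode is None else label_mode
    task = DEFAULT_TASK if task is None else task
    regime = DEFAULT_REGIME if regime is None else regime

    if label_mode not in ("binary", "multiclass"):
        raise ValueError(f"unknown label_mode: {label_mode!r}")
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    if regime not in REGIME_TO_SPLIT_TYPE:
        raise ValueError(f"unknown regime: {regime!r}")

    split_type = REGIME_TO_SPLIT_TYPE[regime]
    if split_type == "cross_x":
        # Assumption: cross-x regimes have a single fold, so fold is pinned to 0.
        fold = 0
    else:
        # Within-session regime: two folds, 0 and 1.
        fold = DEFAULT_FOLD if fold is None else fold
        if fold not in (0, 1):
            raise ValueError(f"within-session fold must be 0 or 1, got {fold!r}")

    return {
        "label_mode": label_mode,
        "task": task,
        "regime": regime,
        "split_type": split_type,
        "fold": fold,
    }
```

Keeping the regime-to-split-type mapping in one table makes the fold rule depend on the split type rather than on each regime name, which mirrors how the fold parameter is documented.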