OpenNeuroPipeline#

class brainsets.utils.openneuro.OpenNeuroPipeline(raw_dir, processed_dir, args, tracker_handle=None, download_only=False)[source]#

Bases: BrainsetPipeline, ABC

Abstract base class for OpenNeuro dataset pipelines.

This class provides foundational tools and conventions for preprocessing and handling OpenNeuro datasets within the Brainsets framework. It is designed to be subclassed for specific datasets and supports both EEG and iEEG modalities.

Attributes (to be defined by subclasses):
  • dataset_id: Identifier for the OpenNeuro dataset (e.g., “ds005555”).

  • brainset_id: Unique local identifier for the brainset.

  • origin_version: Version string corresponding to the raw source dataset.

  • derived_version: Version or tag indicating the processing version of the derived data.

  • description: Optional textual description of the dataset.

  • modality: Data modality for this pipeline. Must be overridden by subclasses.

Customization points:
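This class supports dataset-specific customization via the following class attributes:
  • CHANNEL_NAME_REMAPPING

  • TYPE_CHANNELS_REMAPPING

  • IGNORE_CHANNELS

These can be set as class attributes or managed dynamically by overriding the following methods:
  • get_channel_name_remapping()

  • get_type_channels_remapping()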

The process_common() method implements the standard steps and routines shared by all OpenNeuro datasets. This provides a consistent entry point for all dataset processing. Subclasses may extend or override the process() method to implement dataset-specific processing logic.

See [Creating an OpenNeuro Pipeline](https://brainsets.readthedocs.io/en/latest/concepts/openneuro_pipeline.html) in the official brainsets docs for the complete guide on building OpenNeuro pipelines.

parser: ArgumentParser | None = ArgumentParser(prog='__main__.py', usage=None, description=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)#

Argument parser for common OpenNeuro pipeline flags.

modality: Literal['eeg', 'ieeg']#

Data modality for this pipeline. Must be overridden by subclasses.

dataset_id: str#

OpenNeuro dataset identifier (e.g., “ds005555”, “ds006914”).

brainset_id: str#

Unique identifier for the brainset.

origin_version: str#

Version of the original data. Must be specified by the author of each pipeline.

derived_version: str#

Version of the processed data. Must be specified by the author of each pipeline.

description: str | None = None#

Optional description of the dataset.

CHANNEL_NAME_REMAPPING: dict[str, str] | None = None#

Optional dict mapping original channel name to new standardized name.

For more complex configurations (e.g., per-recording mappings), override get_channel_name_remapping() instead.

TYPE_CHANNELS_REMAPPING: dict[str, list[str]] | None = None#

Optional dict mapping channel types to lists of channel names.

For more complex configurations (e.g., per-recording mappings), override get_type_channels_remapping() instead.

IGNORE_CHANNELS: list[str] | None = None#

Optional list of channel names to ignore.

Channel names should be specified as they appear in the original namespace of the raw object (i.e., prior to any remapping or type changes).
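The interaction between IGNORE_CHANNELS and CHANNEL_NAME_REMAPPING can be illustrated with a minimal sketch. It assumes that ignored channels are dropped first (since they are named in the original namespace) and that renaming is applied afterwards. The helper `apply_channel_config` is hypothetical and is not part of brainsets; the actual order of operations inside the pipeline may differ.

```python
def apply_channel_config(names, ignore=None, rename=None):
    """Hypothetical helper: drop ignored channels, then rename the rest.

    `names` are channel names as they appear in the raw recording.
    `ignore` uses the ORIGINAL names, matching the IGNORE_CHANNELS convention.
    `rename` maps original name -> standardized name.
    """
    ignore = set(ignore or [])
    rename = rename or {}
    # Filter in the original namespace, before any renaming happens.
    kept = [name for name in names if name not in ignore]
    # Channels without an entry in `rename` keep their original name.
    return [rename.get(name, name) for name in kept]


raw_names = ["EEG Fp1", "EEG Fp2", "Status"]
cleaned = apply_channel_config(
    raw_names,
    ignore=["Status"],
    rename={"EEG Fp1": "Fp1", "EEG Fp2": "Fp2"},
)
print(cleaned)
```

Listing "Status" under its original name is what makes the filter work here; listing a post-rename name would leave the channel in place under this assumed ordering.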

static validate_dataset_id(dataset_id)[source]#

Validate OpenNeuro dataset identifier format.

OpenNeuro dataset IDs follow the format ‘ds’ followed by exactly 6 digits, where the numeric portion ranges from 000001 to 009999.

Parameters:

dataset_id (str) – The dataset identifier, in strict format:
  • Must be lowercase ‘ds’ followed by exactly 6 digits.

  • Numeric portion must be between 000001 and 009999.

Raises:

ValueError – If the dataset ID does not match the strict format, or the numeric part is outside the valid range.

Return type:

None
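The rule described above can be sketched as a standalone function. This is an independent illustration of the documented format, not the library's implementation; the name `check_dataset_id` and the error messages are assumptions.

```python
import re

# 'ds' in lowercase followed by exactly six ASCII digits.
_DATASET_ID_RE = re.compile(r"ds([0-9]{6})")


def check_dataset_id(dataset_id: str) -> None:
    """Raise ValueError unless dataset_id is 'ds' + 6 digits in 000001..009999."""
    match = _DATASET_ID_RE.fullmatch(dataset_id)
    if match is None:
        raise ValueError(f"Invalid dataset ID format: {dataset_id!r}")
    number = int(match.group(1))
    if not 1 <= number <= 9999:
        raise ValueError(f"Dataset number out of range: {dataset_id!r}")


check_dataset_id("ds005555")  # passes silently

for bad in ("DS005555", "ds5555", "ds000000", "ds010000"):
    try:
        check_dataset_id(bad)
    except ValueError as err:
        print(err)
```

Using `fullmatch` rather than `match` rejects trailing characters such as a seventh digit.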

classmethod get_manifest(raw_dir, args)[source]#

Generate a manifest DataFrame by discovering recordings from OpenNeuro.

This implementation queries OpenNeuro S3 and parses BIDS-compliant filenames to discover recordings for the pipeline modality.

Parameters:
  • raw_dir (Path) – Raw data directory assigned to this brainset

  • args (Optional[Namespace]) – Pipeline-specific arguments parsed from the command line

Returns:

DataFrame with columns:
  • subject_id: Subject identifier (e.g., ‘sub-01’)

  • recording_id: Recording identifier (index)

  • s3_url: S3 URL for downloading

Return type:

DataFrame
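To make the discovery step more concrete, the following sketch parses one BIDS-style file key into a manifest-like row. It is a simplified illustration under stated assumptions: the function `parse_bids_recording` is hypothetical, the choice of recording_id (all entities joined by underscores) is a guess, and the URL layout assumes OpenNeuro's public bucket at `s3://openneuro.org/<dataset_id>/<path>`. The real method queries S3 and returns a DataFrame.

```python
# Assumption: public OpenNeuro bucket layout.
S3_PREFIX = "s3://openneuro.org"


def parse_bids_recording(dataset_id, key, modality):
    """Return a manifest-like dict for one data file, or None if it does not match.

    `key` is a path inside the dataset, e.g.
    'sub-01/ses-01/eeg/sub-01_ses-01_task-rest_eeg.edf'.
    """
    filename = key.rsplit("/", 1)[-1]
    stem = filename.partition(".")[0]
    # BIDS filenames are underscore-separated 'key-value' entities plus a suffix.
    *entity_parts, suffix = stem.split("_")
    if suffix != modality:
        return None  # e.g. channels.tsv or events.tsv sidecar files
    entities = dict(part.split("-", 1) for part in entity_parts if "-" in part)
    if "sub" not in entities:
        return None
    return {
        "subject_id": f"sub-{entities['sub']}",
        # Hypothetical recording_id: the filename without its modality suffix.
        "recording_id": "_".join(entity_parts),
        "s3_url": f"{S3_PREFIX}/{dataset_id}/{key}",
    }


row = parse_bids_recording(
    "ds005555",
    "sub-01/ses-01/eeg/sub-01_ses-01_task-rest_eeg.edf",
    modality="eeg",
)
print(row)
```

Rows built this way could be collected into a DataFrame indexed by recording_id, which matches the column layout listed above.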

download(manifest_item)[source]#

Download data for a single recording from OpenNeuro S3.

Parameters:

manifest_item – A single row of the manifest

Return type:

Series

Returns:

Series containing subject_id, recording_id, s3_url, latest_snapshot_tag, age, sex, and species.

process_common(download_output)[source]#

Process data files and create a Data object.

This method handles common OpenNeuro processing tasks:
  1. Loads BIDS-structured data files using MNE-BIDS

  2. Extracts metadata (subject, session, device, brainset descriptions)

  3. Extracts signal and channel information

  4. Creates a Data object

Parameters:

download_output (Series) – Series returned by download()

Return type:

Optional[tuple[Data, Path]]

Returns:

Tuple of (data, store_path), or None if processing is skipped.

process(download_output)[source]#

Process and save the dataset.

The default implementation calls process_common() and persists the result. Subclasses can override this method to add dataset-specific processing.

Parameters:

download_output (Series) – Series returned by download()

Return type:

None

get_channel_name_remapping(recording_id=None)[source]#

Return channel name remapping for a given recording.

Override this method to provide per-recording channel name remappings. The default implementation returns the class-level CHANNEL_NAME_REMAPPING attribute.

Parameters:

recording_id (str | None) – The recording identifier

Return type:

Optional[dict[str, str]]

Returns:

Mapping from original channel names to standardized names, or None.
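A per-recording override might look like the sketch below. To keep it runnable without brainsets installed, it uses a small stand-in base class that mirrors only the documented default behavior; in a real pipeline the subclass would derive from OpenNeuroPipeline. The attribute `PER_RECORDING_REMAPPING` and the recording IDs are hypothetical.

```python
class StandInPipeline:
    """Stand-in for OpenNeuroPipeline, mirroring only the documented default."""

    CHANNEL_NAME_REMAPPING = None

    def get_channel_name_remapping(self, recording_id=None):
        return self.CHANNEL_NAME_REMAPPING


class ExamplePipeline(StandInPipeline):
    # Class-level default applied to every recording.
    CHANNEL_NAME_REMAPPING = {"EEG Fp1": "Fp1"}

    # Hypothetical per-recording additions, keyed by recording_id.
    PER_RECORDING_REMAPPING = {
        "sub-02_task-rest": {"EEG FP2": "Fp2"},
    }

    def get_channel_name_remapping(self, recording_id=None):
        # Start from the class-level default, then layer overrides on top.
        merged = dict(super().get_channel_name_remapping(recording_id) or {})
        merged.update(self.PER_RECORDING_REMAPPING.get(recording_id, {}))
        return merged or None


pipeline = ExamplePipeline()
print(pipeline.get_channel_name_remapping("sub-01_task-rest"))
print(pipeline.get_channel_name_remapping("sub-02_task-rest"))
```

Copying the default into a new dict before updating it avoids mutating the shared class attribute across recordings.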

get_type_channels_remapping(recording_id=None)[source]#

Return channel type remapping for a given recording.

Override this method to provide per-recording channel type remappings. The default implementation returns the class-level TYPE_CHANNELS_REMAPPING attribute.

Parameters:

recording_id (str | None) – The recording identifier

Return type:

Optional[dict[str, list[str]]]

Returns:

Mapping from channel type to channel name list, or None.
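The returned mapping goes from channel type to a list of channel names, whereas MNE's `set_channel_types` expects the opposite direction (channel name to type). A minimal sketch of the inversion follows; the helper `invert_type_mapping` is hypothetical, and whether brainsets performs this step internally in the same way is an assumption.

```python
def invert_type_mapping(type_to_names):
    """Turn {type: [names]} into {name: type}, rejecting conflicting entries."""
    name_to_type = {}
    for ch_type, names in type_to_names.items():
        for name in names:
            if name in name_to_type and name_to_type[name] != ch_type:
                raise ValueError(
                    f"Channel {name!r} assigned to both "
                    f"{name_to_type[name]!r} and {ch_type!r}"
                )
            name_to_type[name] = ch_type
    return name_to_type


type_channels = {"eog": ["HEOG", "VEOG"], "ecg": ["EKG"]}
print(invert_type_mapping(type_channels))
```

Raising on a channel listed under two different types surfaces configuration mistakes early instead of letting the last entry win silently.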