Loading and Exploring the CBCIC Dataset using bciflow

The bciflow library provides convenient tools for working with EEG datasets for Brain-Computer Interface (BCI) research. In this tutorial, we will focus on loading and exploring the CBCIC dataset using bciflow.

Objectives of this Tutorial

  • Learn how to load EEG data from the CBCIC dataset using bciflow

  • Understand the structure of the dataset

  • Print and interpret key dataset components such as EEG signals, labels, and metadata

1. Installation

First, make sure bciflow is installed in your Python environment:
pip install bciflow

Note

Ensure you are using Python 3.7 or higher.

2. Loading the Dataset

We’ll use the CBCIC dataset for this tutorial. It was released for the “Clinical Brain Computer Interfaces Challenge” competition at WCCI 2020 in Glasgow. The dataset contains data from 10 hemiparetic stroke patients with impaired finger mobility in either the left or the right hand.
Make sure the dataset files are saved in a known folder.
Now, let’s load the data for subject 1:
from bciflow.datasets.CBCIC import cbcic

dataset = cbcic(subject=1, path='data/cbcic/')

Note

This command loads the dataset for subject 1 and stores it in a dictionary called dataset.

Ensure the dataset is available at data/cbcic/ or adjust the path accordingly.

3. Exploring the Dataset Contents

Let’s explore what’s inside this dataset. We will print different keys of the dictionary to understand the data structure.

3.1 EEG Signals: dataset["X"]

print(dataset["X"])

This prints the EEG signals, organized as a 4D array with the following axes, in order:

  • trials: how many repetitions (epochs) of the task were recorded

  • frequency_bands: for each trial, the signals are filtered in different frequency bands (if applicable)

  • channels: each electrode in the EEG cap used

  • time_samples: the EEG signal over time (in samples)

Example shape: (120, 1, 12, 4096) → 120 trials, 1 frequency band, 12 electrodes, 4096 time samples. At a sampling frequency of 512 Hz, 4096 samples correspond to 8 seconds of signal per trial.
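
The relationship between the number of samples, the sampling frequency, and the trial duration can be checked with a few lines of arithmetic. The sketch below uses a hard-coded shape tuple and sampling rate as stand-ins for dataset["X"].shape and dataset["sfreq"], so the values are illustrative and are not read from the real files.

```python
# Stand-ins for dataset["X"].shape and dataset["sfreq"] (illustrative values
# copied from the example shape, not read from the real files).
shape = (120, 1, 12, 4096)
sfreq = 512.0

# Unpack the four axes in the order trials, bands, channels, time samples.
n_trials, n_bands, n_channels, n_samples = shape

# Duration of one trial in seconds = samples / samples per second.
duration_s = n_samples / sfreq

print(f"{n_trials} trials, {n_bands} band(s), {n_channels} channels")
print(f"{n_samples} samples at {sfreq} Hz = {duration_s} s per trial")
```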

3.2 Labels per Trial: dataset["y"]

print(dataset["y"])
This shows a list of integers representing the class (or task) performed in each trial.
Example: [0, 0, 0, ..., 1, 1, 1]
Each number corresponds to a mental task (such as left hand or right hand).
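
A quick way to see how many trials belong to each class is to count the labels. This is a minimal sketch using a short synthetic list in place of dataset["y"]; the real label order and class balance may differ.

```python
from collections import Counter

# Synthetic stand-in for dataset["y"]; the real order and counts may differ.
y = [0, 0, 0, 1, 1, 0, 1, 1]

# int() makes the count work for plain Python ints and NumPy integer types alike.
counts = Counter(int(label) for label in y)

for label, n in sorted(counts.items()):
    print(f"class {label}: {n} trials")
```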

3.3 Class Meaning: dataset["y_dict"]

print(dataset["y_dict"])
This prints a dictionary mapping each class name to its integer label.
Output example: {'left-hand': 0, 'right-hand': 1}
This tells us what classes 0 and 1 mean in dataset["y"].
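
Because the dictionary maps names to numbers, decoding the labels back into readable names requires inverting it first. The sketch below assumes the example mapping shown in this section and a short synthetic label list.

```python
# Stand-ins for dataset["y_dict"] and dataset["y"] (illustrative values).
y_dict = {"left-hand": 0, "right-hand": 1}
y = [0, 0, 1, 0, 1]

# Invert the name -> number mapping so numbers can be looked up directly.
label_names = {number: name for name, number in y_dict.items()}

# Translate every integer label into its class name.
decoded = [label_names[int(label)] for label in y]
print(decoded)
```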

3.4 Events: dataset["events"]

print(dataset["events"])
This shows a dictionary containing event timestamps:
{'get_start': [0, 3],
 'beep_sound': [2],
 'cue': [3, 8],
 'task_exec': [3, 8]}

This tells us when each event happened (in seconds) during data collection. It is useful for segmenting the signals around specific events.
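
To segment the signals, event times in seconds have to be converted into sample indices. The following sketch shows one way to do that. It assumes that two-element entries are [start, end] times in seconds, and the helper seconds_to_sample is a hypothetical name introduced here, not part of bciflow.

```python
# Stand-ins for dataset["events"], dataset["sfreq"], and dataset["tmin"].
events = {"get_start": [0, 3], "beep_sound": [2], "cue": [3, 8], "task_exec": [3, 8]}
sfreq = 512.0
tmin = 0.0

def seconds_to_sample(t, sfreq, tmin=0.0):
    """Convert a time in seconds to a sample index on the trial's time axis."""
    return int(round((t - tmin) * sfreq))

# Assumption: a two-element entry holds [start, end] in seconds.
start_s, end_s = events["task_exec"]
start = seconds_to_sample(start_s, sfreq, tmin)
stop = seconds_to_sample(end_s, sfreq, tmin)

print(f"task_exec spans samples {start} to {stop} ({stop - start} samples)")
```

If dataset["X"] is a NumPy array, slicing its last axis with X[..., start:stop] would keep only that window for every trial, band, and channel.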

3.5 Channel Names: dataset["ch_names"]

print(dataset["ch_names"])
This prints a list of EEG channel (electrode) names.
Example: ['F3', 'FC3', 'C3', 'CP3', 'P3', 'FCz', 'CPz', 'F4', 'FC4', 'C4', 'CP4', 'P4']
Each name represents a physical location on the EEG cap.
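
The position of a name in this list is also its index along the channel axis of dataset["X"], so a name-to-index lookup makes it easy to select specific electrodes. This is a small sketch with a hard-coded stand-in list; in practice the list returned by the loader should be used.

```python
# Stand-in for dataset["ch_names"]; use the list returned by the loader in practice.
ch_names = ["F3", "FC3", "C3", "CP3", "P3", "FCz", "CPz",
            "F4", "FC4", "C4", "CP4", "P4"]

# Map each channel name to its position along the channel axis of X.
ch_index = {name: i for i, name in enumerate(ch_names)}

# Example selection: C3 and C4, the electrodes over the left and right motor areas.
picks = [ch_index[name] for name in ("C3", "C4")]
print(picks)
```

With a NumPy array, X[:, :, picks, :] would then keep only the selected channels.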

3.6 Sampling Frequency: dataset["sfreq"]

print(dataset["sfreq"])

Returns the sampling frequency in Hz (e.g., 512.0). This tells us how many samples per second were recorded.

3.7 Start Time: dataset["tmin"]

print(dataset["tmin"])
Shows the starting time in seconds relative to the event markers (e.g., 0.0).
If it were -1, the data would start 1 second before the event (useful for extracting pre-event baselines).
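
The start time and the sampling frequency together define the time stamp of every sample. The sketch below builds that time axis for two values of tmin. The helper time_axis is a hypothetical name, and the sample count and sampling rate are the example values used in this tutorial.

```python
# Stand-ins for dataset["sfreq"] and the length of the time axis of dataset["X"].
sfreq = 512.0
n_samples = 4096

def time_axis(tmin, sfreq, n_samples):
    """Return the time in seconds of every sample, starting at tmin."""
    return [tmin + i / sfreq for i in range(n_samples)]

# tmin = 0.0: the first sample coincides with the event marker.
times = time_axis(0.0, sfreq, n_samples)
# tmin = -1.0: the first second of data precedes the event marker.
times_with_baseline = time_axis(-1.0, sfreq, n_samples)

print(times[0], times[-1])
print(times_with_baseline[0], times_with_baseline[-1])
```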

4. Dataset Structure Summary

Dataset Structure

Key        Description                                   Example
X          EEG data (trials × bands × channels × time)   shape (120, 1, 12, 4096)
y          Labels for each trial                         [0, 0, 0, …]
y_dict     Class mapping                                 {'left-hand': 0, 'right-hand': 1}
events     Event timestamps                              {'get_start': […]}
ch_names   Channel names                                 ['F3', 'FC3', 'C3', …]
sfreq      Sampling frequency (Hz)                       512.0
tmin       Start time (seconds)                          0.0

5. Complete Example Code

from bciflow.datasets.CBCIC import cbcic

dataset = cbcic(subject=1, path='data/cbcic/')

print("EEG signals shape:", dataset["X"].shape)
print("Labels:", dataset["y"])
print("Class dictionary:", dataset["y_dict"])
print("Events:", dataset["events"])
print("Channel names:", dataset["ch_names"])
print("Sampling frequency (Hz):", dataset["sfreq"])
print("Start time (s):", dataset["tmin"])
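
After printing the fields, it can be helpful to confirm that they agree with each other. The following is a minimal sketch of such a check, not part of bciflow: check_dataset is a hypothetical helper, and it is run here on a small synthetic dictionary with the same keys, on the assumption that dataset["X"] is a NumPy array.

```python
import numpy as np

def check_dataset(dataset):
    """Raise ValueError if the dataset fields disagree with each other."""
    n_trials, n_bands, n_channels, n_samples = dataset["X"].shape
    if len(dataset["y"]) != n_trials:
        raise ValueError("number of labels does not match number of trials")
    if len(dataset["ch_names"]) != n_channels:
        raise ValueError("number of channel names does not match channel axis")
    # Every label in y should have a name in y_dict.
    unknown = {int(v) for v in dataset["y"]} - set(dataset["y_dict"].values())
    if unknown:
        raise ValueError(f"labels missing from y_dict: {sorted(unknown)}")
    return True

# Small synthetic stand-in with the same keys; real data comes from the loader.
fake = {
    "X": np.zeros((4, 1, 2, 16)),
    "y": np.array([0, 1, 0, 1]),
    "y_dict": {"left-hand": 0, "right-hand": 1},
    "ch_names": ["C3", "C4"],
}
print(check_dataset(fake))
```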