Project Report

1. Data Collection and Preprocessing

In this section, we describe the process of gathering and preparing the dataset for our machine learning project.

1.1 Data Reading and Loading

We begin by loading the preprocessed EEG data from the dataset files with the provided Python function read_data(filename). This function uses the pickle library to unpickle the contents of each file:

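import pickle
import numpy as np

def read_data(filename):
    # encoding='latin1' is needed when the files were pickled by Python 2
    # and contain NumPy arrays; the with-block closes the file afterwards.
    with open(filename, 'rb') as f:
        return pickle.load(f, encoding='latin1')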

# List of files to be read: zero-padded identifiers '01' through '32'
files = [f"{n:02d}" for n in range(1, 33)]

# Initialize lists to store labels and data
labels = []
data = []

# Read and load data from each file
for i in files:
    filepath = "data_preprocessed_python/s" + i + ".dat"
    d = read_data(filepath)
    labels.append(d['labels'])
    data.append(d['data'])

In this code snippet, we iterate through each file, read its contents using the read_data function, and append the labels and data to separate lists.
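The loading logic can be checked without the real dataset. The sketch below writes a small stand-in file to a temporary directory and reads it back; the array sizes and the temporary path are assumptions made for illustration only and do not reflect the actual recordings.

```python
import os
import pickle
import tempfile

import numpy as np


def read_data(filename):
    # Same idea as the loader above: unpickle with latin1 decoding.
    with open(filename, "rb") as f:
        return pickle.load(f, encoding="latin1")


# Zero-padded identifiers "01" .. "32", matching the file naming used above.
file_ids = [f"{n:02d}" for n in range(1, 33)]

# Hypothetical stand-in for one file: tiny arrays instead of a real recording.
fake = {"labels": np.zeros((40, 4)), "data": np.zeros((40, 40, 16))}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "s" + file_ids[0] + ".dat")
    with open(path, "wb") as f:
        pickle.dump(fake, f)
    loaded = read_data(path)

print(sorted(loaded.keys()))
print(loaded["labels"].shape, loaded["data"].shape)
```

If the round trip works, the loaded dictionary exposes the same 'labels' and 'data' keys that the loop above relies on.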

1.2 Data Structure

After reading the data, we have the labels and the corresponding EEG data for every sample. The shapes of the labels and data arrays before reshaping are as follows:

print(np.array(labels).shape)  # (32, 40, 4)
print(np.array(data).shape)    # (32, 40, 40, 8064)

These shapes indicate that we have 32 files, each containing 40 samples. Each sample carries 4 label values and consists of 40 channels of EEG data with 8064 time steps per channel.
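To make the axis order concrete, the following sketch indexes arrays laid out as (file, sample, channel, time step) and (file, sample, label). It uses zero-filled stand-in arrays with only 8 time steps instead of 8064 to stay small; the particular indices are arbitrary and chosen for illustration.

```python
import numpy as np

# Stand-in dimensions; n_steps is shortened from 8064 to keep the example light.
n_files, n_samples, n_channels, n_steps = 32, 40, 40, 8

data = np.zeros((n_files, n_samples, n_channels, n_steps))
labels = np.zeros((n_files, n_samples, 4))

one_sample = data[2, 5]        # all channels of sample 5 in file 2
one_channel = data[2, 5, 0]    # time series of channel 0 for that sample
sample_labels = labels[2, 5]   # the 4 label values of that sample

print(one_sample.shape)     # (40, 8)
print(one_channel.shape)    # (8,)
print(sample_labels.shape)  # (4,)
```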

1.3 Data Reshaping

To facilitate further processing, we reshape the labels and data arrays so that the file and sample dimensions merge into a single axis of 32 × 40 = 1280 samples:

labels = np.array(labels).reshape(1280, 4)
data = np.array(data).reshape(1280, 40, 8064)

After reshaping, the shapes of the labels and data arrays are:

print(labels.shape)  # (1280, 4)
print(data.shape)    # (1280, 40, 8064)
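Because NumPy reshapes in row-major order by default, the merged axis keeps the samples of each file together: row file_index * 40 + sample_index of the reshaped array holds the sample that was at [file_index, sample_index] before. The sketch below checks this on a stand-in label array whose first two columns encode the original position; the encoding is an illustrative assumption, not part of the dataset.

```python
import numpy as np

n_files, n_samples = 32, 40

# Stand-in labels: column 0 stores the file index, column 1 the sample index.
stacked = np.zeros((n_files, n_samples, 4))
for f in range(n_files):
    for s in range(n_samples):
        stacked[f, s, 0] = f
        stacked[f, s, 1] = s

flat = stacked.reshape(n_files * n_samples, 4)

# Row that should correspond to file 3, sample 7.
row = 3 * n_samples + 7
print(flat.shape)             # (1280, 4)
print(flat[row, :2])          # [3. 7.]
print(divmod(row, n_samples)) # (3, 7): recovers (file, sample) from the row
```

The same mapping applies to the data array, since it is reshaped along the same two leading axes.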

1.4 EEG Data Exploration

Before proceeding with the analysis, let's explore the EEG data to gain insights into its characteristics and distribution.
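One simple starting point for such an exploration is a set of per-channel summary statistics together with a check for missing values. The sketch below is only an illustration under stated assumptions: it runs on random stand-in data with 8 samples and 128 time steps rather than the real (1280, 40, 8064) array, and the choice of statistics is ours, not a prescribed procedure.

```python
import numpy as np

# Random stand-in for the reshaped data array: (samples, channels, time steps).
rng = np.random.default_rng(0)
data = rng.normal(size=(8, 40, 128))

# Aggregate over samples (axis 0) and time (axis 2), leaving one value per channel.
channel_mean = data.mean(axis=(0, 2))
channel_std = data.std(axis=(0, 2))
channel_min = data.min(axis=(0, 2))
channel_max = data.max(axis=(0, 2))

print(channel_mean.shape, channel_std.shape)
print("any NaN:", bool(np.isnan(data).any()))
print("overall range:", channel_min.min(), channel_max.max())
```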


1.5 Data Array Transformation

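The reshaping step in Section 1.3 already yields NumPy arrays, so the explicit conversion below does not change the contents. It serves as a safeguard that labels and data are NumPy arrays before further manipulation: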

import numpy as np

# Ensure labels and data are NumPy arrays; np.asarray avoids copying
# inputs that are already arrays
labels = np.asarray(labels)
data = np.asarray(data)

# Print shapes of label and data arrays
print(labels.shape)  # (1280, 4)
print(data.shape)    # (1280, 40, 8064)

This code snippet ensures that labels and data are NumPy arrays, so that array operations and computations can be performed efficiently. The shapes confirm that we have 1280 samples, each with 4 label values and 40 channels of EEG data with 8064 time steps per channel.
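The difference between converting a list and re-converting an existing array can be seen in a small sketch. The toy values below are assumptions for illustration; the point is that np.asarray returns an existing array unchanged, whereas np.array makes a copy by default, which matters for an array as large as the EEG data.

```python
import numpy as np

# A plain nested list is turned into a new array.
labels_list = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
from_list = np.asarray(labels_list)
print(type(from_list).__name__, from_list.shape)  # ndarray (2, 4)

# Re-converting an existing array: asarray reuses it, array copies it.
same = np.asarray(from_list)
copy = np.array(from_list)
print(same is from_list)                   # True
print(copy is from_list)                   # False
print(np.shares_memory(copy, from_list))   # False
```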