Project Report
1. Data Collection and Preprocessing
In this section, we describe the process of gathering and preparing the dataset for our machine learning project.
1.1 Data Reading and Loading
We begin by reading and loading the preprocessed EEG data from the dataset files using the provided Python function read_data(filename). This function uses the pickle library to unpickle the data from each file:
import pickle
import numpy as np

def read_data(filename):
    # encoding='latin1' is the setting typically needed to read pickles
    # written by Python 2; the context manager closes the file afterwards
    with open(filename, 'rb') as f:
        return pickle.load(f, encoding='latin1')

# List of files to be read: two-digit identifiers '01' through '32'
files = [f"{n:02d}" for n in range(1, 33)]

# Initialize lists to store labels and data
labels = []
data = []

# Read and load data from each file
for i in files:
    fileph = "data_preprocessed_python/s" + i + ".dat"
    d = read_data(fileph)
    labels.append(d['labels'])
    data.append(d['data'])
In this code snippet, we iterate through each file, read its contents using the read_data function, and append the labels and data to separate lists.
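As a quick way to check the loading logic without the real dataset, the following minimal sketch writes a small stand-in file and reads it back. The dictionary contents and the shortened time axis (16 steps instead of 8064) are assumptions made purely for illustration; only the 'labels' and 'data' keys come from the code above.

```python
import os
import pickle
import tempfile

import numpy as np


def read_data(filename):
    # Same loading logic as in the report: unpickle with latin1 decoding.
    with open(filename, "rb") as f:
        return pickle.load(f, encoding="latin1")


# Hypothetical stand-in for one file, with far fewer time steps than the real data.
fake_file = {
    "labels": np.zeros((40, 4)),
    "data": np.zeros((40, 40, 16)),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "s01.dat")
    with open(path, "wb") as f:
        pickle.dump(fake_file, f)
    loaded = read_data(path)

loaded_keys = sorted(loaded.keys())
print(loaded_keys)                                   # ['data', 'labels']
print(loaded["labels"].shape, loaded["data"].shape)  # (40, 4) (40, 40, 16)
```

Inspecting the keys and shapes of a single file in this way is a cheap sanity check before looping over all 32 files.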
1.2 Data Structure
After reading the data, we have the labels and the corresponding EEG data for each sample. The shapes of the labels and data arrays before reshaping are as follows:
print(np.array(labels).shape) # (32, 40, 4)
print(np.array(data).shape) # (32, 40, 40, 8064)
These shapes indicate that we have 32 files, each containing 40 samples. Each sample has 4 label values and consists of 40 channels of EEG data with 8064 time steps per channel.
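To make the axis order concrete, here is a small sketch that stacks per-file arrays the same way and indexes a single channel. It uses zero-filled placeholder arrays and a shortened time axis (8 steps rather than 8064), both of which are assumptions for illustration only.

```python
import numpy as np

# n_steps is shortened from 8064 to keep the example light.
n_files, n_trials, n_channels, n_steps = 32, 40, 40, 8

# One (samples, channels, steps) array and one (samples, 4) label array per file.
data_list = [np.zeros((n_trials, n_channels, n_steps)) for _ in range(n_files)]
labels_list = [np.zeros((n_trials, 4)) for _ in range(n_files)]

# np.array stacks equally shaped arrays along a new leading axis.
stacked_data = np.array(data_list)
stacked_labels = np.array(labels_list)
print(stacked_labels.shape)  # (32, 40, 4)
print(stacked_data.shape)    # (32, 40, 40, 8)

# Axis order is (file, sample, channel, time step).
one_signal = stacked_data[0, 5, 2]  # file 0, sample 5, channel 2
print(one_signal.shape)             # (8,)
```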
1.3 Data Reshaping
To facilitate further processing, we reshape the labels and data arrays to combine all samples into a single dimension:
labels = np.array(labels).reshape(1280, 4)
data = np.array(data).reshape(1280, 40, 8064)
After reshaping, the shapes of the labels and data arrays are:
print(labels.shape) # (1280, 4)
print(data.shape) # (1280, 40, 8064)
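Merging the file and sample axes relies on NumPy's default row-major ordering, so the samples of one file stay contiguous in the flattened array. The sketch below illustrates this mapping with integer tags instead of real data; the helper flat_index is a hypothetical name introduced only for this example.

```python
import numpy as np

n_files, n_trials = 32, 40

# Tag every (file, sample) pair with a unique integer to track where it lands.
tags = np.arange(n_files * n_trials).reshape(n_files, n_trials)
flat = tags.reshape(n_files * n_trials)


def flat_index(file_idx, sample_idx, samples_per_file=n_trials):
    # Row-major (C-order) reshape keeps the samples of one file contiguous.
    return file_idx * samples_per_file + sample_idx


print(flat_index(2, 5))  # 85
assert flat[flat_index(2, 5)] == tags[2, 5]

# Passing -1 lets NumPy infer the merged dimension (32 * 40 = 1280).
merged_labels = np.zeros((n_files, n_trials, 4)).reshape(-1, 4)
print(merged_labels.shape)  # (1280, 4)
```

With this layout, row k of the reshaped arrays corresponds to file k // 40 and sample k % 40 (both zero-based), which is useful if per-file splits are needed later.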
1.4 EEG Data Exploration
Before proceeding with the analysis, let's explore the EEG data to gain insights into its characteristics and distribution.
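One possible starting point for such an exploration is sketched below: summary statistics for each label column, per-channel statistics, and a check for missing values. It runs on randomly generated stand-in arrays with fewer samples and time steps than the real data, so the numbers it produces say nothing about the actual dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins with the report's axis layout but reduced sizes.
labels = rng.normal(size=(20, 4))
data = rng.normal(size=(20, 40, 128))

# Summary of each of the 4 label columns over all samples.
label_summary = {
    "mean": labels.mean(axis=0),
    "std": labels.std(axis=0),
    "min": labels.min(axis=0),
    "max": labels.max(axis=0),
}

# Per-channel statistics, pooled over samples (axis 0) and time steps (axis 2).
channel_mean = data.mean(axis=(0, 2))
channel_std = data.std(axis=(0, 2))

# Sanity check for missing values.
has_nan = bool(np.isnan(data).any())

for name, values in label_summary.items():
    print(name, np.round(values, 2))
print(channel_mean.shape, channel_std.shape, has_nan)
```

Replacing the synthetic arrays with the reshaped labels and data from the previous section would apply the same checks to the real recordings.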
1.5 Data Array Transformation
The reshaping step already returns NumPy arrays, so the explicit conversion below serves mainly as a safeguard that both variables are arrays before further manipulation:
import numpy as np

# Ensure labels and data are NumPy arrays
labels = np.array(labels)
data = np.array(data)

# Print shapes of label and data arrays
print(labels.shape) # (1280, 4)
print(data.shape) # (1280, 40, 8064)
Because labels and data are already NumPy arrays at this point, the conversion does not change their contents or shapes. The printed shapes confirm that we have 1280 samples, each with 4 labels and 40 channels of EEG data with 8064 time steps per channel.
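Since labels and data are already NumPy arrays after reshaping, it may be worth noting how the two common conversion functions behave in that case. The sketch below, using a zero-filled placeholder array, contrasts np.array, which copies by default, with np.asarray, which returns the input unchanged when it is already an array.

```python
import numpy as np

labels = np.zeros((1280, 4))

copied = np.array(labels)    # np.array makes a copy by default
viewed = np.asarray(labels)  # np.asarray returns the same object for an ndarray

print(copied is labels)                  # False
print(viewed is labels)                  # True
print(np.shares_memory(copied, labels))  # False
```

For an array of shape (1280, 40, 8064), and assuming it is stored as float64, a copy occupies roughly 3.3 GB, so np.asarray could be the more economical choice when the input may already be an array.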