added description of one-vs-rest classifier

c87a7072 · Franziska Oschmann · 456bc996 · c87a7072
Commit c87a7072 authored 5 years ago by Franziska Oschmann
--- a/09_eeg_use_case.ipynb
+++ b/09_eeg_use_case.ipynb
@@ -472,7 +472,9 @@
    "</div>\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
-    "    <i class=\"fa fa-info-circle\"></i>&nbsp; <strong>One-vs-rest classification</strong>  \n",
+    "    <i class=\"fa fa-info-circle\"></i>&nbsp; <strong>One-vs-rest classification</strong>\n",
+    "    <p> Multiclass classification can also be transformed into multiple binary classification problems. One strategy is called one-vs-rest, where one binary classifier is trained per class. In our case this means that one classifier is trained for each arm movement, using only the labels of that movement (event vs. no event).\n",
+    "    </p>\n",
    "\n",
    "</div>"
   ]

 %% Cell type:code id: tags:
 ``` python
 import numpy as np
 import matplotlib.pyplot as plt
 ```
 %% Cell type:markdown id: tags:
 # Chapter 9: Use case - prediction of arm movements
 <center>
 <figure>
 <table><tr>
 <td> <img src="./images/eeg_cap.png" style="width: 400px;"/> </td>
 <td> <img src="./images/arm_movement.png" style="width: 400px;"/> </td>
 </tr></table>
 <figcaption>Setup of an EEG-experiment.</figcaption>
 </figure>
 </center>
 %% Cell type:markdown id: tags:
 <center>
 <figure>
    <img src="./images/eeg_electrode_numbering.jpg" width=35%/>
    <figcaption>Arrangement of electrodes on head.</figcaption>
 </figure>
 </center>
 %% Cell type:markdown id: tags:
 This data contains EEG recordings of one subject performing **grasp-and-lift (GAL)** trials.
 There is **1 subject** in total, **10 series** of trials for this subject, and approximately **30 trials** within each series. The number of trials varies for each series.
 For each **GAL**, you are tasked to detect 6 events:
 - HandStart
 - FirstDigitTouch
 - BothStartLoadPhase
 - LiftOff
 - Replace
 - BothReleased
 These events always occur in the same order. In this dataset, there are two files for each subject + series combination:
 - the *_data.csv files contain the raw 32-channel EEG data (sampling rate 500 Hz)
 - the *_events.csv files contain the ground-truth frame-wise labels for all events
 Detailed information about the data can be found here:
 Luciw MD, Jarocka E, Edin BB (2014) Multi-channel EEG recordings during 3,936 grasp and lift trials with varying weight and friction. Scientific Data 1:140047. www.nature.com/articles/sdata201447
 *Description from https://www.kaggle.com/c/grasp-and-lift-eeg-detection/data*
 %% Cell type:markdown id: tags:
 <center>
 <figure>
    <img src="./images/eeg_signal_preprocessing.png" title="made at imgflip.com" width=75%/>
    <figcaption>Preprocessing steps for EEG-signals.</figcaption>
 </figure>
 </center>
 %% Cell type:markdown id: tags:
 ### Load data
 %% Cell type:markdown id: tags:
 The data can be found in: '/data/eeg_use_case' and contains:
 - 8 series of recorded EEG data
 - 8 series of events of arm movements
 Load the EEG data and the events:
 - combine all EEG series in one array (size: (total number of time points, number of channels))
 - combine all events in one array (size: (total number of time points, number of different arm movements))
 - pay attention to the order of the series
 %% Cell type:markdown id: tags:
 <div class="alert alert-block alert-warning">
    <i class="fa fa-info-circle"></i>&nbsp; <strong>Filter strings with the lambda-operator</strong>
     The lambda operator creates anonymous functions, i.e. functions without a name. An anonymous function can take any number of parameters, evaluates a single expression and returns the value of this expression. The lambda operator can be applied in the following way to filter the file names:
     all_data_files = list(filter(lambda x: '_data' in x, os.listdir(path)))
 </div>
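As a small illustration of the filtering described above, the following sketch applies the same lambda filter to a list of file names. The file names are hypothetical and only stand in for the output of `os.listdir(path)`; `sorted` is used here instead of `np.sort` to keep the example free of dependencies.

```python
# Hypothetical file names, standing in for os.listdir(path).
file_names = [
    "subj1_series2_events.csv",
    "subj1_series1_data.csv",
    "README.txt",
    "subj1_series2_data.csv",
    "subj1_series1_events.csv",
]

# Keep only names containing the marker, then sort so that the
# data and event files of each series line up by position.
data_files = sorted(filter(lambda name: "_data" in name, file_names))
event_files = sorted(filter(lambda name: "_events" in name, file_names))

for d, e in zip(data_files, event_files):
    print(d, "<->", e)
```

Note that plain string sorting is lexicographic, so a name containing `series10` would sort before `series2`; with single-digit series numbers this is not an issue.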
 %% Cell type:code id: tags:
 ``` python
 def load_data(file_names, path):
    # read the csv file and drop the id column
    dfs = []
    for f in file_names:
        df = pd.read_csv(path + f).drop('id', axis = 1)
        dfs.append(df)
    return dfs
 ```
 %% Cell type:code id: tags:
 ``` python
 # define path and list of all data and event files
 import os
 import pandas as pd
 path = 'data/eeg_use_case/'
 all_data_files = list(filter(lambda x: '_data' in x, os.listdir(path)))
 all_event_files = list(filter(lambda x: '_events' in x, os.listdir(path)))
 all_data_sort = np.sort(all_data_files)
 all_event_sort = np.sort(all_event_files)
 ```
 %% Cell type:code id: tags:
 ``` python
 # load all data and event files
 all_data = np.concatenate(load_data(all_data_sort, path))
 all_events = np.concatenate(load_data(all_event_sort, path))
 ```
 %% Cell type:markdown id: tags:
 ### Visualization
 %% Cell type:markdown id: tags:
 Visualize the EEG-data and events and pay attention to:
 - the EEG traces
 - the number of detected arm movements
 What do you observe?
 %% Cell type:code id: tags:
 ``` python
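 cols = ['C0', 'C1', 'C2', 'C3']
 # read the channel names first (skipping the 'id' column), since the index below depends on them
 columns = pd.read_csv(path + all_data_sort[0]).columns[1:]
 ix = np.arange(len(columns))[::8]
 labels = columns[::8]
 # first sample of the plotted excerpt; arbitrary value chosen for illustration (must be >= 500)
 start = 1000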
 plt.figure(figsize = (7,10))
 plt.subplots_adjust(hspace = 0.3)
 for i, ch in enumerate(ix):
    ax = plt.subplot(5,1,i+1)
    ax.plot(all_data[(start-500):(start+3500), ch], linewidth = 1.5, color = cols[i], label = labels[i])
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.set_yticks([])
    ax.set_xticks([])
    ax.legend(loc='upper left', bbox_to_anchor= (0, 1.1), fontsize = 14)
    ax.set_ylim(-500,3000)
 ax = plt.subplot(5,1,5)
 ax.spines['right'].set_visible(False)
 ax.spines['top'].set_visible(False)
 ax.spines['left'].set_visible(False)
 ax.set_yticks([])
 ax.set_xticks([])
 ax.plot(all_events[(start-500):(start+3500)], linewidth = 2)
 ax.set_xticks(np.arange(0,4100,1000))
 ax.set_xticklabels(['0', '2', '4', '6', '8'], fontsize = 14)
 ax.set_xlabel('Time [sec]', fontsize = 14)
 ax.set_ylim(0.1,1)
 lgd = ax.legend(['1', '2', '3', '4', '5', '6'],
               loc='lower left', bbox_to_anchor= (0.85, 0.1), ncol=2,
                borderaxespad=0, frameon=True, fontsize = 12)
 ```
 %% Output
 %% Cell type:code id: tags:
 ``` python
 plt.figure(figsize = (10,7))
 plt.subplots_adjust(wspace = 0.5)
 plt.subplots_adjust(hspace = 0.5)
 for i, e in enumerate(all_events.T):
    plt.subplot(2,3,i+1)
    plt.hist(e, [0, 0.5, 1, 1.5])
    plt.xticks([0.25, 1.25], ['no event', 'event'], fontsize = 14)
    plt.yticks([500000, 1000000], [r'$5 \cdot 10^{5}$', r'$1 \cdot 10^{6}$'], fontsize = 14)
    plt.title('movement ' + str(i+1), fontsize = 14)
 ```
 %% Output
 %% Cell type:markdown id: tags:
 ### Feature extraction
 %% Cell type:markdown id: tags:
 The purpose of the feature extraction is to extract time-dependent features from the EEG data. To do so, sliding windows of 500 data points each are used. The features of three consecutive time windows are used to predict the event in the following time step.
 Extract time-dependent features from the EEG data:
 - define the start and end points of a sliding window with a length of 500 datapoints and a step size of 2
 - loop through those start and end points
 - per iteration:
    - take three consecutive time windows (window_1 = data[start:end,:], window_2 = data[start+500:end+500,:],
    window_3 = data[start+1000:end+1000,:])
    - compute the average power per window (power: square of the signal)
    - combine the three arrays containing the average power to one array
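The steps above can also be sketched in vectorized form. This is a minimal sketch under stated assumptions, not the notebook's implementation: the helper `window_power_features` and the synthetic input are hypothetical, and the window start positions here run up to the end of the data instead of reserving samples for the label. The sketch uses a cumulative sum of the squared signal, so each window mean becomes a difference of two running sums.

```python
import numpy as np

def window_power_features(data, win_size=500, step_size=2, num_feat=3):
    """Mean power of `num_feat` consecutive windows per start position (hypothetical helper)."""
    squared = data.astype(float) ** 2
    # csum[k] holds the sum of the squared samples 0 .. k-1 for every channel
    csum = np.vstack([np.zeros((1, data.shape[1])), np.cumsum(squared, axis=0)])
    starts = np.arange(0, data.shape[0] - win_size * num_feat + 1, step_size)
    feats = []
    for k in range(num_feat):
        lo = starts + k * win_size
        hi = lo + win_size
        feats.append((csum[hi] - csum[lo]) / win_size)  # mean power of window k
    return starts, np.hstack(feats)

# Small synthetic example: 40 time points, 3 channels, windows of 5 samples.
rng = np.random.default_rng(0)
toy = rng.normal(size=(40, 3))
starts, feats = window_power_features(toy, win_size=5, step_size=2, num_feat=3)

# Reference: the explicit loop over windows.
ref = np.array([
    np.hstack([np.mean(toy[s + k * 5 : s + (k + 1) * 5] ** 2, axis=0) for k in range(3)])
    for s in starts
])
print(feats.shape, np.allclose(feats, ref))
```

On long recordings the running sums can accumulate floating-point error, so comparing against the explicit loop on a subset of windows is a reasonable sanity check.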
 %% Cell type:markdown id: tags:
 <center>
 <figure>
    <img src="./images/time_window.001.png" title="made at imgflip.com" width=75%/>
    <figcaption>Sliding time windows used for the feature extraction.</figcaption>
 </figure>
 </center>
 %% Cell type:markdown id: tags:
 #### Generate windows
 %% Cell type:code id: tags:
 ``` python
 %%time
 win_size = 500
 step_size = 2
 num_feat = 3
 num_win = int((all_data.shape[0] - (win_size * num_feat))/step_size)
 ix_start = np.arange(0, num_win*step_size - win_size*num_feat, step_size)
 ix_end = ix_start + 500
 ```
 %% Output
    CPU times: user 3.27 ms, sys: 3.47 ms, total: 6.73 ms
    Wall time: 5.11 ms
 %% Cell type:markdown id: tags:
 #### Compute the mean power per time window
 %% Cell type:code id: tags:
 ``` python
 def mean_pow(y):
    return np.mean(y**2, axis = 0)
 ```
 %% Cell type:code id: tags:
 ``` python
 %%time
 data_filt = []
 for start, end in zip(ix_start, ix_end):
    pow_1 = mean_pow(all_data[start:end, :])
    pow_2 = mean_pow(all_data[start+500:end+500, :])
    pow_3 = mean_pow(all_data[start+1000:end+1000, :])
    data_filt.append(np.hstack([pow_1, pow_2, pow_3]))
 data_filt = np.array(data_filt)
 events_filt = np.array([all_events[end + 1501, :] for end in ix_end])
 ```
 %% Output
    CPU times: user 1min 11s, sys: 2.11 s, total: 1min 13s
    Wall time: 1min 13s
 %% Cell type:markdown id: tags:
 ### Modeling
 %% Cell type:code id: tags:
 ``` python
 # split of the data
 from sklearn.model_selection import train_test_split
 X_train, X_test, y_train, y_test = train_test_split(data_filt, events_filt,\
                                         test_size = 0.33, shuffle = True)
 ```
 %% Cell type:markdown id: tags:
 #### Pipeline with single classifier
 %% Cell type:markdown id: tags:
 1. Define a pipeline which includes:
    - PCA to reduce the data to 10 dimensions
    - Scaling of the data
    - a classifier of your choice (e.g. LogisticRegression, AdaBoost...)
 2. Choose an appropriate parametrization of the classifier according to the imbalance of the data (see lecture 6).
 3. Transform the multi-class classification problem into a one-vs-rest classification.
 4. Use cross-validation to test the model performance (cv = 5).
 5. Use the ROC-AUC score and the confusion matrix for the evaluation of the model.
 6. Visualize the model performance by plotting the true and predicted hand movements.
 7. Repeat the above named steps for another classifier and compare the results.
 <div class="alert alert-block alert-warning">
    <i class="fa fa-info-circle"></i>&nbsp; <strong>ROC (Receiver Operating Characteristics) curve</strong>
    <p>A classifier can produce four different types of results:</p>
    <p>- <strong>true positive</strong> (arm movement was observed and predicted)</p>
    <p>- <strong>true negative</strong> (arm movement was not observed and not predicted)</p>
    <p>- <strong>false positive</strong> (arm movement was not observed but predicted)</p>
    <p>- <strong>false negative</strong> (arm movement was observed but not predicted)</p>
    <p>
        <figure>
        <img src="./images/evaluation-measures-for-roc.png" title="made at imgflip.com" width=50%/>
        </figure>
    </p>
    <p>
    These four possible outcomes also determine the sensitivity and specificity of the classifier:</p>
    <p>- <strong>sensitivity</strong>: true positive rate (should be high) </p>
    <p>- <strong>specificity</strong>: true negative rate (should be high); the ROC curve uses the false positive rate, which equals 1 - specificity (should be low) </p>
    <p>
        <figure>
        <img src="./images/a-roc-curve-connecting-points.png" title="made at imgflip.com" width=30%/>
        </figure>
    </p>
    <p> As the sensitivity should be high and the false positive rate should be low, the ROC curve for different classifier performances looks as follows:
    </p>
    <p>
        <center>
        <figure>
        <table><tr>
        <td> <img src="./images/a-roc-curve-of-a-random-classifier.png" style="width: 400px;"/> </td>
        <td> <img src="./images/a-roc-curve-of-a-perfect-classifier.png" style="width: 400px;"/> </td>
        </tr></table>
        </figure>
        </center>
    </p>
    <p>
    The metric <strong>'roc-auc'</strong> describes the area under the ROC curve. Thus, the higher this value is, the better the performance of the classifier.
    </p>
    <p> All figures are from: https://classeval.wordpress.com/introduction/introduction-to-the-roc-receiver-operating-characteristics-plot/
    </p>
 </div>
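The quantities described above can be computed with scikit-learn. The following is a minimal sketch on made-up labels and scores (not the EEG data), taking sensitivity as the true positive rate and specificity as the true negative rate, so that the false positive rate on the x-axis of the ROC curve is 1 - specificity.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

# Made-up ground truth and classifier scores, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.6])

# Hard predictions at a threshold of 0.5 give one confusion matrix,
# i.e. a single point on the ROC curve.
y_pred = (y_score >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
false_positive_rate = 1 - specificity

# Sweeping the threshold over all scores gives the full ROC curve
# and the area under it.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(sensitivity, specificity, auc)
```

The area under the curve is computed from the continuous scores here; passing thresholded 0/1 predictions instead reduces the curve to a single operating point.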
 <div class="alert alert-block alert-warning">
    <i class="fa fa-info-circle"></i>&nbsp; <strong>One-vs-rest classification</strong>
+    <p> Multiclass classification can also be transformed into multiple binary classification problems. One strategy is called one-vs-rest, where one binary classifier is trained per class. In our case this means that one classifier is trained for each arm movement, using only the labels of that movement (event vs. no event).
+    </p>
 </div>
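A one-vs-rest setup can be sketched in two equivalent ways: by looping over the label columns and fitting one binary classifier per column, or by wrapping the classifier in scikit-learn's `OneVsRestClassifier`, which accepts a label matrix with one column per class. The example below is a minimal sketch on synthetic data (three made-up "movements"), not on the EEG features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic features and a label matrix with one binary column per class.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = (X[:, :3] > 0).astype(int)

# Variant 1: explicit loop, one binary classifier per label column.
manual_pred = np.column_stack([
    LogisticRegression(class_weight="balanced").fit(X, Y[:, k]).predict(X)
    for k in range(Y.shape[1])
])

# Variant 2: the wrapper fits one clone of the estimator per column.
ovr = OneVsRestClassifier(LogisticRegression(class_weight="balanced"))
ovr.fit(X, Y)
Y_pred = ovr.predict(X)

print(len(ovr.estimators_), Y_pred.shape, manual_pred.shape)
```

The explicit loop makes it easy to evaluate each movement separately, while the wrapper keeps all binary classifiers in a single estimator object.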
 %% Cell type:code id: tags:
 ``` python
 from sklearn.pipeline import make_pipeline
 from sklearn.decomposition import PCA
 from sklearn.preprocessing import StandardScaler
 from sklearn.linear_model import LogisticRegression
 from sklearn.ensemble import AdaBoostClassifier
 from sklearn.tree import DecisionTreeClassifier
 from sklearn.ensemble import RandomForestClassifier
 p_lr = make_pipeline(PCA(10),  StandardScaler(), LogisticRegression(class_weight = 'balanced', solver = 'lbfgs'))
 p_ab = make_pipeline(PCA(10),  StandardScaler(), AdaBoostClassifier(DecisionTreeClassifier(max_depth=10)))
 p_rf = make_pipeline(PCA(10),  StandardScaler(), RandomForestClassifier(class_weight = 'balanced', n_estimators = 10))
 ```
 %% Cell type:code id: tags:
 ``` python
 %%time
 from sklearn.model_selection import cross_val_score, cross_val_predict
 from sklearn.metrics import confusion_matrix, roc_auc_score
 preds_lr = []
 for i in range(6):
    y_pred = cross_val_predict(p_lr, X_train, y_train[:,i], cv=5)
    #p.fit(X_train, y_train[:,i])
    #y_pred = p.predict(X_test)
    preds_lr.append(y_pred)
    print(confusion_matrix(y_train[:,i], y_pred))
    print(roc_auc_score(y_train[:,i], y_pred))
 ```
 %% Output
    [[216681 245832]
     [  4079   8904]]
    0.5771531047754554
    [[176294 286200]
     [  4633   8369]]
    0.5124256829265147
    [[162438 300066]
     [  4412   8580]]
    0.5058103318547253
    [[154756 307734]
     [  3264   9742]]
    0.5418268538014647
    [[235926 226490]
     [  1481  11599]]
    0.698488317230169
    [[241616 220823]
     [  1317  11740]]
    0.7108082239663182
    CPU times: user 3min 47s, sys: 27.4 s, total: 4min 14s
    Wall time: 1min 23s
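The ROC-AUC values printed above are computed from hard class predictions, which corresponds to a single point on the ROC curve. A possible variant, sketched here on synthetic imbalanced data rather than on the EEG features, is to request class probabilities from `cross_val_predict` via `method="predict_proba"` and pass those to `roc_auc_score`. The data and variable names below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# Synthetic, imbalanced binary problem (roughly one positive in five).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 1.0).astype(int)

clf = LogisticRegression(class_weight="balanced")

# Out-of-fold probability of the positive class (column 1).
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
# Out-of-fold hard labels, as used in the cell above.
labels = cross_val_predict(clf, X, y, cv=5)

auc_from_scores = roc_auc_score(y, proba)
auc_from_labels = roc_auc_score(y, labels)
print(auc_from_scores, auc_from_labels)
```

The two numbers usually differ, because the score-based value takes the ranking of all samples into account rather than one fixed threshold.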
 %% Cell type:code id: tags:
 ``` python
 # %%time
 # from sklearn.model_selection import cross_val_score, cross_val_predict
 # from sklearn.metrics import confusion_matrix, roc_auc_score
 # preds_ab = []
 # for i in range(6):
 #     y_pred = cross_val_predict(p_ab, X_train, y_train[:,i], cv=5)
 #     #p.fit(X_train, y_train[:,i])
 #     #y_pred = p.predict(X_test)
 #     #preds.append(y_pred)
 #     print(confusion_matrix(y_train[:,i], y_pred))
 #     print(roc_auc_score(y_train[:,i], y_pred))
 ```
 %% Cell type:code id: tags:
 ``` python
 %%time
 from sklearn.model_selection import cross_val_score, cross_val_predict
 from sklearn.metrics import confusion_matrix, roc_auc_score
 preds_rf = []
 for i in range(6):
    y_pred = cross_val_predict(p_rf, X_train, y_train[:,i], cv=5)
    #p.fit(X_train, y_train[:,i])
    #y_pred = p.predict(X_test)
    preds_rf.append(y_pred)
    print(confusion_matrix(y_train[:,i], y_pred))
    print(roc_auc_score(y_train[:,i], y_pred))
 ```
 %% Output
    [[462437     76]
     [   811  12172]]
    0.9686846891035233
    [[462402     92]
     [   827  12175]]
    0.9680977396809418
    [[462425     79]
     [   794  12198]]
    0.9693573293233774
    [[462396     94]
     [   811  12195]]
    0.9687204582970532
    [[462295    121]
     [   561  12519]]
    0.9784242112983614
    [[462326    113]
     [   541  12516]]
    0.9791609648651236
    CPU times: user 7min 8s, sys: 23.1 s, total: 7min 31s
    Wall time: 5min 35s
 %% Cell type:markdown id: tags:
 #### Visualization of model results
 %% Cell type:code id: tags:
 ``` python
 plt.figure(figsize = (10,7))
 plt.subplots_adjust(wspace = 0.5)
 plt.subplots_adjust(hspace = 0.5)
 for i in range(6):
    plt.subplot(2,3,i+1)
    plt.plot(y_train[800:1000, i])
    plt.plot(preds_lr[i][800:1000], '--')
    #plt.xticks([0.25, 1.25], ['no event', 'event'], fontsize = 14)
    #plt.yticks([500000, 1000000], [r'$5 \cdot 10^{5}$', r'$1 \cdot 10^{6}$'], fontsize = 14)
    plt.title('movement ' + str(i+1), fontsize = 14)
 ```
 %% Output
 %% Cell type:code id: tags:
 ``` python
 plt.figure(figsize = (10,7))
 plt.subplots_adjust(wspace = 0.5)
 plt.subplots_adjust(hspace = 0.5)
 for i in range(6):
    plt.subplot(2,3,i+1)
    plt.plot(y_train[800:1000, i])
    plt.plot(preds_rf[i][800:1000], '--')
    #plt.xticks([0.25, 1.25], ['no event', 'event'], fontsize = 14)
    #plt.yticks([500000, 1000000], [r'$5 \cdot 10^{5}$', r'$1 \cdot 10^{6}$'], fontsize = 14)
    plt.title('movement ' + str(i+1), fontsize = 14)
 ```
 %% Output
 %% Cell type:code id: tags:
 ``` python
 ```