Commit c87a7072 authored by Franziska Oschmann

added description of one-vs-rest classifier

parent 456bc996
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
```
%% Cell type:markdown id: tags:
# Chapter 9: Use case - prediction of arm movements
<center>
<figure>
<table><tr>
<td> <img src="./images/eeg_cap.png" style="width: 400px;"/> </td>
<td> <img src="./images/arm_movement.png" style="width: 400px;"/> </td>
</tr></table>
<figcaption>Setup of an EEG-experiment.</figcaption>
</figure>
</center>
%% Cell type:markdown id: tags:
<center>
<figure>
<img src="./images/eeg_electrode_numbering.jpg" width=35%/>
<figcaption>Arrangement of electrodes on head.</figcaption>
</figure>
</center>
%% Cell type:markdown id: tags:
This data contains EEG recordings of one subject performing **grasp-and-lift (GAL)** trials.
There is **1 subject** in total, **10 series** of trials for this subject, and approximately **30 trials** within each series. The number of trials varies between series.
For each **GAL**, you are tasked to detect 6 events:
- HandStart
- FirstDigitTouch
- BothStartLoadPhase
- LiftOff
- Replace
- BothReleased
These events always occur in the same order. In this dataset, there are two files for each subject + series combination:
- the `*_data.csv` files contain the raw 32-channel EEG data (sampling rate 500 Hz)
- the `*_events.csv` files contain the ground-truth frame-wise labels for all events
Detailed information about the data can be found here:
Luciw MD, Jarocka E, Edin BB (2014) Multi-channel EEG recordings during 3,936 grasp and lift trials with varying weight and friction. Scientific Data 1:140047. www.nature.com/articles/sdata201447
*Description from https://www.kaggle.com/c/grasp-and-lift-eeg-detection/data*
%% Cell type:markdown id: tags:
<center>
<figure>
<img src="./images/eeg_signal_preprocessing.png" title="made at imgflip.com" width=75%/>
<figcaption>Preprocessing steps for EEG-signals.</figcaption>
</figure>
</center>
%% Cell type:markdown id: tags:
### Load data
%% Cell type:markdown id: tags:
The data can be found in: '/data/eeg_use_case' and contains:
- 8 series of recorded EEG data
- 8 series of events of arm movements
Load the EEG data and the events:
- combine all EEG series in one array (size: (total number of time steps, number of channels))
- combine all events in one array (size: (total number of time steps, number of different arm movements))
- pay attention to the order of the series
%% Cell type:markdown id: tags:
<div class="alert alert-block alert-warning">
<i class="fa fa-info-circle"></i>&nbsp; <strong>Filter strings with the lambda-operator</strong>
The lambda operator creates anonymous functions, i.e. functions without a name. An anonymous function can take any number of parameters, evaluates a single expression and returns the value of that expression. The lambda operator can be used as follows to filter the filenames:
all_data_files = list(filter(lambda x: '_data' in x, os.listdir(path)))
</div>
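
A minimal sketch of the same filtering on a hard-coded list, so that it runs without the data directory. The file names are hypothetical stand-ins for the output of `os.listdir(path)`. A list comprehension gives the same result and is often considered more readable.

```python
# Hypothetical file names standing in for os.listdir(path)
file_names = ['subj1_series2_events.csv', 'subj1_series1_data.csv',
              'subj1_series1_events.csv', 'subj1_series2_data.csv']

# filter() keeps every element for which the lambda returns True
data_files = list(filter(lambda x: '_data' in x, file_names))

# Equivalent list comprehension
data_files_lc = [f for f in file_names if '_data' in f]

# Sorting both lists keeps data and event files of the same series aligned
data_files = sorted(data_files)
event_files = sorted(f for f in file_names if '_events' in f)
print(data_files)
print(event_files)
```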
%% Cell type:code id: tags:
``` python
def load_data(file_names, path):
    # read each csv file, drop the id column and collect the dataframes in a list
    dfs = []
    for f in file_names:
        df = pd.read_csv(path + f).drop('id', axis = 1)
        dfs.append(df)
    return dfs
```
%% Cell type:code id: tags:
``` python
# define path and list of all data and event files
import os
import pandas as pd
path = 'data/eeg_use_case/'
all_data_files = list(filter(lambda x: '_data' in x, os.listdir(path)))
all_event_files = list(filter(lambda x: '_events' in x, os.listdir(path)))
# sort the file names so that data and event files are in the same series order
all_data_sort = np.sort(all_data_files)
all_event_sort = np.sort(all_event_files)
```
%% Cell type:code id: tags:
``` python
# load all data and event files
all_data = np.concatenate(load_data(all_data_sort, path))
all_events = np.concatenate(load_data(all_event_sort, path))
```
%% Cell type:markdown id: tags:
### Visualization
%% Cell type:markdown id: tags:
Visualize the EEG-data and events and pay attention to:
- the EEG traces
- the number of detected arm movements
What do you observe?
%% Cell type:code id: tags:
``` python
cols = ['C0', 'C1', 'C2', 'C3']
# read the channel names first; they are needed to select every 8th channel
columns = pd.read_csv(path + all_data_sort[0]).columns[1:]
ix = np.arange(len(columns))[::8]
labels = columns[::8]
# sample index around which the traces are plotted; it is not defined in the
# original cell, so the value used here is an arbitrary example
start = 500
plt.figure(figsize = (7,10))
plt.subplots_adjust(hspace = 0.3)
# upper four panels: EEG traces of the selected channels
for i, ch in enumerate(ix):
    ax = plt.subplot(5,1,i+1)
    ax.plot(all_data[(start-500):(start+3500), ch], linewidth = 1.5, color = cols[i], label = labels[i])
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.set_yticks([])
    ax.set_xticks([])
    ax.legend(loc='upper left', bbox_to_anchor= (0, 1.1), fontsize = 14)
    ax.set_ylim(-500,3000)
# bottom panel: the six event traces for the same time span
ax = plt.subplot(5,1,5)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.set_yticks([])
ax.set_xticks([])
ax.plot(all_events[(start-500):(start+3500)], linewidth = 2)
# 4000 samples at 500 Hz correspond to 8 seconds
ax.set_xticks(np.arange(0,4100,1000))
ax.set_xticklabels(['0', '2', '4', '6', '8'], fontsize = 14)
ax.set_xlabel('Time [sec]', fontsize = 14)
ax.set_ylim(0.1,1)
lgd = ax.legend(['1', '2', '3', '4', '5', '6'],
                loc='lower left', bbox_to_anchor= (0.85, 0.1), ncol=2,
                borderaxespad=0, frameon=True, fontsize = 12)
```
%% Output
%% Cell type:code id: tags:
``` python
plt.figure(figsize = (10,7))
plt.subplots_adjust(wspace = 0.5)
plt.subplots_adjust(hspace = 0.5)
# one histogram per arm movement: number of time steps without and with event
for i, e in enumerate(all_events.T):
    plt.subplot(2,3,i+1)
    plt.hist(e, [0, 0.5, 1, 1.5])
    plt.xticks([0.25, 1.25], ['no event', 'event'], fontsize = 14)
    plt.yticks([500000, 1000000], [r'$5 \cdot 10^{5}$', r'$1 \cdot 10^{6}$'], fontsize = 14)
    plt.title('movement ' + str(i+1), fontsize = 14)
```
%% Output
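%% Cell type:markdown id: tags:
The imbalance visible in the histograms can also be expressed as a number. The following is a minimal sketch, not part of the original notebook: it uses a small synthetic array in place of `all_events` and reports the fraction of time steps labelled as event for each movement.

```python
import numpy as np

# Synthetic stand-in for all_events: 1000 time steps, 6 binary event columns
rng = np.random.default_rng(0)
events = (rng.random((1000, 6)) < 0.03).astype(int)

# Fraction of time steps labelled as 'event', per movement
pos_fraction = events.mean(axis=0)
for i, frac in enumerate(pos_fraction):
    print(f'movement {i + 1}: {frac:.1%} event, {1 - frac:.1%} no event')
```

With the real data, `all_events.mean(axis=0)` would give the corresponding fractions.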
%% Cell type:markdown id: tags:
### Feature extraction
%% Cell type:markdown id: tags:
The purpose of the feature extraction is to extract time-dependent features from the EEG data. To do so, a sliding window of 500 datapoints is used. The features of three consecutive time windows are combined to predict the event at a later time step.
Extract time-dependent features from the EEG data:
- define the start and end points of a sliding window with a length of 500 datapoints and a step size of 2
- loop through those start and end points
- per iteration:
    - take three consecutive time windows (window_1 = data[start:end,:], window_2 = data[start+500:end+500,:], window_3 = data[start+1000:end+1000,:])
    - compute the average power per window (power: square of the signal)
    - combine the three arrays containing the average power into one array
%% Cell type:markdown id: tags:
<center>
<figure>
<img src="./images/time_window.001.png" title="made at imgflip.com" width=75%/>
<figcaption>Preprocessing steps for EEG-signals.</figcaption>
</figure>
</center>
%% Cell type:markdown id: tags:
#### Generate windows
%% Cell type:code id: tags:
``` python
%%time
win_size = 500
step_size = 2
num_feat = 3
num_win = int((all_data.shape[0] - (win_size * num_feat))/step_size)
ix_start = np.arange(0, num_win*step_size - win_size*num_feat, step_size)
ix_end = ix_start + win_size
```
%% Output
CPU times: user 3.27 ms, sys: 3.47 ms, total: 6.73 ms
Wall time: 5.11 ms
%% Cell type:markdown id: tags:
#### Compute the mean power per time window
%% Cell type:code id: tags:
``` python
def mean_pow(y):
    # mean of the squared signal over time, computed per channel
    return np.mean(y**2, axis = 0)
```
%% Cell type:code id: tags:
``` python
%%time
data_filt = []
for start, end in zip(ix_start, ix_end):
    # mean power of three consecutive windows
    pow_1 = mean_pow(all_data[start:end, :])
    pow_2 = mean_pow(all_data[start+500:end+500, :])
    pow_3 = mean_pow(all_data[start+1000:end+1000, :])
    data_filt.append(np.hstack([pow_1, pow_2, pow_3]))
data_filt = np.array(data_filt)
# label for each feature vector: the events at sample index end + 1501
events_filt = np.array([all_events[end + 1501, :] for end in ix_end])
```
%% Output
CPU times: user 1min 11s, sys: 2.11 s, total: 1min 13s
Wall time: 1min 13s
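%% Cell type:markdown id: tags:
The loop above recomputes every 500-sample mean from scratch. As a possible alternative that is not part of the original notebook, the same window means can be read off a cumulative sum of the squared signal. The sketch below uses a small random array in place of `all_data` and checks the result against an explicit loop. `windowed_mean_power` is a name introduced here for illustration only.

```python
import numpy as np

def windowed_mean_power(data, starts, win_size):
    """Mean of data**2 over rows [start, start + win_size) for every start."""
    # Prepend a zero row so that csum[b] - csum[a] is the sum over rows a..b-1
    csum = np.vstack([np.zeros((1, data.shape[1])),
                      np.cumsum(data.astype(float) ** 2, axis=0)])
    return (csum[starts + win_size] - csum[starts]) / win_size

# Small synthetic stand-in for all_data: 200 time steps, 4 channels
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))
win_size, step_size, num_feat = 20, 2, 3

starts = np.arange(0, data.shape[0] - win_size * num_feat + 1, step_size)

# Three consecutive windows per start, stacked side by side
features = np.hstack([windowed_mean_power(data, starts + k * win_size, win_size)
                      for k in range(num_feat)])

# Reference: explicit loop over the windows, as in the notebook
reference = np.array([
    np.hstack([np.mean(data[s + k * win_size:s + (k + 1) * win_size] ** 2, axis=0)
               for k in range(num_feat)])
    for s in starts])
print(features.shape, np.allclose(features, reference))
```

For very long recordings a cumulative sum can lose some floating-point precision, so it is worth comparing against the loop on a subset before relying on it.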
%% Cell type:markdown id: tags:
### Modeling
%% Cell type:code id: tags:
``` python
# split of the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data_filt, events_filt,
                                                    test_size = 0.33, shuffle = True)
```
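%% Cell type:markdown id: tags:
Consecutive feature windows overlap almost completely (window length 500, step size 2), so a shuffled split places nearly identical windows in both the training and the test set, which can make scores look better than they would on unseen recordings. As a hedged alternative, not used in the original notebook, a time-ordered split can be obtained with `shuffle = False`. The sketch uses small synthetic arrays and hypothetical variable names with the suffix `_demo`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for data_filt and events_filt (rows ordered in time)
X_demo = np.arange(100).reshape(50, 2)
y_demo = np.arange(50) % 2

# shuffle=False keeps the earlier rows for training and the later rows for testing
X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.33, shuffle=False)

print(X_tr_demo.shape, X_te_demo.shape)
```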
%% Cell type:markdown id: tags:
#### Pipeline with single classifier
%% Cell type:markdown id: tags:
1. Define a pipeline which includes:
    - PCA to reduce the data to 10 dimensions
    - Scaling of the data
    - a classifier of your choice (e.g. LogisticRegression, AdaBoost...)
2. Choose an appropriate parametrization of the classifier according to the imbalance of the data (see lecture 6).
3. Transform the multi-class classification problem into a one-vs-rest classification.
4. Use cross-validation to test the model performance (cv = 5).
5. Use the ROC-AUC score and the confusion matrix for the evaluation of the model.
6. Visualize the model performance by plotting the true and predicted hand movements.
7. Repeat the above-named steps for another classifier and compare the results.
<div class="alert alert-block alert-warning">
<i class="fa fa-info-circle"></i>&nbsp; <strong>ROC (Receiver Operating Characteristic) curve</strong>
<p>A classifier can produce four different types of results:</p>
<p>- <strong>true positive</strong> (arm movement was observed and predicted)</p>
<p>- <strong>true negative</strong> (arm movement was not observed and not predicted)</p>
<p>- <strong>false positive</strong> (arm movement was not observed but predicted)</p>
<p>- <strong>false negative</strong> (arm movement was observed but not predicted)</p>
<p>
<figure>
<img src="./images/evaluation-measures-for-roc.png" title="made at imgflip.com" width=50%/>
</figure>
</p>
<p>
These four possible outcomes also determine the sensitivity and specificity of the classifier:</p>
<p>- <strong>sensitivity</strong>: true positive rate (should be high) </p>
<p>- <strong>specificity</strong>: true negative rate (should be high); the false positive rate equals 1 - specificity and should be low </p>
<p>
<figure>
<img src="./images/a-roc-curve-connecting-points.png" title="made at imgflip.com" width=30%/>
</figure>
</p>
<p> The ROC curve plots the true positive rate against the false positive rate. As the sensitivity should be high and the false positive rate (1 - specificity) should be low, the ROC curves for different classifier performances look as follows:
</p>
<p>
<center>
<figure>
<table><tr>
<td> <img src="./images/a-roc-curve-of-a-random-classifier.png" style="width: 400px;"/> </td>
<td> <img src="./images/a-roc-curve-of-a-perfect-classifier.png" style="width: 400px;"/> </td>
</tr></table>
</figure>
</center>
</p>
<p>
The metric <strong>'roc-auc'</strong> describes the area under the ROC curve. Thus, the higher this value is, the better the performance of the classifier.
</p>
<p> All figures are from: https://classeval.wordpress.com/introduction/introduction-to-the-roc-receiver-operating-characteristics-plot/
</p>
</div>
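
The four outcomes and the rates derived from them can be counted directly. The following is a minimal sketch on made-up toy labels, not on the EEG data.

```python
import numpy as np

# Toy labels: 1 = arm movement, 0 = no arm movement
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # observed and predicted
tn = np.sum((y_true == 0) & (y_pred == 0))  # not observed and not predicted
fp = np.sum((y_true == 0) & (y_pred == 1))  # not observed but predicted
fn = np.sum((y_true == 1) & (y_pred == 0))  # observed but not predicted

sensitivity = tp / (tp + fn)          # true positive rate
specificity = tn / (tn + fp)          # true negative rate
false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
print(sensitivity, specificity, false_positive_rate)
```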
<div class="alert alert-block alert-warning">
<i class="fa fa-info-circle"></i>&nbsp; <strong>One-vs-rest classification</strong>
<p> Multiclass classification can also be transferred to multiple binary classification problems. One strategy is called one-vs-rest, where one classifier is trained per class. In our case this means that for each arm movement one classifier is trained by considering only the labels of the respective arm movement.
</p>
</div>
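
A minimal sketch of the one-vs-rest idea on synthetic data, assuming a label matrix with one binary column per movement. It shows the explicit loop over label columns next to scikit-learn's `OneVsRestClassifier` wrapper. The data and the choice of `LogisticRegression` are for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in data: 200 samples, 5 features, 3 binary label columns
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = (X[:, :3] + 0.5 * rng.normal(size=(200, 3)) > 0).astype(int)

# Explicit version: one binary classifier per label column
models = [LogisticRegression().fit(X, Y[:, i]) for i in range(Y.shape[1])]
pred_loop = np.column_stack([m.predict(X) for m in models])

# Same idea with scikit-learn's wrapper, which fits one estimator per column
ovr = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
pred_ovr = ovr.predict(X)

print(pred_loop.shape, pred_ovr.shape, len(ovr.estimators_))
```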
%% Cell type:code id: tags:
``` python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
p_lr = make_pipeline(PCA(10), StandardScaler(), LogisticRegression(class_weight = 'balanced', solver = 'lbfgs'))
p_ab = make_pipeline(PCA(10), StandardScaler(), AdaBoostClassifier(DecisionTreeClassifier(max_depth=10)))
p_rf = make_pipeline(PCA(10), StandardScaler(), RandomForestClassifier(class_weight = 'balanced', n_estimators = 10))
```
%% Cell type:code id: tags:
``` python
%%time
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import confusion_matrix, roc_auc_score
preds_lr = []
# one-vs-rest: one cross-validated classifier per arm movement
for i in range(6):
    y_pred = cross_val_predict(p_lr, X_train, y_train[:,i], cv=5)
    #p.fit(X_train, y_train[:,i])
    #y_pred = p.predict(X_test)
    preds_lr.append(y_pred)
    print(confusion_matrix(y_train[:,i], y_pred))
    print(roc_auc_score(y_train[:,i], y_pred))
```
%% Output
[[216681 245832]
 [  4079   8904]]
0.5771531047754554
[[176294 286200]
 [  4633   8369]]
0.5124256829265147
[[162438 300066]
 [  4412   8580]]
0.5058103318547253
[[154756 307734]
 [  3264   9742]]
0.5418268538014647
[[235926 226490]
 [  1481  11599]]
0.698488317230169
[[241616 220823]
 [  1317  11740]]
0.7108082239663182
CPU times: user 3min 47s, sys: 27.4 s, total: 4min 14s
Wall time: 1min 23s
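%% Cell type:markdown id: tags:
Because `roc_auc_score` is called here with hard 0/1 predictions, the ROC curve has only one interior point and the reported value equals the mean of sensitivity and specificity. The following worked check uses the first confusion matrix printed above, with scikit-learn's convention of true classes in rows and predicted classes in columns.

```python
import numpy as np

# First confusion matrix from the output (rows: true class, columns: predicted class)
cm = np.array([[216681, 245832],
               [4079, 8904]])
tn, fp, fn, tp = cm.ravel()

sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate

# With a single interior ROC point, the area under the curve is the mean of both rates
auc_from_cm = (sensitivity + specificity) / 2
print(round(sensitivity, 4), round(specificity, 4), round(auc_from_cm, 4))
```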
%% Cell type:code id: tags:
``` python
# %%time
# from sklearn.model_selection import cross_val_score, cross_val_predict
# from sklearn.metrics import confusion_matrix, roc_auc_score
# preds_ab = []
# for i in range(6):
#     y_pred = cross_val_predict(p_ab, X_train, y_train[:,i], cv=5)
#     #p.fit(X_train, y_train[:,i])
#     #y_pred = p.predict(X_test)
#     #preds.append(y_pred)
#     print(confusion_matrix(y_train[:,i], y_pred))
#     print(roc_auc_score(y_train[:,i], y_pred))
```
%% Cell type:code id: tags:
``` python
%%time
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import confusion_matrix, roc_auc_score
preds_rf = []
# one-vs-rest: one cross-validated classifier per arm movement
for i in range(6):
    y_pred = cross_val_predict(p_rf, X_train, y_train[:,i], cv=5)
    #p.fit(X_train, y_train[:,i])
    #y_pred = p.predict(X_test)
    preds_rf.append(y_pred)
    print(confusion_matrix(y_train[:,i], y_pred))
    print(roc_auc_score(y_train[:,i], y_pred))
```
%% Output
[[462437     76]
 [   811  12172]]
0.9686846891035233
[[462402     92]
 [   827  12175]]
0.9680977396809418
[[462425     79]
 [   794  12198]]
0.9693573293233774
[[462396     94]
 [   811  12195]]
0.9687204582970532
[[462295    121]
 [   561  12519]]
0.9784242112983614
[[462326    113]
 [   541  12516]]
0.9791609648651236
CPU times: user 7min 8s, sys: 23.1 s, total: 7min 31s
Wall time: 5min 35s
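%% Cell type:markdown id: tags:
The scores above are computed from hard 0/1 predictions. A ROC curve over all decision thresholds needs continuous scores instead, which `cross_val_predict` can return with `method='predict_proba'`. The following is a minimal sketch on synthetic imbalanced data, not on the EEG features. The dataset and classifier settings are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# Synthetic imbalanced stand-in data (about 10% positives)
X, y = make_classification(n_samples=500, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
clf = LogisticRegression(class_weight='balanced')

# Hard 0/1 predictions, as in the cells above
y_label = cross_val_predict(clf, X, y, cv=5)
# Class probabilities; column 1 is the probability of the positive class
y_score = cross_val_predict(clf, X, y, cv=5, method='predict_proba')[:, 1]

auc_label = roc_auc_score(y, y_label)
auc_score = roc_auc_score(y, y_score)
print('AUC from labels:', auc_label)
print('AUC from scores:', auc_score)
```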
%% Cell type:markdown id: tags:
#### Visualization of model results
%% Cell type:code id: tags:
``` python
plt.figure(figsize = (10,7))
plt.subplots_adjust(wspace = 0.5)
plt.subplots_adjust(hspace = 0.5)
# true labels (solid) and logistic regression predictions (dashed) per movement
for i in range(6):
    plt.subplot(2,3,i+1)
    plt.plot(y_train[800:1000, i])
    plt.plot(preds_lr[i][800:1000], '--')
    #plt.xticks([0.25, 1.25], ['no event', 'event'], fontsize = 14)
    #plt.yticks([500000, 1000000], [r'$5 \cdot 10^{5}$', r'$1 \cdot 10^{6}$'], fontsize = 14)
    plt.title('movement ' + str(i+1), fontsize = 14)
```
%% Output
%% Cell type:code id: tags:
``` python
plt.figure(figsize = (10,7))
plt.subplots_adjust(wspace = 0.5)
plt.subplots_adjust(hspace = 0.5)
# true labels (solid) and random forest predictions (dashed) per movement
for i in range(6):
    plt.subplot(2,3,i+1)
    plt.plot(y_train[800:1000, i])
    plt.plot(preds_rf[i][800:1000], '--')
    #plt.xticks([0.25, 1.25], ['no event', 'event'], fontsize = 14)
    #plt.yticks([500000, 1000000], [r'$5 \cdot 10^{5}$', r'$1 \cdot 10^{6}$'], fontsize = 14)
    plt.title('movement ' + str(i+1), fontsize = 14)
```
%% Output
%% Cell type:code id: tags:
``` python
```