09_eeg_use_case.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import os\n",
    "# import glob\n",
    "import pandas as pd\n",
    "# from scipy.signal import resample, butter, lfilter\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from tqdm import tqdm\n",
    "# from itertools import islice\n",
    "\n",
    "from sklearn.decomposition import PCA\n",
    "from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.ensemble import RandomForestClassifier, VotingClassifier, AdaBoostClassifier\n",
    "from sklearn.metrics import precision_recall_fscore_support, roc_auc_score, confusion_matrix\n",
    "from sklearn.model_selection import train_test_split\n",
    "# from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# from multiprocessing import Pool\n",
    "# from multiprocessing.pool import ThreadPool\n",
    "\n",
    "# import time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chapter 9: Use case - prediction of arm movements\n",
    "\n",
    "<center>\n",
    "<figure>\n",
    "    <img src=\"./images/eeg_cap.png\" title=\"made at imgflip.com\" width=35%/> \n",
    "    <img src=\"./images/arm_movement.png\" title=\"made at imgflip.com\" width=35%/>\n",
    "    <figcaption>Setup of an EEG experiment.</figcaption>\n",
    "</figure>\n",
    "</center>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This data contains EEG recordings of subjects performing **grasp-and-lift (GAL)** trials. \n",
    "\n",
    "There are **12 subjects** in total, **10 series** of trials for each subject, and approximately **30 trials** within each series. The number of trials varies for each series. The training set contains the first 8 series for each subject. The test set contains the 9th and 10th series.\n",
    "\n",
    "For each **GAL**, you are tasked to detect 6 events:\n",
    "\n",
    "- HandStart\n",
    "- FirstDigitTouch\n",
    "- BothStartLoadPhase\n",
    "- LiftOff\n",
    "- Replace\n",
    "- BothReleased\n",
    "\n",
    "These events always occur in the same order. In the training set, there are two files for each subject + series combination:\n",
    "\n",
    "- the `*_data.csv` files contain the raw 32-channel EEG data (sampling rate 500 Hz)\n",
    "- the `*_events.csv` files contain the ground-truth frame-wise labels for all events\n",
    "\n",
    "\n",
    "Detailed information about the data can be found here:\n",
    "Luciw MD, Jarocka E, Edin BB (2014) Multi-channel EEG recordings during 3,936 grasp and lift trials with varying weight and friction. Scientific Data 1:140047. www.nature.com/articles/sdata201447\n",
    "\n",
    "*Description from https://www.kaggle.com/c/grasp-and-lift-eeg-detection/data*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "<figure>\n",
    "    <img src=\"./images/eeg_signal_preprocessing.png\" title=\"made at imgflip.com\" width=75%/> \n",
    "    <figcaption>Preprocessing steps for EEG signals.</figcaption>\n",
    "</figure>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Task 1: Load the training and test data sets and ... the order of the sessions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def filter_data(data, events, subj = None):\n",
    "    # keep only the data files of one subject (all subjects if subj is None)\n",
    "    if subj:\n",
    "        data_filt = [f for f in data if f.startswith(subj + '_')]\n",
    "    else:\n",
    "        data_filt = list(data)\n",
    "\n",
    "    # look up the matching events file for every data file;\n",
    "    # the trailing '_' keeps 'series1' from also matching 'series10'\n",
    "    events_filt = []\n",
    "    for d in data_filt:\n",
    "        file_subj, series, _ = d.split('_')\n",
    "        prefix = file_subj + '_' + series + '_'\n",
    "        ix = np.where([a.startswith(prefix) for a in events])[0][0]\n",
    "        events_filt.append(events[ix])\n",
    "\n",
    "    return data_filt, events_filt\n",
    "\n",
    "def load_data(file_names, path):\n",
    "    # read each csv file, drop the id column and return a list of data frames\n",
    "    dfs = []\n",
    "    for f in file_names:\n",
    "        df = pd.read_csv(os.path.join(path, f))\n",
    "        df = df.drop('id', axis = 1)\n",
    "        dfs.append(df)\n",
    "    return dfs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define path and list all data and event files\n",
    "path = '../ml-use-case-eeg/train/' \n",
    "\n",
    "all_data_files = list(filter(lambda x: '_data' in x, os.listdir(path)))\n",
    "all_event_files = list(filter(lambda x: '_events' in x, os.listdir(path)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# select the files of subject 1 and pair each data file with its events file\n",
    "data_filt, events_filt = filter_data(all_data_files, all_event_files, subj='subj1')"
   ]
  },
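  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`os.listdir` returns file names in arbitrary order, so the series may end up concatenated out of order. A minimal sketch of a numeric sort by subject and series, assuming file names of the form `subj<k>_series<m>_data.csv`. The helper `series_key` and the example list are hypothetical and not part of the original notebook:\n",
    "\n",
    "```python\n",
    "def series_key(file_name):\n",
    "    # hypothetical helper: turn 'subj1_series10_data.csv' into (1, 10)\n",
    "    subj, series, _ = file_name.split('_')\n",
    "    return int(subj[len('subj'):]), int(series[len('series'):])\n",
    "\n",
    "example_files = ['subj1_series10_data.csv',\n",
    "                 'subj1_series2_data.csv',\n",
    "                 'subj1_series1_data.csv']\n",
    "\n",
    "# a plain string sort would place series10 before series2\n",
    "sorted_files = sorted(example_files, key=series_key)\n",
    "print(sorted_files)\n",
    "```\n",
    "\n",
    "Sorting the data file names this way before pairing them with their events files would keep the recordings in session order."
   ]
  },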
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load all data and event files\n",
    "all_data = np.concatenate(load_data(data_filt, path))\n",
    "all_events = np.concatenate(load_data(events_filt, path))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feature extraction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Task .. : Extract time-dependent features.\n",
    "\n",
    "Steps:\n",
    "- define a sliding window of length 500 (data points)\n",
    "- compute the average power per window (power: square of the signal)\n",
    "- three consecutive windows predict the event in the following time step\n",
    "- the window slides through the dataset with a step size of 2"
   ]
  },
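  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The steps above can be sketched on a toy signal. This is an illustration under assumptions, not the notebook's exact index arithmetic: the function `window_features` and the toy array are hypothetical, and the window sizes are shrunk so the result can be checked by hand.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def window_features(signal, win_size, step_size, num_feat):\n",
    "    # signal has shape (n_samples, n_channels)\n",
    "    # one output row per window start: mean power of num_feat consecutive windows\n",
    "    span = win_size * num_feat\n",
    "    starts = np.arange(0, signal.shape[0] - span + 1, step_size)\n",
    "    rows = []\n",
    "    for start in starts:\n",
    "        powers = []\n",
    "        for k in range(num_feat):\n",
    "            window = signal[start + k * win_size:start + (k + 1) * win_size]\n",
    "            powers.append(np.mean(window ** 2, axis=0))\n",
    "        rows.append(np.hstack(powers))\n",
    "    return starts, np.array(rows)\n",
    "\n",
    "toy = np.arange(24, dtype=float).reshape(12, 2)   # 12 samples, 2 channels\n",
    "starts, feats = window_features(toy, win_size=2, step_size=2, num_feat=3)\n",
    "print(starts)\n",
    "print(feats.shape)\n",
    "```\n",
    "\n",
    "Each row holds `num_feat * n_channels` values, so with 32 channels and three windows a feature vector would have 96 entries."
   ]
  },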
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "<figure>\n",
    "    <img src=\"./images/time_window.001.png\" title=\"made at imgflip.com\" width=75%/> \n",
    "    <figcaption>Sliding time windows used for feature extraction.</figcaption>\n",
    "</figure>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generate windows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 857 ms, sys: 40.8 ms, total: 898 ms\n",
      "Wall time: 899 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
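    "win_size = 500   # samples per window (1 s at 500 Hz)\n",
    "step_size = 2    # shift between consecutive window starts\n",
    "num_feat = 3     # number of consecutive windows per feature vector\n",
    "num_win = int((all_data.shape[0] - (win_size * num_feat))/step_size)\n",
    "ix_start = np.arange(0, num_win*step_size - win_size*num_feat, step_size)\n",
    "ix_end = ix_start + win_size\n",
    "\n",
    "# label index: end + 1501 = start + 2001, i.e. roughly one window length\n",
    "# after the end of the third window (which ends at start + 1500)\n",
    "all_events_resh = np.array([all_events[end + 1501, :] for end in ix_end])"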
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Compute the mean power per time window"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# def butter_bandpass(fs, lowcut, highcut, order = 5):\n",
    "#     nyq = 0.5 * fs\n",
    "#     low = lowcut / nyq\n",
    "#     high = highcut / nyq\n",
    "#     b, a = butter(order, [low, high], btype='band')\n",
    "#     return b, a\n",
    "\n",
    "# def butter_bandpass_filter(data):\n",
    "#     b, a = butter_bandpass(fs = 500, lowcut = 0, highcut = 50)\n",
    "#     y = lfilter(b, a, data, axis = 0)\n",
    "\n",
    "#     filt_mean_pow = mean_pow(y)\n",
    "#     return filt_mean_pow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "def mean_pow(y):\n",
    "    return np.mean(y**2, axis = 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|█████████▉| 709629/709696 [02:44<00:00, 4331.54it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 2min 44s, sys: 1.51 s, total: 2min 45s\n",
      "Wall time: 2min 45s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "# mean power of three consecutive windows, stacked into one feature vector\n",
    "filt_data = []\n",
    "for start, end in tqdm(zip(ix_start, ix_end), total = len(ix_start)):\n",
    "    pow_1 = mean_pow(all_data[start:end, :])\n",
    "    pow_2 = mean_pow(all_data[start+win_size:end+win_size, :])\n",
    "    pow_3 = mean_pow(all_data[start+2*win_size:end+2*win_size, :])\n",
    "    filt_data.append(np.hstack([pow_1, pow_2, pow_3]))\n",
    "\n",
    "filt_data = np.array(filt_data)"
   ]
  },
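  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Python loop above recomputes almost the same sums for every window. One possible alternative, sketched here on toy data, is to take a cumulative sum of the squared signal once and read each window sum as a difference of two rows. The function `mean_power_windows` is a hypothetical helper, not part of the original notebook:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def mean_power_windows(data, starts, win_size):\n",
    "    # cumulative sum of squared samples with a leading zero row, so that\n",
    "    # csum[b] - csum[a] is the sum of squares over rows a .. b-1\n",
    "    sq = np.asarray(data, dtype=float) ** 2\n",
    "    csum = np.vstack([np.zeros((1, sq.shape[1])), np.cumsum(sq, axis=0)])\n",
    "    return (csum[starts + win_size] - csum[starts]) / win_size\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "toy = rng.normal(size=(50, 3))          # 50 samples, 3 channels\n",
    "starts = np.arange(0, 40, 2)\n",
    "\n",
    "fast = mean_power_windows(toy, starts, win_size=10)\n",
    "slow = np.array([np.mean(toy[s:s + 10] ** 2, axis=0) for s in starts])\n",
    "print(np.allclose(fast, slow))\n",
    "```\n",
    "\n",
    "On long recordings with large amplitudes the cumulative sum can lose floating-point precision, so comparing a few windows against the direct computation is a sensible check before relying on it."
   ]
  },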
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Dimensionality reduction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "pca = PCA(n_components=10)\n",
    "filt_data_red = pca.fit_transform(filt_data)"
   ]
  },
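  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Whether 10 components are enough can be judged from the explained variance. A minimal sketch on random toy features (the array `toy_features` is made up for illustration; in the notebook the fitted `pca` object could be inspected the same way):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.decomposition import PCA\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "# toy data: 200 rows, 12 features with decreasing scale\n",
    "toy_features = rng.normal(size=(200, 12)) * np.linspace(5.0, 0.5, 12)\n",
    "\n",
    "toy_pca = PCA(n_components=10)\n",
    "toy_reduced = toy_pca.fit_transform(toy_features)\n",
    "\n",
    "# cumulative share of the variance kept by the first k components\n",
    "cum_var = np.cumsum(toy_pca.explained_variance_ratio_)\n",
    "print(toy_reduced.shape)\n",
    "print(cum_var.round(3))\n",
    "```\n",
    "\n",
    "PCA is sensitive to feature scale, so channels with much larger power can dominate the leading components unless the features are standardized first."
   ]
  },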
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.collections.PathCollection at 0x2ba7365b2320>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# scatter the first two principal components, one colour per event type\n",
    "for i in range(all_events_resh.shape[1]):\n",
    "    mask = all_events_resh[:, i] == 1\n",
    "    plt.scatter(filt_data_red[mask, 0], filt_data_red[mask, 1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modeling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# random split into training (67 %) and test (33 %) samples\n",
    "X_train, X_test, y_train, y_test = train_test_split(filt_data_red, all_events_resh,\n",
    "                                                    test_size = 0.33, shuffle = True)"
   ]
  },
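  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With a step size of 2 and windows of 500 samples, neighbouring feature rows are built from almost the same raw data. A shuffled split therefore places near-duplicates of the training rows in the test set, which can make the scores look better than they would on unseen recordings. A minimal sketch of a split that keeps time order, with a hypothetical helper `chronological_split` and toy arrays:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def chronological_split(X, y, test_size=0.33, gap=0):\n",
    "    # keep time order: the first rows train the model, the last rows test it;\n",
    "    # gap drops rows in between so windows of the two parts do not overlap\n",
    "    n_test = int(round(len(X) * test_size))\n",
    "    n_train = len(X) - n_test - gap\n",
    "    if n_train <= 0:\n",
    "        raise ValueError('gap and test_size leave no training rows')\n",
    "    return X[:n_train], X[n_train + gap:], y[:n_train], y[n_train + gap:]\n",
    "\n",
    "X_toy = np.arange(20).reshape(10, 2)\n",
    "y_toy = np.arange(10)\n",
    "X_tr, X_te, y_tr, y_te = chronological_split(X_toy, y_toy, test_size=0.3, gap=2)\n",
    "print(y_tr, y_te)\n",
    "```\n",
    "\n",
    "For the features built in this notebook each row spans about 2000 samples at a step of 2, so a gap on the order of 1000 rows would be needed to separate the two parts completely."
   ]
  },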
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.5015816548374611\n",
      "[[227893      2]\n",
      " [  6285     20]]\n",
      "0.5021839332037699\n",
      "[[227838      9]\n",
      " [  6325     28]]\n",
      "0.5020450496683306\n",
      "[[227849      1]\n",
      " [  6324     26]]\n",
      "0.5004700720777185\n",
      "[[227818      0]\n",
      " [  6376      6]]\n",
      "0.5048009615038914\n",
      "[[227503    195]\n",
      " [  6434     68]]\n",
      "0.5064005635338957\n",
      "[[227395    311]\n",
      " [  6402     92]]\n",
      "CPU times: user 5min 52s, sys: 402 ms, total: 5min 53s\n",
      "Wall time: 5min 54s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "abc = AdaBoostClassifier()\n",
    "\n",
    "all_pred = []\n",
    "all_labels = []\n",
    "for i in range(6):\n",
    "\n",
    "    abc.fit(X_train, y_train[:,i])\n",
    "    y_pred = abc.predict(X_test)\n",
    "\n",
    "    all_pred.append(y_pred)\n",
    "    all_labels.append(y_test[:,i])\n",
    "    print(roc_auc_score(y_test[:,i], y_pred))\n",
    "    print(confusion_matrix(y_test[:,i], y_pred))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.5712988484634542\n",
      "[[ 89209 138686]\n",
      " [  1569   4736]]\n",
      "0.5357646342979414\n",
      "[[ 70740 157107]\n",
      " [  1518   4835]]\n",
      "0.5398906933868289\n",
      "[[ 64717 163133]\n",
      " [  1297   5053]]\n",
      "0.5683459761359975\n",
      "[[ 62697 165121]\n",
      " [   884   5498]]\n",
      "0.6282825974068698\n",
      "[[ 63077 164621]\n",
      " [   133   6369]]\n",
      "0.6396764403905532\n",
      "[[ 68905 158801]\n",
      " [   151   6343]]\n",
      "CPU times: user 16 s, sys: 48.9 ms, total: 16.1 s\n",
      "Wall time: 16.1 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "lr = LogisticRegression(class_weight='balanced')\n",
    "\n",
    "all_pred = []\n",
    "all_labels = []\n",
    "for i in range(6):\n",
    "\n",
    "    lr.fit(X_train, y_train[:,i])\n",
    "    y_pred = lr.predict(X_test)\n",
    "\n",
    "    all_pred.append(y_pred)\n",
    "    all_labels.append(y_test[:,i])\n",
    "    print(roc_auc_score(y_test[:,i], y_pred))\n",
    "    print(confusion_matrix(y_test[:,i], y_pred))"
   ]
  },
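  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cells above pass the 0/1 output of `predict` to `roc_auc_score`, which reduces the ROC curve to a single operating point. Passing a continuous score such as the positive-class probability from `predict_proba` uses the whole curve. A minimal sketch on a made-up imbalanced toy problem (all data here is synthetic and only for illustration):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.metrics import roc_auc_score\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "# toy problem: about 10 % positives, one informative feature\n",
    "y_toy = (rng.random(2000) < 0.1).astype(int)\n",
    "X_toy = rng.normal(size=(2000, 3))\n",
    "X_toy[:, 0] += 1.5 * y_toy\n",
    "\n",
    "clf = LogisticRegression(class_weight='balanced')\n",
    "clf.fit(X_toy[:1500], y_toy[:1500])\n",
    "\n",
    "hard = clf.predict(X_toy[1500:])                  # 0/1 labels\n",
    "scores = clf.predict_proba(X_toy[1500:])[:, 1]    # probability of class 1\n",
    "\n",
    "auc_hard = roc_auc_score(y_toy[1500:], hard)\n",
    "auc_scores = roc_auc_score(y_toy[1500:], scores)\n",
    "print(auc_hard, auc_scores)\n",
    "```\n",
    "\n",
    "The same change would apply to the AdaBoost and voting classifiers, since both offer `predict_proba` (the latter because it uses soft voting)."
   ]
  },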
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/cluster/apps/python/3.6.1/x86_64/lib64/python3.6/site-packages/sklearn/preprocessing/label.py:151: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.\n",
      "  if diff:\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.7815740181416588\n",
      "[[227810     85]\n",
      " [  2752   3553]]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.7835624299156497\n",
      "[[227845      2]\n",
      " [  2750   3603]]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.7865354330708662\n",
      "[[227850      0]\n",
      " [  2711   3639]]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.8096969591358668\n",
      "[[227817      1]\n",
      " [  2429   3953]]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.9251682147781864\n",
      "[[227589    109]\n",
      " [   970   5532]]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.9337942375789381\n",
      "[[227605    101]\n",
      " [   857   5637]]\n",
      "CPU times: user 2min 6s, sys: 346 ms, total: 2min 6s\n",
      "Wall time: 2min 6s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "lda = LDA()\n",
    "rf = RandomForestClassifier(class_weight = 'balanced')\n",
    "lr = LogisticRegression(class_weight = 'balanced')\n",
    "\n",
    "eclf = VotingClassifier(estimators=[('lda', lda), ('rf', rf), ('lr', lr)], voting = 'soft', weights=[1,1,1])\n",
    "\n",
    "all_pred = []\n",
    "all_labels = []\n",
    "for i in range(6):\n",
    "\n",
    "    eclf.fit(X_train, y_train[:,i])\n",
    "    y_pred = eclf.predict(X_test)\n",
    "\n",
    "    all_pred.append(y_pred)\n",
    "    all_labels.append(y_test[:,i])\n",
    "    print(roc_auc_score(y_test[:,i], y_pred))\n",
    "    print(confusion_matrix(y_test[:,i], y_pred))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>HandStart</th>\n",
       "      <th>FirstDigitTouch</th>\n",
       "      <th>BothStartLoadPhase</th>\n",
       "      <th>LiftOff</th>\n",
       "      <th>Replace</th>\n",
       "      <th>BothReleased</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>subj10_series1_0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100</th>\n",
       "      <td>subj10_series1_100</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>200</th>\n",
       "      <td>subj10_series1_200</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>300</th>\n",
       "      <td>subj10_series1_300</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>400</th>\n",
       "      <td>subj10_series1_400</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>500</th>\n",
       "      <td>subj10_series1_500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>600</th>\n",
       "      <td>subj10_series1_600</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>700</th>\n",
       "      <td>subj10_series1_700</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>800</th>\n",
       "      <td>subj10_series1_800</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>900</th>\n",
       "      <td>subj10_series1_900</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1000</th>\n",
       "      <td>subj10_series1_1000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1100</th>\n",
       "      <td>subj10_series1_1100</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1200</th>\n",
       "      <td>subj10_series1_1200</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1300</th>\n",
       "      <td>subj10_series1_1300</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1400</th>\n",
       "      <td>subj10_series1_1400</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1500</th>\n",
       "      <td>subj10_series1_1500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1600</th>\n",
       "      <td>subj10_series1_1600</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1700</th>\n",
       "      <td>subj10_series1_1700</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1800</th>\n",
       "      <td>subj10_series1_1800</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1900</th>\n",
       "      <td>subj10_series1_1900</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000</th>\n",
       "      <td>subj10_series1_2000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2100</th>\n",
       "      <td>subj10_series1_2100</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2200</th>\n",
       "      <td>subj10_series1_2200</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2300</th>\n",
       "      <td>subj10_series1_2300</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2400</th>\n",
       "      <td>subj10_series1_2400</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2500</th>\n",
       "      <td>subj10_series1_2500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2600</th>\n",
       "      <td>subj10_series1_2600</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2700</th>\n",
       "      <td>subj10_series1_2700</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2800</th>\n",
       "      <td>subj10_series1_2800</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2900</th>\n",
       "      <td>subj10_series1_2900</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259400</th>\n",
       "      <td>subj10_series1_259400</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259500</th>\n",
       "      <td>subj10_series1_259500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259600</th>\n",
       "      <td>subj10_series1_259600</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259700</th>\n",
       "      <td>subj10_series1_259700</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",