    {
     "cells": [
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n",
        "import matplotlib.pyplot as plt\n",
        "%matplotlib inline\n",
        "%config InlineBackend.figure_format = 'retina'\n",
        "import warnings\n",
        "warnings.filterwarnings('ignore', category=FutureWarning)\n",
        "#from IPython.core.display import HTML; HTML(open(\"custom.html\", \"r\").read())"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "# Introduction to Neural Networks\n",
        "\n",
        "## TO DO: Almost all the figues and schematics will be replaced or improved slowly\n",
        "\n",
    
        "<center>\n",
        "<figure>\n",
        "<img src=\"./images/neuralnets/neural_net_ex.svg\" width=\"700\"/>\n",
        "<figcaption>A 3 layer Neural Network (By convention the input layer is not counted).</figcaption>\n",
        "</figure>\n",
        "</center>"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## History of Neural networks\n",
        "\n",
        "**TODO: Make it more complete and format properly**\n",
        "\n",
        "1943 - Threshold Logic\n",
        "\n",
        "1940s - Hebbian Learning\n",
        "\n",
        "1958 - Perceptron\n",
        "\n",
        "1975 - Backpropagation\n",
        "\n",
        "1980s - Neocognitron\n",
        "\n",
        "1982: Hopfield Network\n",
        "\n",
        "1986: Convolutional Neural Networks\n",
        "\n",
        "1997: Long-short term memory (LSTM) model\n",
        "\n",
    
        "2014: Gated Recurrent Units, Generative Adversarial Networks(Check)?"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Why the boom now?\n",
        "* Data\n",
        "* Data\n",
        "* Data\n",
        "* Availability of GPUs\n",
        "* Algorithmic developments which allow for efficient training and training for deeper networks\n",
        "* Much easier access than a decade ago"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Building blocks\n",
        "### Perceptron\n",
        "\n",
    "The smallest unit of a neural network is a **perceptron**-like node.\n",
        "\n",
        "**What is a Perceptron?**\n",
        "\n",
    "It is a simple function which takes multiple inputs and produces a single output.\n",
        "\n",
        "<center>\n",
        "<figure>\n",
        "<img src=\"./images/neuralnets/perceptron_ex.svg\" width=\"400\"/>\n",
        "<figcaption>A simple perceptron with 3 inputs and 1 output.</figcaption>\n",
        "</figure>\n",
        "</center>\n",
        "\n",
        "\n",
        "It works as follows: \n",
        "\n",
        "Step 1: A **weighted sum** of the inputs is calculated\n",
        "\n",
        "\\begin{equation*}\n",
        "weighted\\_sum = \\sum_{k=1}^{num\\_inputs} w_{i} x_{i}\n",
        "\\end{equation*}\n",
        "\n",
    "Step 2: A **step** activation function is applied\n",
        "\n",
        "$$\n",
        "f(weighted\\_sum) = \\left\\{\n",
        "        \\begin{array}{ll}\n",
    "            0 & \\quad weighted\\_sum < threshold \\\\\n",
    "            1 & \\quad weighted\\_sum \\geq threshold\n",
        "        \\end{array}\n",
        "    \\right.\n",
        "$$\n",
    "\n",
    "Note that this is also a linear classifier, as introduced in script 02."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "%matplotlib inline\n",
    "%config IPCompleter.greedy=True\n",
        "import matplotlib as mpl\n",
    "mpl.rcParams['lines.linewidth'] = 3\n",
    "#mpl.rcParams['font.size'] = 16"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "\n",
        "def perceptron(X, w, threshold=1):\n",
    "    # This function computes sum(w_i*x_i) and\n",
        "    # applies a perceptron activation\n",
    "    linear_sum = np.dot(X, w)\n",
    "    output = 0\n",
        "    if linear_sum >= threshold:\n",
        "        output = 1\n",
        "    return output"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "#### Boolean AND\n",
        "\n",
        "| x$_1$ | x$_2$ | output |\n",
        "| --- | --- | --- |\n",
        "| 0 | 0 | 0 |\n",
        "| 1 | 0 | 0 |\n",
        "| 0 | 1 | 0 |\n",
        "| 1 | 1 | 1 |"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Calculating Boolean AND using a perceptron\n",
    "threshold = 1.5\n",
    "w = [1, 1]\n",
    "X = [[0, 0], [1, 0], [0, 1], [1, 1]]\n",
        "for i in X:\n",
    "    print(\"Perceptron output for x1, x2 = \", i,\n",
    "          \" is \", perceptron(i, w, threshold))"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "In this simple case we can rewrite our equation to $x_2 = ...... $ which describes a line in 2D:"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "def perceptron_DB(X, w, threshold):\n",
        "    # Plotting the decision boundary\n",
        "    for i in X:\n",
        "        plt.plot(i, \"o\", color=\"b\")\n",
        "    plt.xlim(-1, 2)\n",
        "    plt.ylim(-1, 2)\n",
        "    # The decision boundary is a line given by\n",
        "    # w_1*x_1+w_2*x_2-threshold=0\n",
        "    x1 = np.arange(-3, 4)\n",
        "    x2 = (threshold - x1*w[0])/w[1]\n",
        "    plt.plot(x1, x2, \"--\", color=\"black\")\n",
        "    plt.xlabel(\"x$_1$\", fontsize=16)\n",
        "    plt.ylabel(\"x$_2$\", fontsize=16)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "perceptron_DB(X, w, threshold)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    "**Exercise 1: Compute a Boolean \"OR\" using a perceptron**\n",
        "\n",
        "Hint: copy the code from the \"AND\" example and edit the weights and/or threshold"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "#### Boolean OR\n",
        "\n",
        "| x$_1$ | x$_2$ | output |\n",
        "| --- | --- | --- |\n",
        "| 0 | 0 | 0 |\n",
        "| 1 | 0 | 1 |\n",
        "| 0 | 1 | 1 |\n",
        "| 1 | 1 | 1 |"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Calculating Boolean OR using a perceptron\n",
        "# Edit the code below"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Solution\n",
        "# Calculating Boolean OR using a perceptron\n",
        "threshold=0.6\n",
        "w=[1,1]\n",
        "X=[[0,0],[1,0],[0,1],[1,1]]\n",
        "for i in X:\n",
    
        "    print(\"Perceptron output for x1, x2 = \" , i , \" is \" , perceptron(i,w,threshold))\n",
    
    chadhat's avatar
    chadhat committed
        "# Plotting the decision boundary\n",
    
        "perceptron_DB(X,w)"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    "**Exercise 2: Create a NAND gate using a perceptron**\n",
        "\n",
        "#### Boolean NAND\n",
        "\n",
        "| x$_1$ | x$_2$ | output |\n",
        "| --- | --- | --- |\n",
        "| 0 | 0 | 1 |\n",
        "| 1 | 0 | 1 |\n",
        "| 0 | 1 | 1 |\n",
        "| 1 | 1 | 0 |"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "# Calculating Boolean NAND using a perceptron"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Solution\n",
        "# Calculating Boolean OR using a perceptron\n",
        "import matplotlib.pyplot as plt\n",
        "threshold=-1.5\n",
        "w=[-1,-1]\n",
        "X=[[0,0],[1,0],[0,1],[1,1]]\n",
        "for i in X:\n",
        "    print(\"Perceptron output for x1, x2 = \" , i , \" is \" , perceptron(i,w,threshold))\n",
        "# Plotting the decision boundary\n",
        "perceptron_DB(X,w)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    "In fact, a single perceptron can compute the \"AND\", \"OR\" and \"NOT\" Boolean functions.\n",
        "However, it cannot compute some other boolean functions such as \"XOR\"\n",
        "\n",
    "**WHAT CAN WE DO?**\n",
    "\n",
    "Hint: Think about the significance of the NAND gate we created above. (NAND is a universal gate: any Boolean function can be built out of NAND gates alone.)\n",
        "\n",
    "We said a single perceptron can't compute these functions. We didn't say that about **multiple perceptrons**; a working sketch follows the figure below."
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "**XOR function using multiple perceptrons**\n",
        "\n",
        "<center>\n",
        "<figure>\n",
        "<img src=\"./images/neuralnets/perceptron_XOR.svg\" width=\"400\"/>\n",
        "<figcaption>Multiple perceptrons put together to output a XOR function.</figcaption>\n",
        "</figure>\n",
        "</center>"
       ]
      },
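  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is a minimal sketch of this idea, reusing the `perceptron` function and the gate weights/thresholds found above (the helper name `xor_gate` is just for illustration): $XOR(x_1, x_2) = AND(OR(x_1, x_2), NAND(x_1, x_2))$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: XOR assembled from three perceptrons (OR, NAND and AND)\n",
    "def xor_gate(x):\n",
    "    or_out = perceptron(x, [1, 1], 0.6)\n",
    "    nand_out = perceptron(x, [-1, -1], -1.5)\n",
    "    # AND of the two intermediate outputs\n",
    "    return perceptron([or_out, nand_out], [1, 1], 1.5)\n",
    "\n",
    "for i in [[0, 0], [1, 0], [0, 1], [1, 1]]:\n",
    "    print(\"XOR output for x1, x2 = \", i, \" is \", xor_gate(i))"
   ]
  },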
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "### Multi-layer perceptrons\n"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Google Playground\n",
        "\n",
        "https://playground.tensorflow.org/\n",
        "\n",
        "<img src=\"./images/neuralnets/google_playground.png\"/>"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Learning\n",
        "\n",
        "Now we know that we can compute complex functions if we stack together a number of perceptrons.\n",
        "\n",
        "However, we definitely **DO NOT** want to set the weights and thresholds by hand as we did in the examples above.\n",
    "\n",
    "We want some algorithm to do this for us!\n",
    "\n",
    "In order to achieve this we first need to choose a loss function for the problem at hand.\n",
    "\n",
    "### Loss function\n",
        "In order to learn using an algorithm for learning we need to define a quantity which allows us to measure how far are the predictions of our network/setup are from the reality. This is done by choosing a so-called \"Loss function\" (as in the case for other machine learning algorithms). In other words this function measures how close are the predictions of our network to the supplied labels. Once we have this function we need an algorithm to update the weights of the network such that this loss decreases. As one can already imagine the choice of an appropriate loss function is very important to the success of the model. Fortunately, for classification and regression (which cover a large variety of probelms) these loss functions are well known. \n",
        "\n",
        "Generally **crossentropy** and **mean squared error** loss functions are used for classification and regression problems, respectively.\n",
    "\n",
    "### Gradient-based learning\n",
        "As mentioned above, once we have decided upon a loss function we want to solve an **optimization problem** which minimizes this loss by updating the weights of the network. This is how the learning actually happens.\n",
        "\n",
        "The most popular optimization methods used in Neural Network training are some sort of **Gradient-descent** type methods, for e.g. gradient-descent, RMSprop, adam etc. \n",
        "**Gradient-descent** uses partial derivatives of the loss function with respect to the network weights and a learning rate to updates the weights such that the loss function decreases and hopefully after some iterations reaches its (Global) minimum.\n",
        "\n",
        "First, the loss function and its derivative are computed at the output node and this signal is propogated backwards, using chain rule, in the network to compute the partial derivatives. Hence, this method is called **Backpropagation**.\n",
        "\n",
        "Depending of\n",
    
        "\n",
        "\n",
        "\n",
    
    chadhat's avatar
    chadhat committed
        "### Activation Functions\n",
        "\n",
    "In order to train the network we need to replace the perceptron's **step** activation function, as, among other drawbacks, its gradient is zero almost everywhere and therefore it does not allow training with the backpropagation algorithm.\n",
        "\n",
        "Non-Linear functions such as:\n",
        "\n",
        "* ReLU (Rectified linear unit)\n",
        "\n",
        "\\begin{equation*}\n",
        "f(z) = \\mathrm{max}(0,z)\n",
        "\\end{equation*}\n",
        "\n",
        "* Sigmoid\n",
        "\n",
        "\\begin{equation*}\n",
        "f(z) = \\frac{1}{1+e^{-z}}\n",
        "\\end{equation*}\n",
        "\n",
        "* tanh\n",
        "\n",
        "\\begin{equation*}\n",
        "f(z) = \\frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}\n",
        "\\end{equation*}\n",
        "\n",
        "\n",
        "are some of the most popular choices used as activation functions.\n",
        "\n",
        "Linear activations are **NOT** used because it can be mathematically shown that if linear activations are used then output is just a linear function of the input. So adding any number of hidden layers does not help to learn interesting functions.\n",
        "\n",
        "Non-linear activation functions allow the network to learn more complex representations."
       ]
      },
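  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the plain gradient-descent update described above, applied to a toy one-dimensional loss $L(w) = (w - 3)^2$ (illustrative only; in a real network the gradients are obtained via backpropagation):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: plain gradient descent on a toy loss L(w) = (w - 3)**2\n",
    "def loss(w):\n",
    "    return (w - 3) ** 2\n",
    "\n",
    "def grad(w):\n",
    "    return 2 * (w - 3)  # dL/dw\n",
    "\n",
    "w = 0.0  # initial \"weight\"\n",
    "learning_rate = 0.1\n",
    "for step in range(25):\n",
    "    w = w - learning_rate * grad(w)  # the gradient-descent update\n",
    "\n",
    "# w converges towards 3, the minimum of the loss\n",
    "print(\"final w:\", w, \"final loss:\", loss(w))"
   ]
  },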
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "\n",
    "plt.figure(figsize=(10, 4))\n",
        "\n",
        "pts=np.arange(-20,20, 0.1)\n",
        "\n",
    
        "plt.subplot(1, 3, 1)\n",
    
    chadhat's avatar
    chadhat committed
        "# Sigmoid\n",
    "plt.plot(pts, 1/(1+np.exp(-pts))) ;\n",
    "\n",
    "plt.subplot(1, 3, 2)\n",
        "# tanh\n",
    "plt.plot(pts, np.tanh(pts*np.pi)) ;\n",
        "\n",
        "# Rectified linear unit (ReLu)\n",
    "plt.subplot(1, 3, 3)\n",
        "pts_relu=[max(0,i) for i in pts];\n",
    
        "plt.plot(pts, pts_relu) ;"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "# Introduction to Keras"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    "### What is Keras?\n",
        "\n",
        "* It is a high level API to create and work with neural networks\n",
    "* Supports multiple backends such as TensorFlow from Google, Theano (although Theano is no longer developed) and CNTK (Microsoft Cognitive Toolkit)\n",
    "* Very good for creating neural nets quickly, hiding away a lot of tedious work\n",
    "* Has been incorporated into official TensorFlow as `tf.keras` (which only works with the TensorFlow backend); as of TensorFlow 2.0 it is the main API to use TensorFlow\n"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Say hello to keras\n",
        "\n",
        "from keras.models import Sequential\n",
        "from keras.layers import Dense, Activation\n",
        "\n",
        "# Creating a model\n",
        "model = Sequential()\n",
        "\n",
        "# Adding layers to this model\n",
        "# 1st Hidden layer\n",
    "# A Dense/fully-connected layer which takes as input a\n",
    "# feature array of shape (samples, num_features)\n",
    "# Here input_shape = (8,) means that the layer expects an input with num_features = 8\n",
    "# and the sample size could be anything\n",
    "# Then we specify an activation function\n",
    "model.add(Dense(units=4, input_shape=(8,)))\n",
        "model.add(Activation(\"relu\"))\n",
        "\n",
    "# 2nd Hidden layer\n",
    "# This is also a fully-connected layer and we do not need to specify the\n",
    "# shape of the input anymore (we need to do that only for the first layer)\n",
    "# NOTE: This time we didn't add the activation separately. Instead we passed it\n",
    "# directly to Dense(). This and the way used for the first layer are equivalent!\n",
    "model.add(Dense(units=4, activation=\"relu\"))\n",
    "\n",
        "# The output layer\n",
        "model.add(Dense(units=1))\n",
        "model.add(Activation(\"sigmoid\"))\n",
        "\n",
        "model.summary()"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### XOR using neural networks"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.model_selection import train_test_split\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "# Creating a network to solve the XOR problem\n",
        "\n",
    "# Loading and plotting the data\n",
    "xor = pd.read_csv(\"xor.csv\")\n",
        "\n",
    "# Using x and y coordinates as features\n",
    "features = xor.iloc[:, :-1]\n",
    "# Convert boolean to integer values (True->1 and False->0)\n",
    "labels = xor.iloc[:, -1].astype(int)\n",
        "\n",
    "colors = [[\"steelblue\", \"chocolate\"][i] for i in xor[\"label\"]]\n",
    "plt.figure(figsize=(5, 5))\n",
    "plt.xlim([-2, 2])\n",
    "plt.ylim([-2, 2])\n",
    "plt.title(\"Blue points are False\")\n",
    "\n",
        "plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\") ;"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "# Building a Keras model\n",
    "\n",
    "def a_simple_NN():\n",
    "    \n",
    "    model = Sequential()\n",
    "\n",
    "    model.add(Dense(4, input_shape=(2,), activation=\"relu\"))\n",
    "\n",
    "    model.add(Dense(4, activation=\"relu\"))\n",
    "\n",
    "    model.add(Dense(1, activation=\"sigmoid\"))\n",
    "\n",
    "    model.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
    "    \n",
    "    return model"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Instantiating the model\n",
    "model = a_simple_NN()\n",
    "\n",
        "# Splitting the dataset into training (70%) and validation sets (30%)\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    features, labels, test_size=0.3)\n",
    "\n",
        "# Setting the number of passes through the entire training set\n",
        "num_epochs = 300\n",
    "\n",
    "# We can pass validation data while training\n",
    "model_run = model.fit(X_train, y_train, epochs=num_epochs,\n",
    "                      validation_data=(X_test, y_test))"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Looking at the loss and accuracy on the training and validation sets during the training\n",
        "# This can be done by using Keras callback \"history\" which is applied by default\n",
    "history_model = model_run.history\n",
    "\n",
        "print(\"The history has the following data: \", history_model.keys())\n",
        "\n",
        "# Plotting the training and validation accuracy during the training\n",
        "plt.plot(np.arange(1, num_epochs+1), history_model[\"acc\"], \"blue\") ;\n",
    "plt.plot(np.arange(1, num_epochs+1), history_model[\"val_acc\"], \"red\") ;"
   ]
  },
  {
   "cell_type": "markdown",
       "metadata": {},
       "source": [
        "**Here we dont't really see a big difference between the training and validation data because the function we are trying to fit is quiet simple and there is not too much noise. We will come back to these curves in a later example**"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "In the example above we splitted our dataset into a 70-30 train-validation set. We know from previous chapters that to more robustly calculate accuracy we can use **K-fold crossvalidation**.\n",
    
        "This is even more important when we have small datasets and cannot afford to reserve a validation set!\n",
        "\n",
    
    chadhat's avatar
    chadhat committed
        "One way to do the cross validation here would be to write our own function to do this. However, we also know that **SciKit learn** provides several handy functions to evaluate and tune the models. So the question is:\n",
        "Can we somehow use these **Scikit learn** functions or ones we wrote ourselves for **Scikit learn** models to evaluate and tune our Keras models?\n",
        "\n",
        "The Answer is **YES !**\n",
        "\n",
        "We show how to do this in the following section."
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    "## Using scikit-learn functions on Keras models\n",
    "\n",
        "Keras offers 2 wrappers which allow its Sequential models to be used with SciKit learn. \n",
    
    chadhat's avatar
    chadhat committed
        "There are: **KerasClassifier** and **KerasRegressor**.\n",
        "\n",
        "For more information:\n",
        "https://keras.io/scikit-learn-api/\n",
        "\n",
        "**Now lets see how this works!**"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "# We wrap the Keras model we created above with KerasClassifier\n",
        "from keras.wrappers.scikit_learn import KerasClassifier\n",
        "from sklearn.model_selection import cross_val_score\n",
        "# Wrapping Keras model\n",
        "# NOTE: We pass verbose=0 to suppress the model output\n",
        "num_epochs = 400\n",
        "model_scikit = KerasClassifier(\n",
        "    build_fn=a_simple_NN, **{\"epochs\": num_epochs, \"verbose\": 0})"
    
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Let's reuse the function to visualize the decision boundary which we saw in chapter 2 with minimal change\n",
        "\n",
    "def list_flatten(list_of_list):\n",
    "    flattened_list = [i for j in list_of_list for i in j]\n",
    "    return flattened_list\n",
    "\n",
        "def plot_points(plt=plt, marker='o'):\n",
        "    colors = [[\"steelblue\", \"chocolate\"][i] for i in labels]\n",
        "    plt.scatter(features.iloc[:, 0], features.iloc[:, 1], color=colors, marker=marker);\n",
        "\n",
        "def train_and_plot_decision_surface(\n",
        "    name, classifier, features_2d, labels, preproc=None, plt=plt, marker='o', N=400\n",
        "):\n",
        "    features_2d = np.array(features_2d)\n",
        "    xmin, ymin = features_2d.min(axis=0)\n",
        "    xmax, ymax = features_2d.max(axis=0)\n",
        "    x = np.linspace(xmin, xmax, N)\n",
        "    y = np.linspace(ymin, ymax, N)\n",
        "    points = np.array(np.meshgrid(x, y)).T.reshape(-1, 2)\n",
        "    if preproc is not None:\n",
        "        points_for_classifier = preproc.fit_transform(points)\n",
        "        features_2d = preproc.fit_transform(features_2d)\n",
        "    else:\n",
        "        points_for_classifier = points\n",
        "\n",
        "    classifier.fit(features_2d, labels, verbose=0)\n",
        "    predicted = classifier.predict(features_2d)\n",
        "    \n",
        "    if name == \"Neural Net\":\n",
        "        predicted = list_flatten(predicted)\n",
        "    \n",
        "    \n",
        "    if preproc is not None:\n",
        "        name += \" (w/ preprocessing)\"\n",
        "    print(name + \":\\t\", sum(predicted == labels), \"/\", len(labels), \"correct\")\n",
        "    \n",
        "    if name == \"Neural Net\":\n",
        "        classes = np.array(list_flatten(classifier.predict(points_for_classifier)), dtype=bool)\n",
        "    else:\n",
        "        classes = np.array(classifier.predict(points_for_classifier), dtype=bool)\n",
        "    plt.plot(\n",
        "        points[~classes][:, 0],\n",
        "        points[~classes][:, 1],\n",
        "        \"o\",\n",
        "        color=\"steelblue\",\n",
        "        markersize=1,\n",
        "        alpha=0.01,\n",
        "    )\n",
        "    plt.plot(\n",
        "        points[classes][:, 0],\n",
        "        points[classes][:, 1],\n",
        "        \"o\",\n",
        "        color=\"chocolate\",\n",
        "        markersize=1,\n",
        "        alpha=0.04,\n",
        "    )"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "_, ax = plt.subplots(figsize=(6, 6))\n",
    "\n",
    "train_and_plot_decision_surface(\"Neural Net\", model_scikit, features, labels, plt=ax)\n",
    "plot_points(plt=ax)"
   ]
  },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Applying K-fold cross-validation\n",
        "# Here we pass the whole dataset, i.e. features and labels, instead of splitting it.\n",
        "num_folds = 5\n",
        "cross_validation = cross_val_score(\n",
        "    model_scikit, features, labels, cv=num_folds, verbose=0)\n",
        "\n",
        "print(\"The acuracy on the \", num_folds, \" validation folds:\", cross_validation)\n",
        "print(\"The Average acuracy on the \", num_folds, \" validation folds:\", np.mean(cross_validation))"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### NOTE: The above code took quiet long even though we used only 5  CV folds and the neural network and data size are very small!"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Hyperparameter optimization"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "We know from chapter 6 that there are 2 types of parameters which need to be tuned for a machine learning model.\n",
        "* Normal model parameters which can be learned for e.g. by gradient-descent\n",
        "* Hyperparameters\n",
        "\n",
        "In the model which we created above we made some arbitrary choices like which optimizer we use, what is its learning rate, number of hidden units and so on ...\n",
        "\n",
        "Now that we have the keras model wrapped as a scikit model we can use the grid search functions we have seen in chapter 6."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "from sklearn.model_selection import GridSearchCV"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "HP_grid = dict(epochs=[300, 500, 1000])\n",
        "search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid)\n",
        "search.fit(features, labels)\n",
        "print(search.best_score_, search.best_params_)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "HP_grid = {'epochs' : [10, 15, 30], \n",
        "           'batch_size' : [10, 20, 30] }\n",
        "search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid)\n",
        "search.fit(features, labels)\n",
        "print(search.best_score_, search.best_params_)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# A more general model for further Hyperparameter optimization\n",
        "from keras import optimizers\n",
        "\n",
        "def a_simple_NN(activation='relu', num_hidden_neurons=[4, 4], learning_rate=0.01):\n",
        "\n",
        "    model = Sequential()\n",
        "\n",
        "    model.add(Dense(num_hidden_neurons[0],\n",
        "                    input_shape=(2,), activation=activation))\n",
        "\n",
        "    model.add(Dense(num_hidden_neurons[1], activation=activation))\n",
        "\n",
        "    model.add(Dense(1, activation=\"sigmoid\"))\n",
        "\n",
        "    model.compile(loss=\"binary_crossentropy\", optimizer=optimizers.rmsprop(\n",
        "        lr=learning_rate), metrics=[\"accuracy\"])\n",
        "\n",
        "    return model"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Exercise: \n",
        "* Look at the model above and choose a couple of hyperparameters to optimize. \n",
        "* What function from SciKit learn other than GridSearchCV can we use for hyperparameter optimization? Use it."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Code here"
       ]
      },
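  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible solution (a sketch, using the generalized `a_simple_NN` defined above; the hyperparameter values below are arbitrary choices): scikit-learn's **RandomizedSearchCV** samples a fixed number of hyperparameter combinations instead of trying all of them like GridSearchCV."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution (one possibility, sketched): RandomizedSearchCV\n",
    "from sklearn.model_selection import RandomizedSearchCV\n",
    "\n",
    "# Re-wrap the generalized model so its new arguments can be tuned\n",
    "model_scikit = KerasClassifier(\n",
    "    build_fn=a_simple_NN, **{\"epochs\": 300, \"verbose\": 0})\n",
    "\n",
    "HP_dist = dict(activation=[\"relu\", \"tanh\", \"sigmoid\"],\n",
    "               learning_rate=[0.001, 0.01, 0.1])\n",
    "search = RandomizedSearchCV(\n",
    "    estimator=model_scikit, param_distributions=HP_dist, n_iter=4)\n",
    "search.fit(features, labels)\n",
    "print(search.best_score_, search.best_params_)"
   ]
  },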
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Exercise: Create a neural network to classify the 2d points example from chapter 2 learned \n",
        "(Optional: As you create the model read a bit on the different keras commands we have used)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
    
       "source": [
    "circle = pd.read_csv(\"2d_points.csv\")\n",
    "# Using x and y coordinates as features\n",
    "features = circle.iloc[:, :-1]\n",
    "# Convert boolean to integer values (True->1 and False->0)\n",
    "labels = circle.iloc[:, -1].astype(int)\n",
    "\n",
    "colors = [[\"steelblue\", \"chocolate\"][i] for i in circle[\"label\"]]\n",
    "\n",
    "plt.figure(figsize=(5, 5))\n",
    "plt.xlim([-2, 2])\n",
    "plt.ylim([-2, 2])\n",
    "\n",
    "plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\");\n"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "# Insert Code here"
       ]
      },
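  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible solution (a sketch, reusing `a_simple_NN` and the training pattern from the XOR example; the architecture and number of epochs are arbitrary choices):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution (one possibility, sketched): reuse the simple network on the 2d points data\n",
    "model = a_simple_NN()\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    features, labels, test_size=0.3)\n",
    "\n",
    "model_run = model.fit(X_train, y_train, epochs=300,\n",
    "                      validation_data=(X_test, y_test), verbose=0)\n",
    "print(\"validation accuracy:\", model_run.history[\"val_acc\"][-1])"
   ]
  },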
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### The examples above are not the ideal use problems one should use neural networks for. They are too simple and can be easily solved by classical machine learning algorithms. Below we show examples which are the more common applications of Neural Networks."
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Handwritten Digits Classification\n",
    "### MNIST Dataset\n",
    "\n",
    "MNIST is a very common dataset in machine learning. It is widely used to train and validate models.\n",
        "\n",
        "\n",
        ">The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a >test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size->normalized and centered in a fixed-size image.\n",
        ">It is a good database for people who want to try learning techniques and pattern recognition methods on real-world >data while spending minimal efforts on preprocessing and formatting.\n",
        ">source: http://yann.lecun.com/exdb/mnist/\n",
    "\n",
    "The problem we want to solve using this dataset is multi-class classification.\n",
    "This dataset consists of images of handwritten digits between 0-9 and their corresponding labels. We want to train a neural network which is able to predict the correct digit in an image."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Loading the dataset in keras\n",
        "# Later you can explore and play with other datasets with come with Keras\n",
        "from keras.datasets import mnist\n",
    "\n",
    "# Loading the train and test data\n",
    "(X_train, y_train), (X_test, y_test) = mnist.load_data()"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Looking at the dataset\n",
        "print(X_train.shape)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "# We can see that the training set consists of 60,000 images of size 28x28 pixels\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "i = np.random.randint(0, X_train.shape[0])\n",
    "plt.imshow(X_train[i], cmap=\"gray_r\") ;\n",
    "print(\"This digit is: \", y_train[i])"
   ]
  },
  {
   "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Look at the data values for a couple of images\n",
        "print(X_train[0].min(), X_train[1].max())"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "The data consists of values between 0-255 representing the **grayscale level**"
       ]
      },
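  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A typical next preprocessing step (sketched below; the variable names `X_train_norm` and `X_test_norm` are just for illustration) is to scale these values to the range 0-1 before training:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: scale the grayscale values from the range 0-255 to 0-1\n",
    "X_train_norm = X_train / 255.0\n",
    "X_test_norm = X_test / 255.0\n",
    "print(X_train_norm.min(), X_train_norm.max())"
   ]
  },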
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# The labels are the digit on the image\n",
        "print(y_train.shape)"