08_a-neural_networks.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n",
    "from numpy.random import seed\n",
    "seed(42)\n",
    "import tensorflow as tf\n",
    "tf.random.set_seed(36)\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib as mpl\n",
    "import seaborn as sns\n",
    "sns.set(style=\"darkgrid\")\n",
    "mpl.rcParams['lines.linewidth'] = 3\n",
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "%config IPCompleter.greedy=True\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore', category=FutureWarning)\n",
    "from IPython.core.display import HTML; HTML(open(\"custom.html\", \"r\").read())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chapter 8: Introduction to Neural Networks\n",
    "\n",
    "\n",
    "\n",
    "<img src=\"./images/3042en.jpg\" title=\"made at imgflip.com\" width=35%/>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## History of Neural networks\n",
    "\n",
    "\n",
    "1943 - Threshold Logic\n",
    "\n",
    "1940s - Hebbian Learning\n",
    "\n",
    "1958 - Perceptron\n",
    "\n",
    "1980s - Neocognitron\n",
    "\n",
    "1982 - Hopfield Network\n",
    "\n",
    "1989 - Convolutional neural network (CNN) kernels trained via backpropagation\n",
    "\n",
    "1997 - Long-short term memory (LSTM) model\n",
    "\n",
    "1998 - LeNet-5\n",
    "\n",
    "2014 - Gated Recurrent Units (GRU), Generative Adversarial Networks (GAN)\n",
    "\n",
    "2015 - ResNet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Why the boom now?\n",
    "* Data\n",
    "* Data\n",
    "* Data\n",
    "* Availability of GPUs\n",
    "* Algorithmic developments which allow for efficient training and making networks networks\n",
    "* Development of high-level libraries/APIs have made the field much more accessible than it was a decade ago"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Feed-Forward neural network\n",
    "<center>\n",
    "<figure>\n",
    "<img src=\"./images/neuralnets/neural_net_ex.svg\" width=\"700\"/>\n",
    "<figcaption>A 3 layer densely connected Neural Network (By convention the input layer is not counted).</figcaption>\n",
    "</figure>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building blocks\n",
    "### Perceptron\n",
    "\n",
    "The smallest unit of a neural network is a **perceptron** like node.\n",
    "\n",
    "**What is a Perceptron?**\n",
    "\n",
    "It is a simple function which can have multiple inputs and has a single output.\n",
    "\n",
    "<center>\n",
    "<figure>\n",
    "<img src=\"./images/neuralnets/perceptron_ex.svg\" width=\"400\"/>\n",
    "<figcaption>A simple perceptron with 3 inputs and 1 output.</figcaption>\n",
    "</figure>\n",
    "</center>\n",
    "\n",
    "\n",
    "It works as follows: \n",
    "\n",
    "Step 1: A **weighted sum** of the inputs is calculated\n",
    "\n",
    "\\begin{equation*}\n",
    "weighted\\_sum = w_{1} x_{1} + w_{2} x_{2} + w_{3} x_{3} + ...\n",
    "\\end{equation*}\n",
    "\n",
    "Step 2: A **step** activation function is applied\n",
    "\n",
    "$$\n",
    "f = \\left\\{\n",
    "        \\begin{array}{ll}\n",
    "            0 & \\quad weighted\\_sum < threshold \\\\\n",
    "            1 & \\quad weighted\\_sum \\geq threshold\n",
    "        \\end{array}\n",
    "    \\right.\n",
    "$$\n",
    "\n",
    "You can see that this is also a linear classifier as the ones we introduced in script 02."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plotting the step function\n",
    "x = np.arange(-2,2.1,0.01)\n",
    "y = np.zeros(len(x))\n",
    "threshold = 0.\n",
    "y[x>threshold] = 1.\n",
    "step_plot = sns.lineplot(x, y).set_title('Step function') ;\n",
    "plt.xlabel('weighted_sum') ;\n",
    "plt.ylabel('f(weighted_sum)') ;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def perceptron(X, w, threshold=1):\n",
    "    # This function computes sum(w_i*x_i) and\n",
    "    # applies a perceptron activation\n",
    "    linear_sum = np.dot(np.asarray(X).T, w)\n",
    "    output = np.zeros(len(linear_sum), dtype=np.int8)\n",
    "    output[linear_sum >= threshold] = 1\n",
    "    return output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Boolean AND\n",
    "\n",
    "| x$_1$ | x$_2$ | output |\n",
    "| --- | --- | --- |\n",
    "| 0 | 0 | 0 |\n",
    "| 1 | 0 | 0 |\n",
    "| 0 | 1 | 0 |\n",
    "| 1 | 1 | 1 |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculating Boolean AND using a perceptron\n",
    "threshold = 1.5\n",
    "# (w1, w2)\n",
    "w = [1, 1]\n",
    "# (x1, x2) pairs\n",
    "x1 = [0, 1, 0, 1]\n",
    "x2 = [0, 0, 1, 1]\n",
    "# Calling the perceptron function\n",
    "output = perceptron([x1, x2], w, threshold)\n",
    "for i in range(len(output)):\n",
    "    print(\"Perceptron output for x1, x2 = \", x1[i], \",\", x2[i],\n",
    "          \" is \", output[i])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this simple case we can rewrite our equation to $x_2 = ...... $ which describes a line in 2D:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def perceptron_DB(x1, x2, w, threshold):\n",
    "    # Plotting the decision boundary of the perceptron\n",
    "    plt.scatter(x1, x2, color=\"black\")\n",
    "    plt.xlim(-1,2)\n",
    "    plt.ylim(-1,2)\n",
    "    # The decision boundary is a line given by\n",
    "    # w_1*x_1+w_2*x_2-threshold=0\n",
    "    x1 = np.arange(-3, 4)\n",
    "    x2 = (threshold - x1*w[0])/w[1]\n",
    "    sns.lineplot(x1, x2, **{\"color\": \"black\"})\n",
    "    plt.xlabel(\"x$_1$\", fontsize=16)\n",
    "    plt.ylabel(\"x$_2$\", fontsize=16)\n",
    "    # Coloring the regions\n",
    "    pts_tmp = np.arange(-2, 2.1, 0.02)\n",
    "    points = np.array(np.meshgrid(pts_tmp, pts_tmp)).T.reshape(-1, 2)\n",
    "    outputs = perceptron(points.T, w, threshold)\n",
    "    plt.plot(points[:, 0][outputs == 0], points[:, 1][outputs == 0],\n",
    "             \"o\",\n",
    "             color=\"steelblue\",\n",
    "             markersize=1,\n",
    "             alpha=0.04,\n",
    "             )\n",
    "    plt.plot(points[:, 0][outputs == 1], points[:, 1][outputs == 1],\n",
    "             \"o\",\n",
    "             color=\"chocolate\",\n",
    "             markersize=1,\n",
    "             alpha=0.04,\n",
    "             )\n",
    "    plt.title(\"Blue color = 0 and Chocolate = 1\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plotting the perceptron decision boundary\n",
    "perceptron_DB(x1, x2, w, threshold)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Exercise section\n",
    "* Compute a Boolean \"OR\" using a perceptron\n",
    "\n",
    "Hint: copy the code from the \"AND\" example and edit the weights and/or threshold"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Boolean OR\n",
    "\n",
    "| x$_1$ | x$_2$ | output |\n",
    "| --- | --- | --- |\n",
    "| 0 | 0 | 0 |\n",
    "| 1 | 0 | 1 |\n",
    "| 0 | 1 | 1 |\n",
    "| 1 | 1 | 1 |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculating Boolean OR using a perceptron\n",
    "# Enter code here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": [
     "solution"
    ]
   },
   "outputs": [],
   "source": [
    "# Solution\n",
    "# Calculating Boolean OR using a perceptron\n",
    "threshold=0.6\n",
    "# (w1, w2)\n",
    "w=[1,1]\n",
    "# (x1, x2) pairs\n",
    "x1 = [0, 1, 0, 1]\n",
    "x2 = [0, 0, 1, 1]\n",
    "output = perceptron([x1, x2], w, threshold)\n",
    "for i in range(len(output)):\n",
    "    print(\"Perceptron output for x1, x2 = \", x1[i], \",\", x2[i],\n",
    "          \" is \", output[i])\n",
    "perceptron_DB(x1, x2, w, threshold)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Exercise section\n",
    "* Create a NAND gate using a perceptron\n",
    "\n",
    "Boolean NAND\n",
    "\n",
    "| x$_1$ | x$_2$ | output |\n",
    "| --- | --- | --- |\n",
    "| 0 | 0 | 1 |\n",
    "| 1 | 0 | 1 |\n",
    "| 0 | 1 | 1 |\n",
    "| 1 | 1 | 0 |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculating Boolean NAND using a perceptron\n",
    "# Enter code here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "solution"
    ]
   },
   "outputs": [],
   "source": [
    "# Solution\n",
    "# Calculating Boolean NAND using a perceptron\n",
    "import matplotlib.pyplot as plt\n",
    "threshold=-1.5\n",
    "# (w1, w2)\n",
    "w=[-1,-1]\n",
    "# (x1, x2) pairs\n",
    "x1 = [0, 1, 0, 1]\n",
    "x2 = [0, 0, 1, 1]\n",
    "output = perceptron([x1, x2], w, threshold)\n",
    "for i in range(len(output)):\n",
    "    print(\"Perceptron output for x1, x2 = \", x1[i], \",\", x2[i],\n",
    "          \" is \", output[i])\n",
    "perceptron_DB(x1, x2, w, threshold)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In fact, a single perceptron can compute \"AND\", \"OR\" and \"NOT\" boolean functions.\n",
    "\n",
    "However, it cannot compute some other boolean functions such as \"XOR\".\n",
    "\n",
    "**WHAT CAN WE DO?**\n",
    "\n",
    "\n",
    "Hint: Think about what is the significance of the NAND gate we have created above?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multi-layer perceptrons\n",
    "\n",
    "\n",
    "Answer: We said a single perceptron can't compute a \"XOR\" function. We didn't say that about **multiple Perceptrons** put together.\n",
    "\n",
    "The normal densely connected neural network is sometimes also called \"Multi-layer\" perceptron.\n",
    "\n",
    "**XOR function using multiple perceptrons**\n",
    "\n",
    "<center>\n",
    "<figure>\n",
    "<img src=\"./images/neuralnets/perceptron_XOR.svg\" width=\"400\"/>\n",
    "<figcaption>Multiple perceptrons connected together to output a XOR function.</figcaption>\n",
    "</figure>\n",
    "</center>"
   ]
  },
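  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The idea of combining perceptrons can be sketched in code. The cell below is a minimal illustration and not necessarily the exact wiring shown in the figure: it assumes one possible construction, XOR = AND(OR, NAND), and reuses the weights and thresholds from the AND, OR and NAND examples. The helper `step_perceptron` is a hypothetical name introduced here so that the cell runs on its own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def step_perceptron(inputs, w, threshold):\n",
    "    # Weighted sum of the inputs followed by a step activation\n",
    "    # (hypothetical helper, mirrors the perceptron function used earlier)\n",
    "    weighted_sum = np.dot(np.asarray(inputs).T, w)\n",
    "    return (weighted_sum >= threshold).astype(np.int8)\n",
    "\n",
    "# (x1, x2) pairs\n",
    "x1 = np.array([0, 1, 0, 1])\n",
    "x2 = np.array([0, 0, 1, 1])\n",
    "\n",
    "# Hidden layer: one OR perceptron and one NAND perceptron\n",
    "h_or = step_perceptron([x1, x2], [1, 1], 0.6)\n",
    "h_nand = step_perceptron([x1, x2], [-1, -1], -1.5)\n",
    "\n",
    "# Output layer: an AND perceptron applied to the two hidden outputs\n",
    "xor_output = step_perceptron([h_or, h_nand], [1, 1], 1.5)\n",
    "\n",
    "for i in range(len(xor_output)):\n",
    "    print(\"XOR output for x1, x2 = \", x1[i], \",\", x2[i],\n",
    "          \" is \", xor_output[i])"
   ]
  },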
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Learning\n",
    "\n",
    "We know that we can compute complicated functions by combining a number of perceptrons.\n",
    "\n",
    "In the perceptron examples we had set the model parameters (weights and threshold) by hand.\n",
    "\n",
    "This is something we definitely **DO NOT** want to do or even can do for big networks.\n",
    "\n",
    "We want some algorithm to set/learn the model parameters for us!\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "    <i class=\"fa fa-info-circle\"></i>&nbsp; <strong>Threshold -> bias</strong>  \n",
    "    \n",
    "Before we go further we need to introduce one change. The threshold which we saw in the step activation function above is moved to the left side of the equation and is called **bias**.\n",
    "\n",
    "$$\n",
    "f = \\left\\{\n",
    "        \\begin{array}{ll}\n",
    "            0 & \\quad weighted\\_sum + bias < 0 \\\\\n",
    "            1 & \\quad weighted\\_sum + bias \\geq 0\n",
    "        \\end{array}\n",
    "       \\quad \\quad  \\mathrm{where}, bias = -threshold\n",
    "    \\right.\n",
    "$$\n",
    "\n",
    "</div>"
   ]
  },
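  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check of the threshold to bias rewrite, the cell below is a small sketch that evaluates the step activation in both forms for the Boolean AND example (weights `[1, 1]`, threshold `1.5`). The variable names are chosen here for illustration only. Both forms should give the same outputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# The four Boolean input pairs and the AND weights used earlier\n",
    "X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])\n",
    "w = np.array([1, 1])\n",
    "threshold = 1.5\n",
    "bias = -threshold\n",
    "\n",
    "weighted_sum = X @ w\n",
    "\n",
    "# Step activation written with a threshold ...\n",
    "out_threshold = (weighted_sum >= threshold).astype(int)\n",
    "# ... and the same activation written with a bias\n",
    "out_bias = (weighted_sum + bias >= 0).astype(int)\n",
    "\n",
    "print(\"threshold form:\", out_threshold)\n",
    "print(\"bias form:     \", out_bias)"
   ]
  },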
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order to algorithmically set/learn the weights and bias we need to choose an appropriate loss function for the problem at hand and solve an optimization problem.\n",
    "We will explain below what this means.\n",
    "\n",
    "\n",
    "### Loss function\n",
    "\n",
    "To learn using an algorithm we need to define a quantity/function which allows us to measure how close or far are the predictions of our network/setup from reality or the supplied labels. This is done by choosing a so-called \"Loss function\" (as in the case for other machine learning algorithms).\n",
    "\n",
    "Once we have this function, we need an algorithm to update the weights of the network such that this loss function decreases. \n",
    "As one can already imagine the choice of an appropriate loss function is critical to the success of the model. \n",
    "\n",
    "Fortunately, for classification and regression (which cover a large variety of problems) these loss functions are well known. \n",
    "\n",
    "**Crossentropy** and **mean squared error** loss functions are often used for standard classification and regression problems, respectively.\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "    <i class=\"fa fa-info-circle\"></i>&nbsp; As we have seen before, <strong>mean squared error</strong> is defined as \n",
    "\n",
    "\n",
    "$$\n",
    "\\frac{1}{n} \\left((y_1 - \\hat{y}_1)^2 + (y_2 - \\hat{y}_2)^2 + ... + (y_n - \\hat{y}_n)^2 \\right)\n",
    "$$\n",
    "\n",
    "\n",
    "</div>\n",
    "\n",
    "### Gradient based learning\n",
    "\n",
    "As mentioned above, once we have chosen a loss function, we want to solve an **optimization problem** which minimizes this loss by updating the parameters (weights and biases) of the network. This is how the learning takes in a NN, and the \"knowledge\" is stored as the weights and biases.\n",
    "\n",
    "The most popular optimization methods used in Neural Network training are **Gradient-descent (GD)** type methods, such as gradient-descent itself, RMSprop and Adam. \n",
    "\n",
    "**Gradient-descent** uses partial derivatives of the loss function with respect to the network weights and a learning rate to updates the weights such that the loss function decreases and after some iterations reaches its (Global) minimum value.\n",
    "\n",
    "First, the loss function and its derivative are computed at the output node, and this signal is propagated backwards, using the chain rule, in the network to compute the partial derivatives. Hence, this method is called **Backpropagation**.\n",
    "\n",
    "One way to perform a single GD pass is to compute the partial derivatives using **all the samples** in our data, computing average derivatives and using them to update the weights. This is called **Batch gradient descent**. However, in deep learning we mostly work with massive datasets and using batch gradient descent can make the training very slow!\n",
    "\n",
    "The other extreme is to randomly shuffle the dataset and advance a pass of GD with the gradients computed using only **one sample** at a time. This is called **Stochastic gradient descent**.\n",
    "\n",
    "<center>\n",
    "<figure>\n",
    "<img src=\"./images/stochastic-vs-batch-gradient-descent.png\" width=\"600\"/>\n",
    "<figcaption>Source: <a href=\"https://wikidocs.net/3413\">https://wikidocs.net/3413</a></figcaption>\n",
    "</figure>\n",
    "</center>\n",
    "\n",
    "\n",
    "In practice, an approach in-between these two is used. The entire dataset is divided into **m batches** and these are used one by one to compute the derivatives and apply GD. This technique is called **Mini-batch gradient descent**. \n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
    "One pass through the entire training dataset is called 1 epoch of training.\n",
    "</p>\n",
    "</div>"
   ]
  },
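  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the update rule concrete, the cell below is a minimal sketch of mini-batch gradient descent for a toy linear model `y_hat = w*x + b` with a mean squared error loss. The data, learning rate, batch size and number of epochs are made up for illustration, and the gradients are derived by hand for this tiny model. In a real network they would come from backpropagation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "\n",
    "# Toy regression data: y = 3*x + 2 plus a little noise (illustrative only)\n",
    "x = rng.uniform(-1, 1, size=200)\n",
    "y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=200)\n",
    "\n",
    "# Model parameters and (assumed) training settings\n",
    "w, b = 0.0, 0.0\n",
    "learning_rate = 0.1\n",
    "batch_size = 20\n",
    "num_epochs = 100\n",
    "\n",
    "for epoch in range(num_epochs):\n",
    "    # One epoch = one pass through the shuffled training data\n",
    "    order = rng.permutation(len(x))\n",
    "    for start in range(0, len(x), batch_size):\n",
    "        idx = order[start:start + batch_size]\n",
    "        error = (w * x[idx] + b) - y[idx]\n",
    "        # Partial derivatives of the mean squared error on this mini-batch\n",
    "        grad_w = 2.0 * np.mean(error * x[idx])\n",
    "        grad_b = 2.0 * np.mean(error)\n",
    "        # Gradient descent step\n",
    "        w -= learning_rate * grad_w\n",
    "        b -= learning_rate * grad_b\n",
    "\n",
    "mse = np.mean((w * x + b - y) ** 2)\n",
    "print(\"learned w:\", w, \" learned b:\", b, \" final MSE:\", mse)"
   ]
  },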
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import numpy as np\n",
    "\n",
    "plt.figure(figsize=(10, 4)) ;\n",
    "\n",
    "pts=np.arange(-20,20, 0.1) ;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Activation Functions\n",
    "\n",
    "In order to train the network we need to move away from Perceptron's **step** activation function because it can not be used for training using the gradient-descent and back-propagation algorithms among other drawbacks.\n",
    "\n",
    "Non-Linear functions such as:\n",
    "\n",
    "* Sigmoid\n",
    "\n",
    "\\begin{equation*}\n",
    "f(z) = \\frac{1}{1+e^{-z}} \\quad \\quad \\mathrm{where}, z = weighted\\_sum + bias\n",
    "\\end{equation*}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.lineplot(pts, 1/(1+np.exp(-pts))) ;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* tanh\n",
    "\n",
    "\\begin{equation*}\n",
    "f(z) = \\frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}\\quad \\quad \\mathrm{where}, z = weighted\\_sum + bias\n",
    "\\end{equation*}\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.lineplot(pts, np.tanh(pts*np.pi)) ;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* **ReLU (Rectified linear unit)**\n",
    "\n",
    "\\begin{equation*}\n",
    "f(z) = \\mathrm{max}(0,z)   \\quad \\quad \\mathrm{where}, z = weighted\\_sum + bias\n",
    "\\end{equation*}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pts_relu=[max(0,i) for i in pts];\n",
    "plt.plot(pts, pts_relu) ;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "are some of the commonly used as activation functions. Such non-linear activation functions allow the network to learn complex representations of data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
    "ReLU is very popular and is widely used nowadays. There also exist other variations of ReLU, e.g. \"leaky ReLU\".\n",
    "</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
    "Why don't we just use a simple linear activation function?\n",
    "    \n",
    "Linear activations are **NOT** used because it can be mathematically shown that if they are used then the output is just a linear function of the input. So we cannot learn interesting and complex functions by adding any number of hidden layers.\n",
    "\n",
    "The only exception when we do want to use a linear activation is for the output layer of a network when solving a regression problem.\n",
    "\n",
    "</p>\n",
    "</div>"
   ]
  },
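  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The claim about linear activations can be checked numerically. The cell below is a small sketch with randomly chosen weights (all names and shapes are assumptions made for this illustration): two dense layers with linear activation give exactly the same output as a single dense layer with weights `W1 @ W2` and bias `b1 @ W2 + b2`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "\n",
    "# Random weights and biases for two stacked dense layers (2 -> 4 -> 1)\n",
    "W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)\n",
    "W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)\n",
    "\n",
    "# Five random input samples with two features each\n",
    "X = rng.normal(size=(5, 2))\n",
    "\n",
    "# Two layers with linear (identity) activation\n",
    "two_layer = (X @ W1 + b1) @ W2 + b2\n",
    "\n",
    "# The equivalent single linear layer\n",
    "W = W1 @ W2\n",
    "b = b1 @ W2 + b2\n",
    "one_layer = X @ W + b\n",
    "\n",
    "print(\"Same output:\", np.allclose(two_layer, one_layer))"
   ]
  },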
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise section - Google Playground\n",
    "\n",
    "A great tool from Google to develop a feeling for the workings of neural networks.\n",
    "\n",
    "https://playground.tensorflow.org/\n",
    "\n",
    "<img src=\"./images/neuralnets/google_playground.png\"/>\n",
    "\n",
    "**Walkthrough by instructor**\n",
    "\n",
    "Some concepts to look at:\n",
    "\n",
    "* Simple vs Complex models (Effect of network size)\n",
    "* Optimization results\n",
    "* Effect of activation functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction to Keras"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What is Keras?\n",
    "\n",
    "* It is a high level API to create and work with neural networks\n",
    "* Supports multiple backends such as **TensorFlow** from Google, **Theano** (Although Theano is dead now) and **CNTK** (Microsoft Cognitive Toolkit)\n",
    "* Very good for creating neural nets quickly and hides away a lot of tedious work\n",
    "* Has been incorporated into official TensorFlow (which obviously only works with tensforflow) and as of TensorFlow 2.0 this will the main api to use it\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "<figure>\n",
    "<img src=\"./images/neuralnets/neural_net_keras_1.svg\" width=\"700\"/>\n",
    "<figcaption>Building this model in Keras</figcaption>\n",
    "</figure>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Say hello to Tensorflow\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Dense, Activation\n",
    "\n",
    "# Creating a model\n",
    "model = Sequential()\n",
    "\n",
    "# Adding layers to this model\n",
    "# 1st Hidden layer\n",
    "# A Dense/fully-connected layer which takes as input a \n",
    "# feature array of shape (samples, num_features)\n",
    "# Here input_shape = (2,) means that the layer expects an input with num_features = 2\n",
    "# and the sample size could be anything\n",
    "# The activation function for this layer is set to \"relu\"\n",
    "model.add(Dense(units=4, input_shape=(2,), activation=\"relu\"))\n",
    "\n",
    "# 2nd Hidden layer\n",
    "# This is also a fully-connected layer and we do not need to specify the\n",
    "# shape of the input anymore (We need to do that only for the first layer)\n",
    "# NOTE: Now we didn't add the activation seperately. Instead we just added it\n",
    "# while calling Dense(). This and the way used for the first layer are Equivalent!\n",
    "model.add(Dense(units=4, activation=\"relu\"))\n",
    "\n",
    "          \n",
    "# The output layer\n",
    "model.add(Dense(units=1))\n",
    "model.add(Activation(\"sigmoid\"))\n",
    "\n",
    "model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### XOR using neural networks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from sklearn.model_selection import train_test_split\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Dense\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating a network to solve the XOR problem\n",
    "\n",
    "# Loading and plotting the data\n",
    "xor = pd.read_csv(\"data/xor.csv\")\n",
    "\n",
    "# Using x and y coordinates as featues\n",
    "features = xor.iloc[:, :-1]\n",
    "# Convert boolean to integer values (True->1 and False->0)\n",
    "labels = (1-xor.iloc[:, -1].astype(int))\n",
    "\n",
    "colors = [[\"steelblue\", \"chocolate\"][i] for i in labels]\n",
    "plt.figure(figsize=(5, 5))\n",
    "plt.xlim([-2, 2])\n",
    "plt.ylim([-2, 2])\n",
    "plt.title(\"Blue points are False\")\n",
    "plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\") ;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Building a simple Tensorflow model\n",
    "\n",
    "def a_simple_NN():\n",
    "    \n",
    "    model = Sequential()\n",
    "\n",
    "    model.add(Dense(4, input_shape = (2,), activation = \"relu\"))\n",
    "\n",
    "    model.add(Dense(4, activation = \"relu\"))\n",
    "\n",
    "    model.add(Dense(1, activation = \"sigmoid\"))\n",
    "\n",
    "    model.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
    "    \n",
    "    return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Instantiating the model\n",
    "model = a_simple_NN()\n",
    "\n",
    "# Splitting the dataset into training (70%) and validation sets (30%)\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    features, labels, test_size=0.3)\n",
    "\n",
    "# Setting the number of passes through the entire training set\n",
    "num_epochs = 300\n",
    "\n",
    "# model.fit() is used to train the model\n",
    "# We can pass validation data while training\n",
    "model_run = model.fit(X_train, y_train, epochs=num_epochs,\n",
    "                      validation_data=(X_test, y_test))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\"><p><i class=\"fa fa-info-circle\"></i>&nbsp;\n",
    "    NOTE: We can pass \"verbose=0\" to model.fit() to suppress the printing of model output on the terminal/notebook.\n",
    "</p></div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plotting the loss and accuracy on the training and validation sets during the training\n",
    "# This can be done by using Keras callback \"history\" which is applied by default\n",
    "history_model = model_run.history\n",
    "\n",
    "print(\"The history has the following data: \", history_model.keys())\n",
    "\n",
    "# Plotting the training and validation accuracy during the training\n",
    "sns.lineplot(np.arange(1, num_epochs+1), history_model[\"accuracy\"], color = \"blue\", label=\"Training set\") ;\n",
    "sns.lineplot(np.arange(1, num_epochs+1), history_model[\"val_accuracy\"], color = \"red\", label=\"Valdation set\") ;\n",
    "plt.xlabel(\"epochs\") ;\n",
    "plt.ylabel(\"accuracy\") ;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
    "The plots such as above are essential for analyzing the behaviour and performance of the network and to tune it in the right direction. However, for the example above we don't expect to derive a lot of insight from this plot as the function we are trying to fit is quite simple and there is not too much noise. We will see the significance of these curves in a later example.\n",
    "</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Before we move on forward we see how to save and load a keras model\n",
    "model.save(\"./data/my_first_NN.h5\")\n",
    "\n",
    "# Optional: See what is in the hdf5 file we just created above\n",
    "\n",
    "from tensorflow.keras.models import load_model\n",
    "model = load_model(\"./data/my_first_NN.h5\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the training and validation in the example above we split our dataset into a 70-30 train-validation set. We know from previous chapters that to more robustly estimate the accuracy of our model we can use **K-fold cross-validation**.\n",
    "This is even more important when we have small datasets and cannot afford to reserve a validation set!\n",
    "\n",
    "One way to do the cross-validation here would be to write our own function to do this. However, we also know that **scikit-learn** provides several handy functions to evaluate and tune the models. So the question is:\n",
    "\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
    "    Can we somehow use the scikit-learn functions or the ones we wrote ourselves for scikit-learn models to evaluate and tune our Keras models?\n",
    "\n",
    "\n",
    "The Answer is **YES !**\n",
    "</p>\n",
    "</div>\n",
    "\n",
    "\n",
    "\n",
    "We show how to do this in the following section."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using scikit-learn functions on keras models\n",
    "\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
    "Keras offers 2 wrappers which allow its Sequential models to be used with scikit-learn. \n",
    "\n",
    "There are: **KerasClassifier** and **KerasRegressor**.\n",
    "\n",
    "For more information:\n",
    "https://keras.io/scikit-learn-api/\n",
    "</p>\n",
    "</div>\n",
    "\n",
    "\n",
    "\n",
    "**Now lets see how this works!**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We wrap the Keras model we created above with KerasClassifier\n",
    "from tensorflow.keras.wrappers.scikit_learn import KerasClassifier\n",
    "from sklearn.model_selection import cross_val_score\n",
    "# Wrapping Keras model\n",
    "# NOTE: We pass verbose=0 to suppress the model output\n",
    "num_epochs = 400\n",
    "model_scikit = KerasClassifier(\n",
    "    build_fn=a_simple_NN, epochs=num_epochs, verbose=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's reuse the function to visualize the decision boundary which we saw in chapter 2 with minimal change\n",
    "\n",
    "def list_flatten(list_of_list):\n",
    "    flattened_list = [i for j in list_of_list for i in j]\n",
    "    return flattened_list\n",
    "\n",
    "def plot_points(plt=plt, marker='o'):\n",
    "    colors = [[\"steelblue\", \"chocolate\"][i] for i in labels]\n",
    "    plt.scatter(features.iloc[:, 0], features.iloc[:, 1], color=colors, marker=marker);\n",
    "\n",
    "def train_and_plot_decision_surface(\n",
    "    name, classifier, features_2d, labels, preproc=None, plt=plt, marker='o', N=400\n",
    "):\n",
    "\n",
    "    features_2d = np.array(features_2d)\n",
    "    xmin, ymin = features_2d.min(axis=0)\n",
    "    xmax, ymax = features_2d.max(axis=0)\n",
    "\n",
    "    x = np.linspace(xmin, xmax, N)\n",
    "    y = np.linspace(ymin, ymax, N)\n",
    "    points = np.array(np.meshgrid(x, y)).T.reshape(-1, 2)\n",
    "\n",
    "    if preproc is not None:\n",
    "        points_for_classifier = preproc.fit_transform(points)\n",
    "        features_2d = preproc.fit_transform(features_2d)\n",
    "    else:\n",
    "        points_for_classifier = points\n",
    "\n",
    "    classifier.fit(features_2d, labels, verbose=0)\n",
    "    predicted = classifier.predict(features_2d)\n",
    "    \n",
    "    if name == \"Neural Net\":\n",
    "        predicted = list_flatten(predicted)\n",
    "    \n",
    "    \n",
    "    if preproc is not None:\n",
    "        name += \" (w/ preprocessing)\"\n",
    "    print(name + \":\\t\", sum(predicted == labels), \"/\", len(labels), \"correct\")\n",
    "    \n",
    "    if name == \"Neural Net\":\n",
    "        classes = np.array(list_flatten(classifier.predict(points_for_classifier)), dtype=bool)\n",
    "    else:\n",
    "        classes = np.array(classifier.predict(points_for_classifier), dtype=bool)\n",
    "    plt.plot(\n",
    "        points[~classes][:, 0],\n",
    "        points[~classes][:, 1],\n",
    "        \"o\",\n",
    "        color=\"steelblue\",\n",
    "        markersize=1,\n",
    "        alpha=0.01,\n",
    "    )\n",
    "    plt.plot(\n",
    "        points[classes][:, 0],\n",
    "        points[classes][:, 1],\n",
    "        \"o\",\n",
    "        color=\"chocolate\",\n",
    "        markersize=1,\n",
    "        alpha=0.04,\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "_, ax = plt.subplots(figsize=(6, 6))\n",
    "\n",
    "train_and_plot_decision_surface(\"Neural Net\", model_scikit, features, labels, plt=ax)\n",
    "plot_points(plt=ax)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,