    {
     "cells": [
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n",
        "import matplotlib.pyplot as plt\n",
        "%matplotlib inline\n",
        "%config InlineBackend.figure_format = 'retina'\n",
        "import warnings\n",
        "warnings.filterwarnings('ignore', category=FutureWarning)\n",
        "#from IPython.core.display import HTML; HTML(open(\"custom.html\", \"r\").read())"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "# Introduction to Neural Networks\n",
        "\n",
        "## TO DO: Almost all the figues and schematics will be replaced or improved slowly\n",
        "\n",
    
        "<center>\n",
        "<figure>\n",
        "<img src=\"./images/neuralnets/neural_net_ex.svg\" width=\"700\"/>\n",
        "<figcaption>A 3 layer Neural Network (By convention the input layer is not counted).</figcaption>\n",
        "</figure>\n",
        "</center>"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## History of Neural networks\n",
        "\n",
        "**TODO: Make it more complete and format properly**\n",
        "\n",
        "1943 - Threshold Logic\n",
        "\n",
        "1940s - Hebbian Learning\n",
        "\n",
        "1958 - Perceptron\n",
        "\n",
        "1975 - Backpropagation\n",
        "\n",
        "1980s - Neocognitron\n",
        "\n",
        "1982: Hopfield Network\n",
        "\n",
        "1986: Convolutional Neural Networks\n",
        "\n",
        "1997: Long-short term memory (LSTM) model\n",
        "\n",
    
        "2014: Gated Recurrent Units, Generative Adversarial Networks(Check)?"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Why the boom now?\n",
        "* Data\n",
        "* Data\n",
        "* Data\n",
        "* Availability of GPUs\n",
        "* Algorithmic developments which allow for efficient training and training for deeper networks\n",
        "* Much easier access than a decade ago"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Building blocks\n",
        "### Perceptron\n",
        "\n",
    "The smallest unit of a neural network is a **perceptron**-like node.\n",
        "\n",
        "**What is a Perceptron?**\n",
        "\n",
    "It is a simple function which takes multiple inputs and produces a single output.\n",
        "\n",
        "<center>\n",
        "<figure>\n",
        "<img src=\"./images/neuralnets/perceptron_ex.svg\" width=\"400\"/>\n",
        "<figcaption>A simple perceptron with 3 inputs and 1 output.</figcaption>\n",
        "</figure>\n",
        "</center>\n",
        "\n",
        "\n",
        "It works as follows: \n",
        "\n",
        "Step 1: A **weighted sum** of the inputs is calculated\n",
        "\n",
        "\\begin{equation*}\n",
        "weighted\\_sum = \\sum_{k=1}^{num\\_inputs} w_{i} x_{i}\n",
        "\\end{equation*}\n",
        "\n",
    "Step 2: A **step** activation function is applied\n",
        "\n",
        "$$\n",
        "f(weighted\\_sum) = \\left\\{\n",
        "        \\begin{array}{ll}\n",
    "            0 & \\quad weighted\\_sum < threshold \\\\\n",
    "            1 & \\quad weighted\\_sum \\geq threshold\n",
        "        \\end{array}\n",
        "    \\right.\n",
        "$$\n",
    "\n",
    "Note that this is also a linear classifier, as introduced in script 02."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "%matplotlib inline\n",
    "%config IPCompleter.greedy=True\n",
        "import matplotlib as mpl\n",
    "mpl.rcParams['lines.linewidth'] = 3\n",
    "#mpl.rcParams['font.size'] = 16"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "\n",
        "def perceptron(X, w, threshold=1):\n",
    "    # This function computes sum(w_i*x_i) and\n",
        "    # applies a perceptron activation\n",
    "    linear_sum = np.dot(X, w)\n",
    "    output = 0\n",
        "    if linear_sum >= threshold:\n",
        "        output = 1\n",
        "    return output"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "#### Boolean AND\n",
        "\n",
        "| x$_1$ | x$_2$ | output |\n",
        "| --- | --- | --- |\n",
        "| 0 | 0 | 0 |\n",
        "| 1 | 0 | 0 |\n",
        "| 0 | 1 | 0 |\n",
        "| 1 | 1 | 1 |"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Calculating Boolean AND using a perceptron\n",
    "threshold = 1.5\n",
    "w = [1, 1]\n",
    "X = [[0, 0], [1, 0], [0, 1], [1, 1]]\n",
        "for i in X:\n",
    "    print(\"Perceptron output for x1, x2 = \", i,\n",
    "          \" is \", perceptron(i, w, threshold))"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "In this simple case we can rewrite our equation to $x_2 = ...... $ which describes a line in 2D:"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "def perceptron_DB(X, w, threshold):\n",
        "    # Plotting the decision boundary\n",
        "    for i in X:\n",
        "        plt.plot(i, \"o\", color=\"b\")\n",
        "    plt.xlim(-1, 2)\n",
        "    plt.ylim(-1, 2)\n",
        "    # The decision boundary is a line given by\n",
        "    # w_1*x_1+w_2*x_2-threshold=0\n",
        "    x1 = np.arange(-3, 4)\n",
        "    x2 = (threshold - x1*w[0])/w[1]\n",
        "    plt.plot(x1, x2, \"--\", color=\"black\")\n",
        "    plt.xlabel(\"x$_1$\", fontsize=16)\n",
        "    plt.ylabel(\"x$_2$\", fontsize=16)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "perceptron_DB(X, w, threshold)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    "**Exercise 1: Compute a Boolean \"OR\" using a perceptron**\n",
        "\n",
        "Hint: copy the code from the \"AND\" example and edit the weights and/or threshold"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "#### Boolean OR\n",
        "\n",
        "| x$_1$ | x$_2$ | output |\n",
        "| --- | --- | --- |\n",
        "| 0 | 0 | 0 |\n",
        "| 1 | 0 | 1 |\n",
        "| 0 | 1 | 1 |\n",
        "| 1 | 1 | 1 |"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Calculating Boolean OR using a perceptron\n",
        "# Edit the code below"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Solution\n",
        "# Calculating Boolean OR using a perceptron\n",
        "threshold=0.6\n",
        "w=[1,1]\n",
        "X=[[0,0],[1,0],[0,1],[1,1]]\n",
        "for i in X:\n",
    
        "    print(\"Perceptron output for x1, x2 = \" , i , \" is \" , perceptron(i,w,threshold))\n",
    
    chadhat's avatar
    chadhat committed
        "# Plotting the decision boundary\n",
    
        "perceptron_DB(X,w)"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    "**Exercise 2: Create a NAND gate using a perceptron**\n",
        "\n",
        "#### Boolean NAND\n",
        "\n",
        "| x$_1$ | x$_2$ | output |\n",
        "| --- | --- | --- |\n",
        "| 0 | 0 | 1 |\n",
        "| 1 | 0 | 1 |\n",
        "| 0 | 1 | 1 |\n",
        "| 1 | 1 | 0 |"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "# Calculating Boolean NAND using a perceptron"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Solution\n",
        "# Calculating Boolean OR using a perceptron\n",
        "import matplotlib.pyplot as plt\n",
        "threshold=-1.5\n",
        "w=[-1,-1]\n",
        "X=[[0,0],[1,0],[0,1],[1,1]]\n",
        "for i in X:\n",
        "    print(\"Perceptron output for x1, x2 = \" , i , \" is \" , perceptron(i,w,threshold))\n",
        "# Plotting the decision boundary\n",
        "perceptron_DB(X,w)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    "In fact, a single perceptron can compute the \"AND\", \"OR\" and \"NOT\" Boolean functions.\n",
        "However, it cannot compute some other boolean functions such as \"XOR\"\n",
        "\n",
    "**WHAT CAN WE DO?**\n",
    "\n",
    "Hint: Think about the significance of the NAND gate we created above. (NAND is a universal gate: any Boolean function can be built out of NAND gates alone.)\n",
        "\n",
    "We said a single perceptron can't compute these functions. We didn't say that about **multiple perceptrons**; a working sketch follows the figure below."
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "**XOR function using multiple perceptrons**\n",
        "\n",
        "<center>\n",
        "<figure>\n",
        "<img src=\"./images/neuralnets/perceptron_XOR.svg\" width=\"400\"/>\n",
        "<figcaption>Multiple perceptrons put together to output a XOR function.</figcaption>\n",
        "</figure>\n",
        "</center>"
       ]
      },
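  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is a minimal sketch of this idea, reusing the `perceptron` function and the gate weights/thresholds found above (the helper name `xor_gate` is just for illustration): $XOR(x_1, x_2) = AND(OR(x_1, x_2), NAND(x_1, x_2))$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: XOR assembled from three perceptrons (OR, NAND and AND)\n",
    "def xor_gate(x):\n",
    "    or_out = perceptron(x, [1, 1], 0.6)\n",
    "    nand_out = perceptron(x, [-1, -1], -1.5)\n",
    "    # AND of the two intermediate outputs\n",
    "    return perceptron([or_out, nand_out], [1, 1], 1.5)\n",
    "\n",
    "for i in [[0, 0], [1, 0], [0, 1], [1, 1]]:\n",
    "    print(\"XOR output for x1, x2 = \", i, \" is \", xor_gate(i))"
   ]
  },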
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "### Multi-layer perceptrons\n"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Google Playground\n",
        "\n",
        "https://playground.tensorflow.org/\n",
        "\n",
        "<img src=\"./images/neuralnets/google_playground.png\"/>"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Learning\n",
        "\n",
        "Now we know that we can compute complex functions if we stack together a number of perceptrons.\n",
        "\n",
        "However, we definitely **DO NOT** want to set the weights and thresholds by hand as we did in the examples above.\n",
    "\n",
    "We want some algorithm to do this for us!\n",
    "\n",
    "In order to achieve this we first need to choose a loss function for the problem at hand.\n",
    "\n",
    "### Loss function\n",
        "In order to learn using an algorithm for learning we need to define a quantity which allows us to measure how far are the predictions of our network/setup are from the reality. This is done by choosing a so-called \"Loss function\" (as in the case for other machine learning algorithms). In other words this function measures how close are the predictions of our network to the supplied labels. Once we have this function we need an algorithm to update the weights of the network such that this loss decreases. As one can already imagine the choice of an appropriate loss function is very important to the success of the model. Fortunately, for classification and regression (which cover a large variety of probelms) these loss functions are well known. \n",
        "\n",
        "Generally **crossentropy** and **mean squared error** loss functions are used for classification and regression problems, respectively.\n",
    "\n",
    "### Gradient-based learning\n",
        "As mentioned above, once we have decided upon a loss function we want to solve an **optimization problem** which minimizes this loss by updating the weights of the network. This is how the learning actually happens.\n",
        "\n",
        "The most popular optimization methods used in Neural Network training are some sort of **Gradient-descent** type methods, for e.g. gradient-descent, RMSprop, adam etc. \n",
        "**Gradient-descent** uses partial derivatives of the loss function with respect to the network weights and a learning rate to updates the weights such that the loss function decreases and hopefully after some iterations reaches its (Global) minimum.\n",
        "\n",
        "First, the loss function and its derivative are computed at the output node and this signal is propogated backwards, using chain rule, in the network to compute the partial derivatives. Hence, this method is called **Backpropagation**.\n",
        "\n",
        "Depending of\n",
    
        "\n",
        "\n",
        "\n",
    
    chadhat's avatar
    chadhat committed
        "### Activation Functions\n",
        "\n",
    "In order to train the network we need to replace the perceptron's **step** activation function, as, among other drawbacks, its gradient is zero almost everywhere and therefore it does not allow training with the backpropagation algorithm.\n",
        "\n",
        "Non-Linear functions such as:\n",
        "\n",
        "* ReLU (Rectified linear unit)\n",
        "\n",
        "\\begin{equation*}\n",
        "f(z) = \\mathrm{max}(0,z)\n",
        "\\end{equation*}\n",
        "\n",
        "* Sigmoid\n",
        "\n",
        "\\begin{equation*}\n",
        "f(z) = \\frac{1}{1+e^{-z}}\n",
        "\\end{equation*}\n",
        "\n",
        "* tanh\n",
        "\n",
        "\\begin{equation*}\n",
        "f(z) = \\frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}\n",
        "\\end{equation*}\n",
        "\n",
        "\n",
        "are some of the most popular choices used as activation functions.\n",
        "\n",
        "Linear activations are **NOT** used because it can be mathematically shown that if linear activations are used then output is just a linear function of the input. So adding any number of hidden layers does not help to learn interesting functions.\n",
        "\n",
        "Non-linear activation functions allow the network to learn more complex representations."
       ]
      },
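  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the plain gradient-descent update described above, applied to a toy one-dimensional loss $L(w) = (w - 3)^2$ (illustrative only; in a real network the gradients are obtained via backpropagation):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: plain gradient descent on a toy loss L(w) = (w - 3)**2\n",
    "def loss(w):\n",
    "    return (w - 3) ** 2\n",
    "\n",
    "def grad(w):\n",
    "    return 2 * (w - 3)  # dL/dw\n",
    "\n",
    "w = 0.0  # initial \"weight\"\n",
    "learning_rate = 0.1\n",
    "for step in range(25):\n",
    "    w = w - learning_rate * grad(w)  # the gradient-descent update\n",
    "\n",
    "# w converges towards 3, the minimum of the loss\n",
    "print(\"final w:\", w, \"final loss:\", loss(w))"
   ]
  },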
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "\n",
    "plt.figure(figsize=(10, 4))\n",
        "\n",
        "pts=np.arange(-20,20, 0.1)\n",
        "\n",
    
        "plt.subplot(1, 3, 1)\n",
    
    chadhat's avatar
    chadhat committed
        "# Sigmoid\n",
    "plt.plot(pts, 1/(1+np.exp(-pts))) ;\n",
    "\n",
    "plt.subplot(1, 3, 2)\n",
        "# tanh\n",
    "plt.plot(pts, np.tanh(pts*np.pi)) ;\n",
        "\n",
        "# Rectified linear unit (ReLu)\n",
    "plt.subplot(1, 3, 3)\n",
        "pts_relu=[max(0,i) for i in pts];\n",
    
        "plt.plot(pts, pts_relu) ;"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "# Introduction to Keras"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    "### What is Keras?\n",
        "\n",
        "* It is a high level API to create and work with neural networks\n",
    "* Supports multiple backends such as TensorFlow from Google, Theano (although Theano is no longer developed) and CNTK (Microsoft Cognitive Toolkit)\n",
    "* Very good for creating neural nets quickly, hiding away a lot of tedious work\n",
    "* Has been incorporated into official TensorFlow as `tf.keras` (which only works with the TensorFlow backend); as of TensorFlow 2.0 it is the main API to use TensorFlow\n"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Say hello to keras\n",
        "\n",
        "from keras.models import Sequential\n",
        "from keras.layers import Dense, Activation\n",
        "\n",
        "# Creating a model\n",
        "model = Sequential()\n",
        "\n",
        "# Adding layers to this model\n",
        "# 1st Hidden layer\n",
    "# A Dense/fully-connected layer which takes as input a\n",
    "# feature array of shape (samples, num_features)\n",
    "# Here input_shape = (8,) means that the layer expects an input with num_features = 8\n",
    "# and the sample size could be anything\n",
    "# Then we specify an activation function\n",
    "model.add(Dense(units=4, input_shape=(8,)))\n",
        "model.add(Activation(\"relu\"))\n",
        "\n",
    "# 2nd Hidden layer\n",
    "# This is also a fully-connected layer and we do not need to specify the\n",
    "# shape of the input anymore (we need to do that only for the first layer)\n",
    "# NOTE: This time we didn't add the activation separately. Instead we passed it\n",
    "# directly to Dense(). This and the way used for the first layer are equivalent!\n",
    "model.add(Dense(units=4, activation=\"relu\"))\n",
    "\n",
        "# The output layer\n",
        "model.add(Dense(units=1))\n",
        "model.add(Activation(\"sigmoid\"))\n",
        "\n",
        "model.summary()"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### XOR using neural networks"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.model_selection import train_test_split\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "# Creating a network to solve the XOR problem\n",
        "\n",
    "# Loading and plotting the data\n",
    "xor = pd.read_csv(\"xor.csv\")\n",
        "\n",
    "# Using x and y coordinates as features\n",
    "features = xor.iloc[:, :-1]\n",
    "# Convert boolean to integer values (True->1 and False->0)\n",
    "labels = xor.iloc[:, -1].astype(int)\n",
        "\n",
    "colors = [[\"steelblue\", \"chocolate\"][i] for i in xor[\"label\"]]\n",
    "plt.figure(figsize=(5, 5))\n",
    "plt.xlim([-2, 2])\n",
    "plt.ylim([-2, 2])\n",
    "plt.title(\"Blue points are False\")\n",
    "\n",
        "plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\") ;"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "# Building a Keras model\n",
    "\n",
    "def a_simple_NN():\n",
    "    \n",
    "    model = Sequential()\n",
    "\n",
    "    model.add(Dense(4, input_shape=(2,), activation=\"relu\"))\n",
    "\n",
    "    model.add(Dense(4, activation=\"relu\"))\n",
    "\n",
    "    model.add(Dense(1, activation=\"sigmoid\"))\n",
    "\n",
    "    model.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
    "    \n",
    "    return model"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Instantiating the model\n",
    "model = a_simple_NN()\n",
    "\n",
        "# Splitting the dataset into training (70%) and validation sets (30%)\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    features, labels, test_size=0.3)\n",
    "\n",
        "# Setting the number of passes through the entire training set\n",
        "num_epochs = 300\n",
    "\n",
    "# We can pass validation data while training\n",
    "model_run = model.fit(X_train, y_train, epochs=num_epochs,\n",
    "                      validation_data=(X_test, y_test))"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Looking at the loss and accuracy on the training and validation sets during the training\n",
        "# This can be done by using Keras callback \"history\" which is applied by default\n",
    "history_model = model_run.history\n",
    "\n",
        "print(\"The history has the following data: \", history_model.keys())\n",
        "\n",
        "# Plotting the training and validation accuracy during the training\n",
        "plt.plot(np.arange(1, num_epochs+1), history_model[\"acc\"], \"blue\") ;\n",
    "plt.plot(np.arange(1, num_epochs+1), history_model[\"val_acc\"], \"red\") ;"
   ]
  },
  {
   "cell_type": "markdown",
       "metadata": {},
       "source": [
        "**Here we dont't really see a big difference between the training and validation data because the function we are trying to fit is quiet simple and there is not too much noise. We will come back to these curves in a later example**"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "In the example above we splitted our dataset into a 70-30 train-validation set. We know from previous chapters that to more robustly calculate accuracy we can use **K-fold crossvalidation**.\n",
    
        "This is even more important when we have small datasets and cannot afford to reserve a validation set!\n",
        "\n",
    
    chadhat's avatar
    chadhat committed
        "One way to do the cross validation here would be to write our own function to do this. However, we also know that **SciKit learn** provides several handy functions to evaluate and tune the models. So the question is:\n",
        "Can we somehow use these **Scikit learn** functions or ones we wrote ourselves for **Scikit learn** models to evaluate and tune our Keras models?\n",
        "\n",
        "The Answer is **YES !**\n",
        "\n",
        "We show how to do this in the following section."
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    "## Using scikit-learn functions on Keras models\n",
    "\n",
        "Keras offers 2 wrappers which allow its Sequential models to be used with SciKit learn. \n",
    
    chadhat's avatar
    chadhat committed
        "There are: **KerasClassifier** and **KerasRegressor**.\n",
        "\n",
        "For more information:\n",
        "https://keras.io/scikit-learn-api/\n",
        "\n",
        "**Now lets see how this works!**"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "# We wrap the Keras model we created above with KerasClassifier\n",
        "from keras.wrappers.scikit_learn import KerasClassifier\n",
        "from sklearn.model_selection import cross_val_score\n",
        "# Wrapping Keras model\n",
        "# NOTE: We pass verbose=0 to suppress the model output\n",
        "num_epochs = 400\n",
        "model_scikit = KerasClassifier(\n",
        "    build_fn=a_simple_NN, **{\"epochs\": num_epochs, \"verbose\": 0})"
    
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Let's reuse the function to visualize the decision boundary which we saw in chapter 2 with minimal change\n",
        "\n",
    "def list_flatten(list_of_list):\n",
    "    flattened_list = [i for j in list_of_list for i in j]\n",
    "    return flattened_list\n",
    "\n",
        "def plot_points(plt=plt, marker='o'):\n",
        "    colors = [[\"steelblue\", \"chocolate\"][i] for i in labels]\n",
        "    plt.scatter(features.iloc[:, 0], features.iloc[:, 1], color=colors, marker=marker);\n",
        "\n",
        "def train_and_plot_decision_surface(\n",
        "    name, classifier, features_2d, labels, preproc=None, plt=plt, marker='o', N=400\n",
        "):\n",
        "    features_2d = np.array(features_2d)\n",
        "    xmin, ymin = features_2d.min(axis=0)\n",
        "    xmax, ymax = features_2d.max(axis=0)\n",
        "    x = np.linspace(xmin, xmax, N)\n",
        "    y = np.linspace(ymin, ymax, N)\n",
        "    points = np.array(np.meshgrid(x, y)).T.reshape(-1, 2)\n",
        "    if preproc is not None:\n",
        "        points_for_classifier = preproc.fit_transform(points)\n",
        "        features_2d = preproc.fit_transform(features_2d)\n",
        "    else:\n",
        "        points_for_classifier = points\n",
        "\n",
        "    classifier.fit(features_2d, labels, verbose=0)\n",
        "    predicted = classifier.predict(features_2d)\n",
        "    \n",
        "    if name == \"Neural Net\":\n",
        "        predicted = list_flatten(predicted)\n",
        "    \n",
        "    \n",
        "    if preproc is not None:\n",
        "        name += \" (w/ preprocessing)\"\n",
        "    print(name + \":\\t\", sum(predicted == labels), \"/\", len(labels), \"correct\")\n",
        "    \n",
        "    if name == \"Neural Net\":\n",
        "        classes = np.array(list_flatten(classifier.predict(points_for_classifier)), dtype=bool)\n",
        "    else:\n",
        "        classes = np.array(classifier.predict(points_for_classifier), dtype=bool)\n",
        "    plt.plot(\n",
        "        points[~classes][:, 0],\n",
        "        points[~classes][:, 1],\n",
        "        \"o\",\n",
        "        color=\"steelblue\",\n",
        "        markersize=1,\n",
        "        alpha=0.01,\n",
        "    )\n",
        "    plt.plot(\n",
        "        points[classes][:, 0],\n",
        "        points[classes][:, 1],\n",
        "        \"o\",\n",
        "        color=\"chocolate\",\n",
        "        markersize=1,\n",
        "        alpha=0.04,\n",
        "    )"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "_, ax = plt.subplots(figsize=(6, 6))\n",
    "\n",
    "train_and_plot_decision_surface(\"Neural Net\", model_scikit, features, labels, plt=ax)\n",
    "plot_points(plt=ax)"
   ]
  },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Applying K-fold cross-validation\n",
        "# Here we pass the whole dataset, i.e. features and labels, instead of splitting it.\n",
        "num_folds = 5\n",
        "cross_validation = cross_val_score(\n",
        "    model_scikit, features, labels, cv=num_folds, verbose=0)\n",
        "\n",
        "print(\"The acuracy on the \", num_folds, \" validation folds:\", cross_validation)\n",
        "print(\"The Average acuracy on the \", num_folds, \" validation folds:\", np.mean(cross_validation))"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### NOTE: The above code took quiet long even though we used only 5  CV folds and the neural network and data size are very small!"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Hyperparameter optimization"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "We know from chapter 6 that there are 2 types of parameters which need to be tuned for a machine learning model.\n",
        "* Normal model parameters which can be learned for e.g. by gradient-descent\n",
        "* Hyperparameters\n",
        "\n",
        "In the model which we created above we made some arbitrary choices like which optimizer we use, what is its learning rate, number of hidden units and so on ...\n",
        "\n",
        "Now that we have the keras model wrapped as a scikit model we can use the grid search functions we have seen in chapter 6."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "from sklearn.model_selection import GridSearchCV"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "HP_grid = dict(epochs=[300, 500, 1000])\n",
        "search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid)\n",
        "search.fit(features, labels)\n",
        "print(search.best_score_, search.best_params_)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "HP_grid = {'epochs' : [10, 15, 30], \n",
        "           'batch_size' : [10, 20, 30] }\n",
        "search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid)\n",
        "search.fit(features, labels)\n",
        "print(search.best_score_, search.best_params_)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# A more general model for further Hyperparameter optimization\n",
        "from keras import optimizers\n",
        "\n",
        "def a_simple_NN(activation='relu', num_hidden_neurons=[4, 4], learning_rate=0.01):\n",
        "\n",
        "    model = Sequential()\n",
        "\n",
        "    model.add(Dense(num_hidden_neurons[0],\n",
        "                    input_shape=(2,), activation=activation))\n",
        "\n",
        "    model.add(Dense(num_hidden_neurons[1], activation=activation))\n",
        "\n",
        "    model.add(Dense(1, activation=\"sigmoid\"))\n",
        "\n",
        "    model.compile(loss=\"binary_crossentropy\", optimizer=optimizers.rmsprop(\n",
        "        lr=learning_rate), metrics=[\"accuracy\"])\n",
        "\n",
        "    return model"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Exercise: \n",
        "* Look at the model above and choose a couple of hyperparameters to optimize. \n",
        "* What function from SciKit learn other than GridSearchCV can we use for hyperparameter optimization? Use it."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Code here"
       ]
      },
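  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible solution (a sketch, using the generalized `a_simple_NN` defined above; the hyperparameter values below are arbitrary choices): scikit-learn's **RandomizedSearchCV** samples a fixed number of hyperparameter combinations instead of trying all of them like GridSearchCV."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution (one possibility, sketched): RandomizedSearchCV\n",
    "from sklearn.model_selection import RandomizedSearchCV\n",
    "\n",
    "# Re-wrap the generalized model so its new arguments can be tuned\n",
    "model_scikit = KerasClassifier(\n",
    "    build_fn=a_simple_NN, **{\"epochs\": 300, \"verbose\": 0})\n",
    "\n",
    "HP_dist = dict(activation=[\"relu\", \"tanh\", \"sigmoid\"],\n",
    "               learning_rate=[0.001, 0.01, 0.1])\n",
    "search = RandomizedSearchCV(\n",
    "    estimator=model_scikit, param_distributions=HP_dist, n_iter=4)\n",
    "search.fit(features, labels)\n",
    "print(search.best_score_, search.best_params_)"
   ]
  },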
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Exercise: Create a neural network to classify the 2d points example from chapter 2 learned \n",
        "(Optional: As you create the model read a bit on the different keras commands we have used)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
    
       "source": [
    "circle = pd.read_csv(\"2d_points.csv\")\n",
    "# Using x and y coordinates as features\n",
    "features = circle.iloc[:, :-1]\n",
    "# Convert boolean to integer values (True->1 and False->0)\n",
    "labels = circle.iloc[:, -1].astype(int)\n",
    "\n",
    "colors = [[\"steelblue\", \"chocolate\"][i] for i in circle[\"label\"]]\n",
    "\n",
    "plt.figure(figsize=(5, 5))\n",
    "plt.xlim([-2, 2])\n",
    "plt.ylim([-2, 2])\n",
    "\n",
    "plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\");\n"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "# Insert Code here"
       ]
      },
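  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible solution (a sketch, reusing `a_simple_NN` and the training pattern from the XOR example; the architecture and number of epochs are arbitrary choices):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution (one possibility, sketched): reuse the simple network on the 2d points data\n",
    "model = a_simple_NN()\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    features, labels, test_size=0.3)\n",
    "\n",
    "model_run = model.fit(X_train, y_train, epochs=300,\n",
    "                      validation_data=(X_test, y_test), verbose=0)\n",
    "print(\"validation accuracy:\", model_run.history[\"val_acc\"][-1])"
   ]
  },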
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### The examples above are not the ideal use problems one should use neural networks for. They are too simple and can be easily solved by classical machine learning algorithms. Below we show examples which are the more common applications of Neural Networks."
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Handwritten Digits Classification\n",
    "### MNIST Dataset\n",
    "\n",
    "MNIST is a very common dataset in machine learning. It is widely used to train and validate models.\n",
        "\n",
        "\n",
        ">The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a >test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size->normalized and centered in a fixed-size image.\n",
        ">It is a good database for people who want to try learning techniques and pattern recognition methods on real-world >data while spending minimal efforts on preprocessing and formatting.\n",
        ">source: http://yann.lecun.com/exdb/mnist/\n",
    "\n",
    "The problem we want to solve using this dataset is multi-class classification.\n",
    "This dataset consists of images of handwritten digits between 0-9 and their corresponding labels. We want to train a neural network which is able to predict the correct digit in an image."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Loading the dataset in keras\n",
        "# Later you can explore and play with other datasets with come with Keras\n",
        "from keras.datasets import mnist\n",
    "\n",
    "# Loading the train and test data\n",
    "(X_train, y_train), (X_test, y_test) = mnist.load_data()"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Looking at the dataset\n",
        "print(X_train.shape)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
    "# We can see that the training set consists of 60,000 images of size 28x28 pixels\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "i = np.random.randint(0, X_train.shape[0])\n",
    "plt.imshow(X_train[i], cmap=\"gray_r\") ;\n",
    "print(\"This digit is: \", y_train[i])"
   ]
  },
  {
   "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Look at the data values for a couple of images\n",
        "print(X_train[0].min(), X_train[1].max())"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "The data consists of values between 0-255 representing the **grayscale level**"
       ]
      },
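  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A typical next preprocessing step (sketched below; the variable names `X_train_norm` and `X_test_norm` are just for illustration) is to scale these values to the range 0-1 before training:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: scale the grayscale values from the range 0-255 to 0-1\n",
    "X_train_norm = X_train / 255.0\n",
    "X_test_norm = X_test / 255.0\n",
    "print(X_train_norm.min(), X_train_norm.max())"
   ]
  },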
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# The labels are the digit on the image\n",
        "print(y_train.shape)"