neural_nets_intro.ipynb 527 KiB
    {
     "cells": [
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "# Introduction to Neural Networks\n",
        "\n",
        "## TO DO: Almost all the figues and schematics will be replaced or improved slowly\n",
        "\n",
        "<img src=\"./images/neuralnets/Colored_neural_network.svg\"/>\n",
        "source: https://en.wikipedia.org/wiki/Artificial_neural_network\n",
        "\n"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## History of Neural networks\n",
        "\n",
        "**TODO: Make it more complete and format properly**\n",
        "\n",
        "1943 - Threshold Logic\n",
        "\n",
        "1940s - Hebbian Learning\n",
        "\n",
        "1958 - Perceptron\n",
        "\n",
        "1975 - Backpropagation\n",
        "\n",
        "1980s - Neocognitron\n",
        "\n",
        "1982: Hopfield Network\n",
        "\n",
        "1986: Convolutional Neural Networks\n",
        "\n",
        "1997: Long-short term memory (LSTM) model\n",
        "\n",
    
        "2014: Gated Recurrent Units, Generative Adversarial Networks(Check)?"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Why the boom now?\n",
        "* Data\n",
        "* Data\n",
        "* Data\n",
        "* Availability of GPUs\n",
        "* Algorithmic developments which allow for efficient training and training for deeper networks\n",
        "* Much easier access than a decade ago"
    
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Building blocks\n",
        "### Perceptron\n",
        "\n",
        "Smallest unit of a neural network is a **perceptron** like node.\n",
        "\n",
        "**What is a Perceptron?**\n",
        "\n",
        "It is a simple function which has multiple inputs and a single output.\n",
        "\n",
        "Step 1: Weighted sum of the inputs is calculated\n",
        "\n",
        "\\begin{equation*}\n",
        "weighted\\_sum = \\sum_{k=1}^{num\\_inputs} w_{i} x_{i}\n",
        "\\end{equation*}\n",
        "\n",
        "Step 2: The following activation function is applied\n",
        "\n",
        "$$\n",
        "f(weighted\\_sum) = \\left\\{\n",
        "        \\begin{array}{ll}\n",
    
        "            0 & \\quad weighted\\_sum < threshold \\\\\n",
        "            1 & \\quad weighted\\_sum \\geq threshold\n",
    
        "        \\end{array}\n",
        "    \\right.\n",
        "$$\n",
    
        "\n",
        "You can see that this is also a linear classifier as we introduced in script 02."
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 17,
    
       "metadata": {},
       "outputs": [],
       "source": [
        "%matplotlib inline\n",
    
        "%config IPCompleter.greedy=True\n",
        "%config InlineBackend.figure_format = 'retina'"
    
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 18,
    
       "metadata": {},
       "outputs": [
        {
         "data": {
          "text/plain": [
           "1"
          ]
         },
    
         "execution_count": 18,
    
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "import numpy as np\n",
        "def perceptron(X, w, threshold=1):\n",
        "    # This function computes sum(w_i*x_i) and \n",
        "    # applies a perceptron activation\n",
        "    linear_sum = np.dot(X,w)\n",
        "    output=0\n",
        "    if linear_sum >= threshold:\n",
        "        output = 1\n",
        "        # print(\"The perceptron has peaked\")\n",
        "    return output\n",
        "X = [1,0]\n",
        "w = [1,1]\n",
        "perceptron(X,w)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "#### Boolean AND\n",
        "\n",
        "| x$_1$ | x$_2$ | output |\n",
        "| --- | --- | --- |\n",
        "| 0 | 0 | 0 |\n",
        "| 1 | 0 | 0 |\n",
        "| 0 | 1 | 0 |\n",
        "| 1 | 1 | 1 |"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 158,
    
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
    
          "Perceptron output for x1, x2 =  [0, 0]  is  0\n",
          "Perceptron output for x1, x2 =  [1, 0]  is  0\n",
          "Perceptron output for x1, x2 =  [0, 1]  is  0\n",
          "Perceptron output for x1, x2 =  [1, 1]  is  1\n"
    
         ]
        }
       ],
       "source": [
        "# Calculating Boolean AND using a perceptron\n",
        "import matplotlib.pyplot as plt\n",
    
        "threshold = 1.5\n",
    
        "w=[1,1]\n",
        "X=[[0,0],[1,0],[0,1],[1,1]]\n",
        "for i in X:\n",
    
        "    print(\"Perceptron output for x1, x2 = \" , i , \" is \" , perceptron(i,w,threshold))"
    
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": []
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "In this simple case we can rewrite our equation to $x_2 = ...... $ which describes a line in 2D:"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 20,
    
       "metadata": {},
       "outputs": [
        {
         "data": {
    
          "image/png": "\n",
    
          "text/plain": [
    
           "<matplotlib.figure.Figure at 0x7fe999f134e0>"
    
          ]
         },
         "metadata": {
    
          "image/png": {
           "height": 252,
           "width": 388
          },
    
          "needs_background": "light"
         },
         "output_type": "display_data"
        }
       ],
       "source": [
        "# Plotting the decision boundary\n",
        "plt.xlim(-1,2)\n",
        "plt.ylim(-1,2)\n",
        "for i in X:\n",
        "    plt.plot(i,\"o\",color=\"b\");\n",
        "# Plotting the decision boundary\n",
        "# that is a line given by w_1*x_1+w_2*x_2-threshold=0\n",
        "plt.plot(np.arange(-3,4), 1.5-np.arange(-3,4), \"--\", color=\"black\");"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "**Exercise :Can you compute a Boolean \"OR\" using a perceptron?**\n",
        "\n",
        "Hint: copy the code from the \"AND\" example and edit the weights and/or threshold"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "#### Boolean OR\n",
        "\n",
        "| x$_1$ | x$_2$ | output |\n",
        "| --- | --- | --- |\n",
        "| 0 | 0 | 0 |\n",
        "| 1 | 0 | 1 |\n",
        "| 0 | 1 | 1 |\n",
        "| 1 | 1 | 1 |"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 21,
    
       "metadata": {},
       "outputs": [],
       "source": [
        "# Calculating Boolean OR using a perceptron\n",
        "# Edit the code below"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 157,
    
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
    
          "Perceptron output for x1, x2 =  [0, 0]  is  0\n",
          "Perceptron output for x1, x2 =  [1, 0]  is  1\n",
          "Perceptron output for x1, x2 =  [0, 1]  is  1\n",
          "Perceptron output for x1, x2 =  [1, 1]  is  1\n"
    
         ]
        },
        {
         "data": {
    
          "image/png": "\n",
    
          "text/plain": [
    
           "<matplotlib.figure.Figure at 0x7fe8e711fc18>"
    
          ]
         },
         "metadata": {
    
          "image/png": {
           "height": 252,
           "width": 388
          },
    
          "needs_background": "light"
         },
         "output_type": "display_data"
        }
       ],
       "source": [
        "# Solution\n",
        "# Calculating Boolean OR using a perceptron\n",
        "import matplotlib.pyplot as plt\n",
        "threshold=0.6\n",
        "w=[1,1]\n",
        "X=[[0,0],[1,0],[0,1],[1,1]]\n",
        "for i in X:\n",
    
        "    print(\"Perceptron output for x1, x2 = \" , i , \" is \" , perceptron(i,w,threshold))\n",
    
        "# Plotting the four input points\n",
        "plt.xlim(-1,2)\n",
        "plt.ylim(-1,2)\n",
        "for i in X:\n",
        "    # plot x1 on the horizontal axis and x2 on the vertical axis\n",
        "    plt.plot(i[0], i[1], \"o\", color=\"b\");\n",
        "# Plotting the decision boundary\n",
        "# that is a line given by w_1*x_1+w_2*x_2-threshold=0\n",
        "plt.plot(np.arange(-3,4), threshold-np.arange(-3,4), \"--\", color=\"black\");"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "**Optional exercise: Create a NAND gate with perceptrons**"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 23,
    
       "metadata": {},
       "outputs": [],
       "source": [
        "# Calculating Boolean NAND using a perceptron\n",
        "\n",
        "\n",
        "\n"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "In fact a single perceptron can compute \"AND\", \"OR\" and \"NOT\" boolean functions.\n",
        "However, it cannot compute some other boolean functions such as \"XOR\"\n",
        "\n",
        "WHAT CAN WE DO?\n",
        "Hint: What is the significance of the NAND gate we created above\n",
        "\n",
        "We said a single perceptron can't compute these functions. We didn't say that about **multiple Perceptrons**"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "**XOR function**\n",
        "\n",
        "**TO DO: INSERT IMAGE HERE!!!!!!!!!!!!!!**"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "### Google Playground\n",
        "\n",
        "UWE: move up before discussing gradient stuff etc\n",
        "\n",
        "https://playground.tensorflow.org/\n",
        "\n",
        "<img src=\"./images/neuralnets/google_playground.png\"/>"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Learning\n",
        "\n",
        "Now we know that we can compute complex functions if we stack together a number of perceptrons.\n",
        "\n",
        "However, we can DO NOT want to set the weights and thresholds by hand as we did in the examples above.\n",
        "\n",
        "We want some algorithm to do this for us!\n",
        "\n",
        "In order to achieve this we first need to choose a loss function for the problem at hand\n",
        "\n",
        "\n",
        "### Loss function\n",
        "As in the case of other machine learning algorithms we need to define a so-called \"Loss function\". In simple words this function measures how close are the predictions of our network to the supplied labels. Once we have this function we need an algorithm to update the weights of the network such that this loss decreases. As one can already imagine the choice of an appropriate loss function is very important to the success of the trained model. Fortunately, for classification and regression (which comprise of a large range of probelms) these loss functions are well known. Generally **crossentropy** and **mean squared error** loss functions are chosen for classification and regression problems, respectively.\n",
        "\n",
        "### Gradient based learning\n",
        "Once we have a loss function we want to solve an **optimization problem** which minimizes this loss by updating the weights of the network and this is how the learning actually happens.\n",
        "\n",
        "One of the most popular optimization method used in machine learning is **Gradient-descent**\n",
        "\n",
        "INSERT MORE EXPLAINATIONS HERE\n",
        "\n",
    
    chadhat's avatar
    chadhat committed
        "### Activation Functions\n",
        "\n",
    
        "In order to train the network we need to change Perceptron's **step** activation function as it does not allow training using the back-propagation algorithm among other drawbacks.\n",
        "\n",
        "Non-Linear functions such as:\n",
        "\n",
        "* ReLU (Rectified linear unit)\n",
        "\n",
        "\\begin{equation*}\n",
        "f(z) = \\mathrm{max}(0,z)\n",
        "\\end{equation*}\n",
        "\n",
        "* Sigmoid\n",
        "\n",
        "\\begin{equation*}\n",
        "f(z) = \\frac{1}{1+e^{-z}}\n",
        "\\end{equation*}\n",
        "\n",
        "* tanh\n",
        "\n",
        "\\begin{equation*}\n",
        "f(z) = \\frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}\n",
        "\\end{equation*}\n",
        "\n",
        "\n",
        "are some of the most popular choices used as activation functions.\n",
        "\n",
        "Linear activations are **NOT** used because it can be mathematically shown that if linear activations are used then output is just a linear function of the input. So adding any number of hidden layers does not help to learn interesting functions.\n",
        "\n",
        "Non-linear activation functions allow the network to learn more complex representations."
    
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 135,
    
       "metadata": {},
       "outputs": [
        {
         "data": {
    
          "image/png": "\n",
    
          "text/plain": [
    
           "<matplotlib.figure.Figure at 0x7fe8eb5e6978>"
    
          ]
         },
         "metadata": {
    
          "image/png": {
           "height": 250,
           "width": 597
          },
    
          "needs_background": "light"
         },
         "output_type": "display_data"
        }
       ],
       "source": [
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "\n",
    
        "plt.figure(figsize=(10, 4))\n",
    
        "\n",
        "pts=np.arange(-20,20, 0.1)\n",
        "\n",
    
        "plt.subplot(1, 3, 1)\n",
    
        "# Sigmoid\n",
        "plt.plot(pts, 1/(1+np.exp(-pts)));\n",
        "\n",
        "plt.subplot(1, 3, 2)\n",
        "# tanh\n",
        "plt.plot(pts, np.tanh(pts));\n",
        "\n",
        "# Rectified linear unit (ReLU)\n",
        "plt.subplot(1, 3, 3)\n",
        "pts_relu = np.maximum(0, pts)\n",
        "plt.plot(pts, pts_relu);"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "Suggestion Uwe:\n",
        "\n",
        "1. more layers might improve power of single perctptron.\n",
        "\n",
        "2. regrettably math show that just \"stacking\" perceptrons only adds little improvements\n",
        "\n",
        "3. way around: look at nature how neuron works and introduce non linear activation functions.\n",
        "\n",
        "4. theoretical background: universal approximation theorem.\n",
        "\n",
        "\n",
        "\n",
    
    chadhat's avatar
    chadhat committed
        "### Multi-layer preceptron neural network\n",
        "Universal function theorem\n",
        "\n",
    
        "epochs\n"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "# Introduction to Keras"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "What is **Keras**?\n",
        "\n",
        "* It is a high level API to create and work with neural networks\n",
    
        "* Supports multiple backends such as TensorFlow from Google, Theano (Although Theano is dead now) and CNTK (Microsoft Cognitive Toolkit)\n",
        "* Very good for creating neural nets very quickly and hides away a lot of tedious work\n",
        "* Has been incorporated into official TensorFlow (which obviously only works with tensforflow) and as of TensorFlow 2.0 this will the main api to use TensorFlow (check reference)\n"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 35,
    
       "metadata": {},
       "outputs": [
        {
    
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "_________________________________________________________________\n",
          "Layer (type)                 Output Shape              Param #   \n",
          "=================================================================\n",
          "dense_9 (Dense)              (None, 4)                 36        \n",
          "_________________________________________________________________\n",
          "activation_7 (Activation)    (None, 4)                 0         \n",
          "_________________________________________________________________\n",
          "dense_10 (Dense)             (None, 4)                 20        \n",
          "_________________________________________________________________\n",
          "dense_11 (Dense)             (None, 1)                 5         \n",
          "_________________________________________________________________\n",
          "activation_8 (Activation)    (None, 1)                 0         \n",
          "=================================================================\n",
          "Total params: 61\n",
          "Trainable params: 61\n",
          "Non-trainable params: 0\n",
          "_________________________________________________________________\n"
    
         ]
        }
       ],
       "source": [
        "# Say hello to keras\n",
        "\n",
        "from keras.models import Sequential\n",
        "from keras.layers import Dense, Activation\n",
        "\n",
        "# Creating a model\n",
        "model = Sequential()\n",
        "\n",
        "# Adding layers to this model\n",
        "# 1st Hidden layer\n",
    
        "# A Dense/fully-connected layer which takes as input a \n",
        "# feature array of shape (samples, num_features)\n",
        "# Here input_shape = (8,) means that the layer expects an input with num_features = 8 \n",
        "# and the sample size could be anything\n",
        "# Then we specify an activation function\n",
        "model.add(Dense(units=4, input_shape=(8,)))\n",
    
        "model.add(Activation(\"relu\"))\n",
        "\n",
    
        "# 2nd Hidden layer\n",
        "# This is also a fully-connected layer and we do not need to specify the\n",
        "# shape of the input anymore (We need to do that only for the first layer)\n",
        "# NOTE: Now we didn't add the activation seperately. Instead we just added it\n",
        "# while calling Dense(). This and the way used for the first layer are Equivalent!\n",
        "model.add(Dense(units=4, activation=\"relu\"))\n",
        "\n",
        "          \n",
    
    chadhat's avatar
    chadhat committed
        "# The output layer\n",
        "model.add(Dense(units=1))\n",
        "model.add(Activation(\"sigmoid\"))\n",
        "\n",
        "model.summary()"
       ]
      },
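      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "The parameter counts in the summary above can be checked by hand. As a quick sketch, assuming each Dense layer holds one weight per pair of input and unit plus one bias per unit (the names $num\\_inputs$ and $units$ are used here only for illustration):\n",
        "\n",
        "\\begin{equation*}\n",
        "params = num\\_inputs \\times units + units\n",
        "\\end{equation*}\n",
        "\n",
        "* 1st hidden layer: $8 \\times 4 + 4 = 36$\n",
        "* 2nd hidden layer: $4 \\times 4 + 4 = 20$\n",
        "* Output layer: $4 \\times 1 + 1 = 5$\n",
        "\n",
        "Total: $36 + 20 + 5 = 61$. The Activation layers add no parameters."
       ]
      },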
      {
       "cell_type": "code",
    
       "execution_count": null,
    
       "metadata": {},
       "outputs": [],
       "source": [
    
        "# Fitting the model "
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "**TO DO: Move the MNIST example after the previous dataset examples**"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "### MNIST Dataset\n",
    
        "\n",
        "The MNIST dataset is a very common dataset in machine learning. It is widely used to train and validate models.\n",
        "\n",
        "\n",
        ">The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.\n",
        ">It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.\n",
        ">source: http://yann.lecun.com/exdb/mnist/\n",
        "\n",
        "The problem we want to solve using this dataset is multi-class classification.\n",
        "This dataset consists of images of handwritten digits between 0 and 9 and their corresponding labels. We want to train a neural network which is able to predict the correct digit on the image. "
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 184,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Loading the dataset in keras\n",
        "# Later you can explore and play with other datasets with come with Keras\n",
        "from keras.datasets import mnist\n",
        "\n",
        "# Loading the train and test data\n",
        "\n",
        "(X_train, y_train), (X_test, y_test) = mnist.load_data()"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 185,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "(60000, 28, 28)\n"
         ]
        }
       ],
       "source": [
        "# Looking at the dataset\n",
        "print(X_train.shape)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 186,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "This digit is:  8\n"
         ]
        },
        {
         "data": {
          "image/png": "\n",
          "text/plain": [
           "<matplotlib.figure.Figure at 0x7fe8e68579e8>"
          ]
         },
         "metadata": {
          "image/png": {
           "height": 250,
           "width": 253
          },
          "needs_background": "light"
         },
         "output_type": "display_data"
        }
       ],
       "source": [
        "# We can see that the training set consists of 60,000 images of size 28x28 pixels\n",
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "i=np.random.randint(0,X_train.shape[0])\n",
        "plt.imshow(X_train[i], cmap=\"gray_r\") ;\n",
        "print(\"This digit is: \" , y_train[i])"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 187,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0   3  18  18  18 126 136\n",
          "  175  26 166 255 247 127   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0  30  36  94 154 170 253 253 253 253 253\n",
          "  225 172 253 242 195  64   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0  49 238 253 253 253 253 253 253 253 253 251\n",
          "   93  82  82  56  39   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0  18 219 253 253 253 253 253 198 182 247 241\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0  80 156 107 253 253 205  11   0  43 154\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0  14   1 154 253  90   0   0   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0 139 253 190   2   0   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0  11 190 253  70   0   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0  35 241 225 160 108   1\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0   0  81 240 253 253 119\n",
          "   25   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0   0   0  45 186 253 253\n",
          "  150  27   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  16  93 252\n",
          "  253 187   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 249\n",
          "  253 249  64   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0   0   0  46 130 183 253\n",
          "  253 207   2   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0  39 148 229 253 253 253\n",
          "  250 182   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0  24 114 221 253 253 253 253 201\n",
          "   78   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0  23  66 213 253 253 253 253 198  81   2\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0  18 171 219 253 253 253 253 195  80   9   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0  55 172 226 253 253 253 253 244 133  11   0   0   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0 136 253 253 253 212 135 132  16   0   0   0   0   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]\n",
          " [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0\n",
          "    0   0   0   0   0   0   0   0   0   0]]\n"
         ]
        }
       ],
       "source": [
        "# Look at the data values for a couple of images\n",
        "print(X_train[0])"
    
    chadhat's avatar
    chadhat committed
       ]
      },
    
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "The data consists of values between 0-255 representing the **grayscale level**"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 188,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "(60000,)\n"
         ]
        }
       ],
       "source": [
        "# The labels are the digit on the image\n",
        "print(y_train.shape)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 190,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Scaling the data\n",
        "# It is important to normalize the input data to (0-1) before providing it to a neural net\n",
        "# We could use the previously introduced function from SciKit learn. However, here it is sufficient to\n",
        "# just divide the input data by 255\n",
        "X_train_norm = X_train/255.\n",
        "X_test_norm = X_test/255.\n",
    
        "# Also we need to reshape the input data such that each sample is a vector and not a 2D matrix\n",
        "X_train_prep = X_train_norm.reshape(X_train_norm.shape[0],28*28)\n",
        "X_test_prep = X_test_norm.reshape(X_test_norm.shape[0],28*28)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "**IMPORTANT: One-Hot encoding**\n",
        "\n",
        "**TODO: Better frame the explaination**\n",
        "In such problems the labels are provided as something called **One-hot encodings**. What this does is to convert a categorical label to a vector.\n",
        "\n",
        "For the MNIST problem where we have **10 categories** one-hot encoding will create a vector of length 10 for each of the labels. All the entries of this vector will be zero **except** for the index which is equal to the integer value of the label.\n",
        "\n",
        "For example:\n",
        "if label is 4. The one-hot vector will look like **[0 0 0 0 1 0 0 0 0 0]**\n",
        "\n",
        "Fortunately, we don't have to code this ourselves because Keras has a built-in function for this."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 191,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "(60000, 10)\n"
         ]
        }
       ],
       "source": [
        "from keras.utils.np_utils import to_categorical\n",
        "\n",
        "y_train_onehot = to_categorical(y_train, num_classes=10)\n",
        "y_test_onehot = to_categorical(y_test, num_classes=10)\n",
        "\n",
        "print(y_train_onehot.shape)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 194,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Epoch 1/20\n",
          "60000/60000 [==============================] - 2s 34us/step - loss: 0.5888 - acc: 0.8434\n",
          "Epoch 2/20\n",
          "60000/60000 [==============================] - 1s 20us/step - loss: 0.2569 - acc: 0.9267\n",
          "Epoch 3/20\n",
          "60000/60000 [==============================] - 1s 16us/step - loss: 0.2024 - acc: 0.9416\n",
          "Epoch 4/20\n",
          "60000/60000 [==============================] - 1s 17us/step - loss: 0.1706 - acc: 0.9497\n",
          "Epoch 5/20\n",
          "60000/60000 [==============================] - 1s 23us/step - loss: 0.1475 - acc: 0.9563\n",
          "Epoch 6/20\n",
          "60000/60000 [==============================] - 1s 20us/step - loss: 0.1290 - acc: 0.9627\n",
          "Epoch 7/20\n",
          "60000/60000 [==============================] - 1s 23us/step - loss: 0.1162 - acc: 0.9651\n",
          "Epoch 8/20\n",
          "60000/60000 [==============================] - 1s 19us/step - loss: 0.1035 - acc: 0.9691\n",
          "Epoch 9/20\n",
          "60000/60000 [==============================] - 2s 28us/step - loss: 0.0939 - acc: 0.9716\n",
          "Epoch 10/20\n",
          "60000/60000 [==============================] - 1s 22us/step - loss: 0.0848 - acc: 0.9743\n",
          "Epoch 11/20\n",
          "60000/60000 [==============================] - 1s 25us/step - loss: 0.0777 - acc: 0.9763\n",
          "Epoch 12/20\n",
          "60000/60000 [==============================] - 1s 20us/step - loss: 0.0720 - acc: 0.9780\n",
          "Epoch 13/20\n",
          "60000/60000 [==============================] - 1s 22us/step - loss: 0.0655 - acc: 0.9808\n",
          "Epoch 14/20\n",
          "60000/60000 [==============================] - 2s 30us/step - loss: 0.0610 - acc: 0.9817\n",
          "Epoch 15/20\n",
          "60000/60000 [==============================] - 1s 16us/step - loss: 0.0563 - acc: 0.9832\n",
          "Epoch 16/20\n",
          "60000/60000 [==============================] - 1s 20us/step - loss: 0.0527 - acc: 0.9842\n",
          "Epoch 17/20\n",
          "60000/60000 [==============================] - 1s 21us/step - loss: 0.0478 - acc: 0.9854\n",
          "Epoch 18/20\n",
          "60000/60000 [==============================] - 1s 15us/step - loss: 0.0453 - acc: 0.9864\n",
          "Epoch 19/20\n",
          "60000/60000 [==============================] - 1s 18us/step - loss: 0.0419 - acc: 0.9874\n",
          "Epoch 20/20\n",
          "60000/60000 [==============================] - 1s 20us/step - loss: 0.0387 - acc: 0.9885\n"
         ]
        },
        {
         "data": {
          "text/plain": [
           "<keras.callbacks.History at 0x7fe8e7465438>"
          ]
         },
         "execution_count": 194,
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "# Building the keras model\n",
        "from keras.models import Sequential\n",
        "from keras.layers import Dense\n",
        "\n",
        "model = Sequential()\n",
        "\n",
        "model.add(Dense(64,input_shape=(28*28,), activation=\"relu\"))\n",
        "\n",
        "model.add(Dense(64, activation = \"relu\"))\n",
        "\n",
        "model.add(Dense(10, activation = \"softmax\"))\n",
        "\n",
        "model.compile(loss=\"categorical_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
        "\n",
        "model_history = model.fit(X_train_prep, y_train_cat, epochs=20, batch_size=512);"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 196,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "10000/10000 [==============================] - 1s 85us/step\n",
          "The [loss, accuracy] are:  [0.08737125840586377, 0.974]\n"
         ]
        }
       ],
       "source": [
        "# Evaluating the model on test dataset\n",
        "print(\"The [loss, accuracy] on test dataset are: \" , model.evaluate(X_test_prep, y_test_onehot))"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "# Work in Progress\n",
        "\n",
        "## Network results on dataset used in previous notebooks"
    
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": 9,
       "metadata": {},
       "outputs": [],
       "source": [
        "import pandas as pd\n",
        "import matplotlib.pyplot as plt\n",
        "from sklearn.model_selection import train_test_split\n",
        "from keras.models import Sequential\n",
        "from keras.layers import Dense\n",
        "import numpy as np"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 10,
    
       "metadata": {},
       "outputs": [
        {
         "data": {
    
    chadhat's avatar
    chadhat committed
          "image/png": "\n",
    
          "text/plain": [
    
    chadhat's avatar
    chadhat committed
           "<Figure size 360x360 with 1 Axes>"
    
          ]
         },
         "metadata": {
          "needs_background": "light"
         },
         "output_type": "display_data"
        }
       ],
       "source": [
        "# Creating a network to solve the XOR problem\n",
        "# Loading and plotting the data\n",
        "xor = pd.read_csv(\"xor.csv\")\n",
        "xv = xor[\"x\"]\n",
        "yv = xor[\"y\"]\n",
        "\n",
        "colors = [[\"steelblue\", \"chocolate\"][i] for i in xor[\"label\"]]\n",
        "plt.figure(figsize=(5, 5))\n",
        "plt.xlim([-2, 2])\n",
        "plt.ylim([-2, 2])\n",
        "plt.title(\"Blue points are False\")\n",
        "\n",
        "\n",
        "plt.scatter(xv, yv, color=colors, marker=\"o\");"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": 16,
    
       "metadata": {},
       "outputs": [],
       "source": [
        "# Using x and y coordinates as featues\n",
        "features = xor.iloc[:, :-1]\n",
        "# Convert boolean to integer values (True->1 and False->0)\n",
        "labels = xor.iloc[:, -1].astype(int)\n",
        "\n",
        "# Building a Keras model\n",
        "\n",
    
    chadhat's avatar
    chadhat committed
        "def a_simple_NN():\n",
        "    \n",
        "    model = Sequential()\n",
    
    chadhat's avatar
    chadhat committed
        "    model.add(Dense(4, input_shape = (2,), activation = \"relu\"))\n",
    
    chadhat's avatar
    chadhat committed
        "    model.add(Dense(4, activation = \"relu\"))\n",
    
    chadhat's avatar
    chadhat committed
        "    model.add(Dense(1, activation = \"sigmoid\"))\n",