Skip to content
Snippets Groups Projects
08_a-neural_networks.ipynb 23.8 KiB
Newer Older
  • Learn to ignore specific revisions
  • chadhat's avatar
    chadhat committed
    {
     "cells": [
      {
    
    chadhat's avatar
    chadhat committed
       "cell_type": "code",
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
    chadhat's avatar
    chadhat committed
       "source": [
        "# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n",
        "from numpy.random import seed\n",
    
        "\n",
        "import os, sys\n",
        "\n",
        "if sys.platform == \"win32\":\n",
        "    os.add_dll_directory(os.path.dirname(sys.executable))\n",
        "\n",
    
    chadhat's avatar
    chadhat committed
        "seed(42)\n",
    
    chadhat's avatar
    chadhat committed
        "import tensorflow as tf\n",
    
    chadhat's avatar
    chadhat committed
        "tf.random.set_seed(42)\n",
    
    chadhat's avatar
    chadhat committed
        "import matplotlib as mpl\n",
    
        "import matplotlib.pyplot as plt\n",
    
    chadhat's avatar
    chadhat committed
        "import seaborn as sns\n",
    
    chadhat's avatar
    chadhat committed
        "sns.set(style=\"darkgrid\")\n",
    
        "mpl.rcParams[\"lines.linewidth\"] = 3\n",
    
    chadhat's avatar
    chadhat committed
        "%matplotlib inline\n",
        "%config InlineBackend.figure_format = 'retina'\n",
        "%config IPCompleter.greedy=True\n",
        "import warnings\n",
    
        "\n",
        "warnings.filterwarnings(\"ignore\", category=FutureWarning)\n",
        "from IPython.core.display import HTML\n",
        "\n",
        "HTML(open(\"custom.html\", \"r\").read())"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "# Chapter 8a: Introduction to Neural Networks\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
        "\n",
        "\n",
        "<img src=\"./images/3042en.jpg\" title=\"made at imgflip.com\" width=35%/>\n"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
    chadhat's avatar
    chadhat committed
        "## Brief history of neural networks\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
        "\n",
    
    chadhat's avatar
    chadhat committed
        "|  |  |\n",
        "| ----------- | ----------- |\n",
        "| 1943 | Threshold Logic |\n",
        "| 1940s | Hebbian Learning |\n",
        "| 1958 | Perceptron |\n",
        "| 1980s | Neocognitron |\n",
        "| 1982 | Hopfield Network |\n",
        "| 1989 | Convolutional neural network (CNN) kernels trained via backpropagation |\n",
        "| 1997 | Long-short term memory (LSTM) model |\n",
        "| 1998 | LeNet-5 |\n",
        "| 2014 | Gated Recurrent Units (GRU), Generative Adversarial Networks (GAN) |\n",
    
        "| 2015 | ResNet |\n",
        "| 2017 | Transformer model is proposed |"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Why the boom now?\n",
        "* Data\n",
        "* Data\n",
        "* Data\n",
    
        "* Availability of Graphics Processing Units (GPUs)\n",
    
    chadhat's avatar
    chadhat committed
        "* Algorithmic developments which allow for efficient training of networks networks and making them deeper\n",
    
    chadhat's avatar
    chadhat committed
        "* Development of high-level libraries/APIs have made the field much more accessible than it was a decade ago"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Feed-Forward neural network\n",
        "<center>\n",
        "<figure>\n",
        "<img src=\"./images/neuralnets/neural_net_ex.svg\" width=\"700\"/>\n",
    
        "<figcaption>A <b>three layer</b> densely connected Neural Network (By convention the input layer is not counted).</figcaption>\n",
    
    chadhat's avatar
    chadhat committed
        "</figure>\n",
        "</center>"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Building blocks\n",
        "### Perceptron\n",
        "\n",
        "The smallest unit of a neural network is a **perceptron** like node.\n",
        "\n",
        "**What is a Perceptron?**\n",
        "\n",
        "It is a simple function which can have multiple inputs and has a single output.\n",
        "\n",
        "<center>\n",
        "<figure>\n",
        "<img src=\"./images/neuralnets/perceptron_ex.svg\" width=\"400\"/>\n",
    
        "<figcaption>A simple perceptron with <b>three inputs</b> and <b>one output</b>.</figcaption>\n",
    
    chadhat's avatar
    chadhat committed
        "</figure>\n",
        "</center>\n",
        "\n",
        "\n",
        "It works as follows: \n",
        "\n",
        "Step 1: A **weighted sum** of the inputs is calculated\n",
        "\n",
        "\\begin{equation*}\n",
        "weighted\\_sum = w_{1} x_{1} + w_{2} x_{2} + w_{3} x_{3} + ...\n",
        "\\end{equation*}\n",
        "\n",
        "Step 2: A **step** activation function is applied\n",
        "\n",
        "$$\n",
        "f = \\left\\{\n",
        "        \\begin{array}{ll}\n",
        "            0 & \\quad weighted\\_sum < threshold \\\\\n",
        "            1 & \\quad weighted\\_sum \\geq threshold\n",
        "        \\end{array}\n",
        "    \\right.\n",
        "$$\n",
        "\n",
        "You can see that this is also a linear classifier as the ones we introduced in script 02."
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
       "outputs": [],
       "source": [
        "import matplotlib.pyplot as plt\n",
    
        "import numpy as np\n",
        "import seaborn as sns"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
    chadhat's avatar
    chadhat committed
       "source": [
        "# Plotting the step function\n",
    
        "x = np.arange(-2, 2.1, 0.01)\n",
    
    chadhat's avatar
    chadhat committed
        "y = np.zeros(len(x))\n",
    
        "threshold = 0.0\n",
        "y[x > threshold] = 1.0\n",
    
    chadhat's avatar
    chadhat committed
        "step_plot = sns.lineplot(x=x, y=y).set_title(\"Step function\")\n",
    
        "plt.xlabel(\"weighted_sum\")\n",
        "plt.ylabel(\"f(weighted_sum)\");"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
       "outputs": [],
       "source": [
        "def perceptron(X, w, threshold=1):\n",
        "    # This function computes sum(w_i*x_i) and\n",
        "    # applies a perceptron activation\n",
        "    linear_sum = np.dot(np.asarray(X).T, w)\n",
        "    output = np.zeros(len(linear_sum), dtype=np.int8)\n",
        "    output[linear_sum >= threshold] = 1\n",
        "    return output"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Boolean AND\n",
        "\n",
    
    chadhat's avatar
    chadhat committed
        "| $x_1$ | $x_2$ | output |\n",
    
    chadhat's avatar
    chadhat committed
        "| --- | --- | --- |\n",
        "| 0 | 0 | 0 |\n",
        "| 1 | 0 | 0 |\n",
        "| 0 | 1 | 0 |\n",
        "| 1 | 1 | 1 |"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
    chadhat's avatar
    chadhat committed
       "source": [
        "# Calculating Boolean AND using a perceptron\n",
        "threshold = 1.5\n",
        "# (w1, w2)\n",
        "w = [1, 1]\n",
        "# (x1, x2) pairs\n",
        "x1 = [0, 1, 0, 1]\n",
    
    chadhat's avatar
    chadhat committed
        "x2 = [0, 0, 1, 1]\n",
        "# Calling the perceptron function\n",
        "output = perceptron([x1, x2], w, threshold)\n",
        "for i in range(len(output)):\n",
    
        "    print(\"Perceptron output for x1, x2 = \", x1[i], \",\", x2[i], \" is \", output[i])"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "In this simple case we can rewrite our equation to $x_2 = ...... $ which describes a line in 2D:"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
       "outputs": [],
       "source": [
        "def perceptron_DB(x1, x2, w, threshold):\n",
        "    # Plotting the decision boundary of the perceptron\n",
        "    plt.scatter(x1, x2, color=\"black\")\n",
    
        "    plt.xlim(-1, 2)\n",
        "    plt.ylim(-1, 2)\n",
    
    chadhat's avatar
    chadhat committed
        "    # The decision boundary is a line given by\n",
        "    # w_1*x_1+w_2*x_2-threshold=0\n",
        "    x1 = np.arange(-3, 4)\n",
    
        "    x2 = (threshold - x1 * w[0]) / w[1]\n",
    
    chadhat's avatar
    chadhat committed
        "    sns.lineplot(x=x1, y=x2, **{\"color\": \"black\"})\n",
    
    chadhat's avatar
    chadhat committed
        "    plt.xlabel(\"x$_1$\", fontsize=16)\n",
        "    plt.ylabel(\"x$_2$\", fontsize=16)\n",
        "    # Coloring the regions\n",
        "    pts_tmp = np.arange(-2, 2.1, 0.02)\n",
        "    points = np.array(np.meshgrid(pts_tmp, pts_tmp)).T.reshape(-1, 2)\n",
        "    outputs = perceptron(points.T, w, threshold)\n",
    
        "    plt.plot(\n",
        "        points[:, 0][outputs == 0],\n",
        "        points[:, 1][outputs == 0],\n",
        "        \"o\",\n",
        "        color=\"steelblue\",\n",
        "        markersize=1,\n",
        "        alpha=0.04,\n",
        "    )\n",
        "    plt.plot(\n",
        "        points[:, 0][outputs == 1],\n",
        "        points[:, 1][outputs == 1],\n",
        "        \"o\",\n",
        "        color=\"chocolate\",\n",
        "        markersize=1,\n",
        "        alpha=0.04,\n",
        "    )\n",
    
    chadhat's avatar
    chadhat committed
        "    plt.title(\"Blue color = 0 and Chocolate = 1\")"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
    chadhat's avatar
    chadhat committed
       "source": [
        "# Plotting the perceptron decision boundary\n",
        "perceptron_DB(x1, x2, w, threshold)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "## Exercise section\n",
        "1. Compute a Boolean \"OR\" using a perceptron\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
        "Hint: copy the code from the \"AND\" example and edit the weights and/or threshold"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Boolean OR\n",
        "\n",
    
    chadhat's avatar
    chadhat committed
        "| $x_1$ | $x_2$ | output |\n",
    
    chadhat's avatar
    chadhat committed
        "| --- | --- | --- |\n",
        "| 0 | 0 | 0 |\n",
        "| 1 | 0 | 1 |\n",
        "| 0 | 1 | 1 |\n",
        "| 1 | 1 | 1 |"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
       "outputs": [],
       "source": [
        "# Calculating Boolean OR using a perceptron\n",
        "# Enter code here"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {
        "tags": [
         "solution"
        ]
       },
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
    chadhat's avatar
    chadhat committed
       "source": [
        "# Solution\n",
        "# Calculating Boolean OR using a perceptron\n",
    
        "threshold = 0.6\n",
    
    chadhat's avatar
    chadhat committed
        "# (w1, w2)\n",
    
        "w = [1, 1]\n",
    
    chadhat's avatar
    chadhat committed
        "# (x1, x2) pairs\n",
        "x1 = [0, 1, 0, 1]\n",
        "x2 = [0, 0, 1, 1]\n",
        "output = perceptron([x1, x2], w, threshold)\n",
        "for i in range(len(output)):\n",
    
        "    print(\"Perceptron output for x1, x2 = \", x1[i], \",\", x2[i], \" is \", output[i])\n",
    
    chadhat's avatar
    chadhat committed
        "perceptron_DB(x1, x2, w, threshold)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "2. Create a NAND gate using a perceptron\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
        "Boolean NAND\n",
        "\n",
    
    chadhat's avatar
    chadhat committed
        "| $x_1$ | $x_2$ | output |\n",
    
    chadhat's avatar
    chadhat committed
        "| --- | --- | --- |\n",
        "| 0 | 0 | 1 |\n",
        "| 1 | 0 | 1 |\n",
        "| 0 | 1 | 1 |\n",
        "| 1 | 1 | 0 |"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
       "outputs": [],
       "source": [
        "# Calculating Boolean NAND using a perceptron\n",
        "# Enter code here"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {
        "tags": [
         "solution"
        ]
       },
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
    chadhat's avatar
    chadhat committed
       "source": [
        "# Solution\n",
        "# Calculating Boolean NAND using a perceptron\n",
        "import matplotlib.pyplot as plt\n",
    
        "\n",
        "threshold = -1.5\n",
    
    chadhat's avatar
    chadhat committed
        "# (w1, w2)\n",
    
        "w = [-1, -1]\n",
    
    chadhat's avatar
    chadhat committed
        "# (x1, x2) pairs\n",
        "x1 = [0, 1, 0, 1]\n",
        "x2 = [0, 0, 1, 1]\n",
        "output = perceptron([x1, x2], w, threshold)\n",
        "for i in range(len(output)):\n",
    
        "    print(\"Perceptron output for x1, x2 = \", x1[i], \",\", x2[i], \" is \", output[i])\n",
    
    chadhat's avatar
    chadhat committed
        "perceptron_DB(x1, x2, w, threshold)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "In fact, a single perceptron can compute \"AND\", \"OR\" and \"NOT\" boolean functions.\n",
        "\n",
        "However, it cannot compute some other boolean functions such as \"XOR\".\n",
        "\n",
        "**WHAT CAN WE DO?**\n",
        "\n",
        "\n",
    
        "**Hint:** Think about what is the significance of the NAND gate we have created above?"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "## Multi-layer perceptrons\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
        "\n",
    
        "**Answer:** We said a single perceptron can't compute a \"XOR\" function. We didn't say that about **multiple Perceptrons** put together.\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
        "The normal densely connected neural network is sometimes also called \"Multi-layer\" perceptron.\n",
        "\n",
        "**XOR function using multiple perceptrons**\n",
        "\n",
        "<center>\n",
        "<figure>\n",
        "<img src=\"./images/neuralnets/perceptron_XOR.svg\" width=\"400\"/>\n",
        "<figcaption>Multiple perceptrons connected together to output a XOR function.</figcaption>\n",
        "</figure>\n",
        "</center>"
       ]
      },
      {
       "cell_type": "markdown",
    
       "metadata": {
        "tags": []
       },
    
    chadhat's avatar
    chadhat committed
       "source": [
        "## Learning\n",
        "\n",
    
        "We know that we can compute complicated functions by combining a number of perceptrons."
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "<div class=\"alert alert-block alert-info\">\n",
        "<i class=\"fa fa-info-circle\"></i>\n",
        "<a href=https://en.wikipedia.org/wiki/Universal_approximation_theorem>Universal Approximation Theorem:</a>\n",
        "    Universal approximation theorems imply that neural networks can represent a wide variety of interesting functions when given appropriate weights.\n",
        "    </div>"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {
        "tags": []
       },
       "source": [
    
    chadhat's avatar
    chadhat committed
        "In the perceptron examples we had set the model parameters (weights and threshold) by hand.\n",
        "\n",
        "This is something we definitely **DO NOT** want to do or even can do for big networks.\n",
        "\n",
        "We want some algorithm to set/learn the model parameters for us!\n",
        "\n",
        "<div class=\"alert alert-block alert-warning\">\n",
        "    <i class=\"fa fa-info-circle\"></i>&nbsp; <strong>Threshold -> bias</strong>  \n",
        "    \n",
    
        "Before we go further we need to introduce one change. In the neural networks literature, the threshold which we saw in the step activation function above is moved to the left side of the equation and is called **bias**.\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
        "$$\n",
        "f = \\left\\{\n",
        "        \\begin{array}{ll}\n",
        "            0 & \\quad weighted\\_sum + bias < 0 \\\\\n",
        "            1 & \\quad weighted\\_sum + bias \\geq 0\n",
        "        \\end{array}\n",
        "       \\quad \\quad  \\mathrm{where}, bias = -threshold\n",
        "    \\right.\n",
        "$$\n",
        "\n",
        "</div>"
       ]
      },
      {
       "cell_type": "markdown",
    
       "metadata": {
        "tags": []
       },
    
    chadhat's avatar
    chadhat committed
       "source": [
    
        "In order to algorithmically set/learn the weights and biases we need to choose an appropriate loss function for the problem at hand and solve an optimization problem.\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
        "\n",
        "### Loss function\n",
        "\n",
        "To learn using an algorithm we need to define a quantity/function which allows us to measure how close or far are the predictions of our network/setup from reality or the supplied labels. This is done by choosing a so-called \"Loss function\" (as in the case for other machine learning algorithms).\n",
        "\n",
        "Once we have this function, we need an algorithm to update the weights of the network such that this loss function decreases. \n",
        "As one can already imagine the choice of an appropriate loss function is critical to the success of the model. \n",
        "\n",
        "Fortunately, for classification and regression (which cover a large variety of problems) these loss functions are well known. \n",
        "\n",
        "**Crossentropy** and **mean squared error** loss functions are often used for standard classification and regression problems, respectively.\n",
        "\n",
        "<div class=\"alert alert-block alert-warning\">\n",
        "    <i class=\"fa fa-info-circle\"></i>&nbsp; As we have seen before, <strong>mean squared error</strong> is defined as \n",
        "\n",
        "\n",
        "$$\n",
        "\\frac{1}{n} \\left((y_1 - \\hat{y}_1)^2 + (y_2 - \\hat{y}_2)^2 + ... + (y_n - \\hat{y}_n)^2 \\right)\n",
        "$$\n",
        "\n",
        "\n",
        "</div>\n",
        "\n",
        "### Gradient based learning\n",
        "\n",
        "As mentioned above, once we have chosen a loss function, we want to solve an **optimization problem** which minimizes this loss by updating the parameters (weights and biases) of the network. This is how the learning takes in a NN, and the \"knowledge\" is stored as the weights and biases.\n",
        "\n",
        "The most popular optimization methods used in Neural Network training are **Gradient-descent (GD)** type methods, such as gradient-descent itself, RMSprop and Adam. \n",
        "\n",
        "**Gradient-descent** uses partial derivatives of the loss function with respect to the network weights and a learning rate to updates the weights such that the loss function decreases and after some iterations reaches its (Global) minimum value.\n",
        "\n",
        "First, the loss function and its derivative are computed at the output node, and this signal is propagated backwards, using the chain rule, in the network to compute the partial derivatives. Hence, this method is called **Backpropagation**.\n",
        "\n",
        "One way to perform a single GD pass is to compute the partial derivatives using **all the samples** in our data, computing average derivatives and using them to update the weights. This is called **Batch gradient descent**. However, in deep learning we mostly work with massive datasets and using batch gradient descent can make the training very slow!\n",
        "\n",
        "The other extreme is to randomly shuffle the dataset and advance a pass of GD with the gradients computed using only **one sample** at a time. This is called **Stochastic gradient descent**.\n",
        "\n",
        "<center>\n",
        "<figure>\n",
        "<img src=\"./images/stochastic-vs-batch-gradient-descent.png\" width=\"600\"/>\n",
        "<figcaption>Source: <a href=\"https://wikidocs.net/3413\">https://wikidocs.net/3413</a></figcaption>\n",
        "</figure>\n",
        "</center>\n",
        "\n",
        "\n",
        "In practice, an approach in-between these two is used. The entire dataset is divided into **m batches** and these are used one by one to compute the derivatives and apply GD. This technique is called **Mini-batch gradient descent**. \n",
        "\n",
        "<div class=\"alert alert-block alert-warning\">\n",
        "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
        "One pass through the entire training dataset is called 1 epoch of training.\n",
        "</p>\n",
        "</div>"
       ]
      },
    
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "In practice modified versions of gradient descent such as <a href=https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf>RMSprop</a> and <a href=https://arxiv.org/abs/1412.6980> Adam optimizer</a> are used."
       ]
      },
    
    chadhat's avatar
    chadhat committed
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
    chadhat's avatar
    chadhat committed
       "source": [
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
    
        "import seaborn as sns\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
    
        "plt.figure(figsize=(10, 4))\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
    
        "pts = np.arange(-20, 20, 0.1);"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Activation Functions\n",
        "\n",
        "In order to train the network we need to move away from Perceptron's **step** activation function because it can not be used for training using the gradient-descent and back-propagation algorithms among other drawbacks.\n",
        "\n",
        "Non-Linear functions such as:\n",
        "\n",
        "* Sigmoid\n",
        "\n",
        "\\begin{equation*}\n",
        "f(z) = \\frac{1}{1+e^{-z}} \\quad \\quad \\mathrm{where}, z = weighted\\_sum + bias\n",
        "\\end{equation*}"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
    chadhat's avatar
    chadhat committed
       "source": [
    
    chadhat's avatar
    chadhat committed
        "sns.lineplot(x=pts, y=1 / (1 + np.exp(-pts)));"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "* tanh\n",
        "\n",
        "\\begin{equation*}\n",
        "f(z) = \\frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}\\quad \\quad \\mathrm{where}, z = weighted\\_sum + bias\n",
        "\\end{equation*}\n"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
    chadhat's avatar
    chadhat committed
       "source": [
    
    chadhat's avatar
    chadhat committed
        "sns.lineplot(x=pts, y=np.tanh(pts * np.pi));"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "* **ReLU (Rectified linear unit)**\n",
        "\n",
        "\\begin{equation*}\n",
        "f(z) = \\mathrm{max}(0,z)   \\quad \\quad \\mathrm{where}, z = weighted\\_sum + bias\n",
        "\\end{equation*}"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
    chadhat's avatar
    chadhat committed
       "source": [
    
        "pts_relu = [max(0, i) for i in pts]\n",
        "plt.plot(pts, pts_relu);"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
    chadhat's avatar
    chadhat committed
        "<div class=\"alert alert-block alert-warning\">\n",
        "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
        "ReLU is very popular and is widely used nowadays. There exist several other variations of ReLU, e.g. \"leaky ReLU\", \"ELU\".\n",
        "</p>\n",
        "</div>"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
    chadhat's avatar
    chadhat committed
        "* **Leaky ReLU**\n",
        "\n",
        "\\begin{equation*}\n",
        "f(z) = \\mathrm{max}(\\alpha z,z)   \\quad \\quad \\mathrm{where}, z = weighted\\_sum + bias \\text{ and } \\alpha \\text{ (generally) } = 0.01\n",
        "\\end{equation*}"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
    chadhat's avatar
    chadhat committed
       "source": [
    
        "alpha = 0.1  # Large alpha chosen for plotting purposes\n",
        "pts_leakyrelu = [max(alpha * i, i) for i in pts]\n",
        "plt.plot(pts, pts_leakyrelu)\n",
        "plt.xlim(-5, 5)\n",
        "plt.ylim(-1, 5);"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "* **ELU (Exponential linear unit)**\n",
        "\n",
        "\\begin{equation*}\n",
        "  f_{z} =\n",
        "    \\begin{cases}\n",
        "      \\alpha(\\exp(z)-1) & z<0\\\\\n",
        "      z & z \\geq 0\n",
        "    \\end{cases},\n",
        "       \\quad \\quad \\mathrm{where}, z = weighted\\_sum + bias \\text{ and } \\alpha \\text{ (generally) } = 1\n",
        "\\end{equation*}"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "metadata": {},
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
    chadhat's avatar
    chadhat committed
       "source": [
        "import math\n",
    
        "\n",
        "alpha = 1\n",
        "pts_elu = [alpha * (math.exp(i) - 1) if i < 0 else i for i in pts]\n",
        "plt.plot(pts, pts_elu)\n",
        "plt.xlim(-5, 5)\n",
        "plt.ylim(-2, 5);"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "are some of the commonly used as activation functions. Such non-linear activation functions allow the network to learn complex representations of data."
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "## Exercise section - Google Playground\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
        "A great tool from Google to develop a feeling for the workings of neural networks.\n",
        "\n",
        "https://playground.tensorflow.org/\n",
        "\n",
        "<img src=\"./images/neuralnets/google_playground.png\"/>\n",
        "\n",
    
        "**Walkthrough of the interface by the instructor**\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
    
        "Try to set up a simple neural network to solve the circle example:\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
    
        "1. Start with a single hidden layer. What are the minimum number of neurons needed to get a reasonable decision boundary?\n",
        "2. Add more neurons and hidden layers while choosing a linear activation function. What do you observe? Why?\n"
    
    chadhat's avatar
    chadhat committed
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "<div class=\"alert alert-block alert-info\">\n",
        "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
        "Why don't we just use a simple linear activation function?\n",
    
    chadhat's avatar
    chadhat committed
        "    \n",
    
        "Linear activations are **NOT** used because it can be mathematically shown that if they are used then the output is just a linear function of the input. So we cannot learn interesting and complex functions by adding any number of hidden layers.\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
    
        "The only exception when we do want to use a linear activation is for the output layer of a network when solving a regression problem.\n",
    
    chadhat's avatar
    chadhat committed
        "\n",
        "</p>\n",
        "</div>"
       ]
      }
     ],
     "metadata": {
      "kernelspec": {
    
    chadhat's avatar
    chadhat committed
       "display_name": "machine-learning-ws-2023",
    
    chadhat's avatar
    chadhat committed
       "language": "python",
    
    chadhat's avatar
    chadhat committed
       "name": "machine-learning-ws-2023"
    
    chadhat's avatar
    chadhat committed
      },
      "language_info": {
       "codemirror_mode": {
        "name": "ipython",
        "version": 3
       },
       "file_extension": ".py",
       "mimetype": "text/x-python",
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython3",
    
    chadhat's avatar
    chadhat committed
       "version": "3.10.4"
    
    chadhat's avatar
    chadhat committed
      },
      "latex_envs": {
       "LaTeX_envs_menu_present": true,
       "autoclose": false,
       "autocomplete": true,
       "bibliofile": "biblio.bib",
       "cite_by": "apalike",
       "current_citInitial": 1,
       "eqLabelWithNumbers": true,
       "eqNumInitial": 1,
       "hotkeys": {
        "equation": "Ctrl-E",
        "itemize": "Ctrl-I"
       },
       "labels_anchors": false,
       "latex_user_defs": false,
       "report_style_numbering": false,
       "user_envs_cfg": false
      },
      "toc": {
       "base_numbering": 1,
       "nav_menu": {},
       "number_sections": true,
       "sideBar": true,
       "skip_h1_title": true,
       "title_cell": "Table of Contents",
       "title_sidebar": "Contents",
       "toc_cell": false,
       "toc_position": {},
       "toc_section_display": true,
       "toc_window_display": true
      }
     },
     "nbformat": 4,
    
    chadhat's avatar
    chadhat committed
     "nbformat_minor": 4
    
    chadhat's avatar
    chadhat committed
    }