{ "cells": [ { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<style>\n", " \n", " @import url('http://fonts.googleapis.com/css?family=Source+Code+Pro');\n", " \n", " @import url('http://fonts.googleapis.com/css?family=Kameron');\n", " @import url('http://fonts.googleapis.com/css?family=Crimson+Text');\n", " \n", " @import url('http://fonts.googleapis.com/css?family=Lato');\n", " @import url('http://fonts.googleapis.com/css?family=Source+Sans+Pro');\n", " \n", " @import url('http://fonts.googleapis.com/css?family=Lora'); \n", "\n", " \n", " body {\n", " font-family: 'Lora', Consolas, sans-serif;\n", " \n", " -webkit-print-color-adjust: exact important !;\n", " \n", " \n", " \n", " }\n", " \n", " .alert-block {\n", " width: 95%;\n", " margin: auto;\n", " }\n", " \n", " .rendered_html code\n", " {\n", " color: black;\n", " background: #eaf0ff;\n", " background: #f5f5f5; \n", " padding: 1pt;\n", " font-family: 'Source Code Pro', Consolas, monocco, monospace;\n", " }\n", " \n", " p {\n", " line-height: 140%;\n", " }\n", " \n", " strong code {\n", " background: red;\n", " }\n", " \n", " .rendered_html strong code\n", " {\n", " background: #f5f5f5;\n", " }\n", " \n", " .CodeMirror pre {\n", " font-family: 'Source Code Pro', monocco, Consolas, monocco, monospace;\n", " }\n", " \n", " .cm-s-ipython span.cm-keyword {\n", " font-weight: normal;\n", " }\n", " \n", " strong {\n", " background: #f5f5f5;\n", " margin-top: 4pt;\n", " margin-bottom: 4pt;\n", " padding: 2pt;\n", " border: 0.5px solid #a0a0a0;\n", " font-weight: bold;\n", " color: darkred;\n", " }\n", " \n", " \n", " div #notebook {\n", " # font-size: 10pt; \n", " line-height: 145%;\n", " }\n", " \n", " li {\n", " line-height: 145%;\n", " }\n", "\n", " div.output_area pre {\n", " background: #fff9d8 !important;\n", " padding: 5pt;\n", " \n", " -webkit-print-color-adjust: exact; \n", " \n", " }\n", " \n", " \n", " \n", " h1, h2, h3, h4 {\n", " font-family: Kameron, 
arial;\n", "\n", "\n", " }\n", " \n", " div#maintoolbar {display: none !important;}\n", "</style>\n" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n", "import matplotlib.pyplot as plt\n", "import matplotlib as mpl\n", "mpl.rcParams['lines.linewidth'] = 3\n", "%matplotlib inline\n", "%config InlineBackend.figure_format = 'retina'\n", "%config IPCompleter.greedy=True\n", "import warnings\n", "warnings.filterwarnings('ignore', category=FutureWarning)\n", "from IPython.display import HTML; HTML(open(\"custom.html\", \"r\").read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Neural Networks\n", "\n", "\n", "## History of Neural Networks\n", "\n", "<div class=\"alert alert-block alert-danger\"><p>\n", " <strong>TODO</strong>: Make it more complete and format properly\n", "</p></div>\n", "\n", "1943 - Threshold Logic\n", "\n", "1940s - Hebbian Learning\n", "\n", "1958 - Perceptron\n", "\n", "1975 - Backpropagation\n", "\n", "1980s - Neocognitron\n", "\n", "1982 - Hopfield Network\n", "\n", "1986 - Convolutional Neural Networks\n", "\n", "1997 - Long short-term memory (LSTM) model\n", "\n", "2014 - Gated Recurrent Units, Generative Adversarial Networks (Check)?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feed-Forward neural network\n", "<center>\n", "<figure>\n", "<img src=\"./images/neuralnets/neural_net_ex.svg\" width=\"700\"/>\n", "<figcaption>A 3-layer densely connected neural network (by convention the input layer is not counted).</figcaption>\n", "</figure>\n", "</center>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Why the boom now?\n", "* Data\n", "* Data\n", "* Data\n", "* Availability of GPUs\n", "* Algorithmic developments which allow for efficient training of networks\n", "* Development of high-level libraries/APIs has made the field much more accessible than it was a decade ago" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Building blocks\n", "### Perceptron\n", "\n", "The smallest unit of a neural network is a **perceptron**-like node.\n", "\n", "**What is a Perceptron?**\n", "\n", "It is a simple function which can have multiple inputs and has a single output.\n", "\n", "<center>\n", "<figure>\n", "<img src=\"./images/neuralnets/perceptron_ex.svg\" width=\"400\"/>\n", "<figcaption>A simple perceptron with 3 inputs and 1 output.</figcaption>\n", "</figure>\n", "</center>\n", "\n", "\n", "It works as follows: \n", "\n", "Step 1: A **weighted sum** of the inputs is calculated\n", "\n", "\\begin{equation*}\n", "weighted\\_sum = w_{1} x_{1} + w_{2} x_{2} + w_{3} x_{3} + ...\n", "\\end{equation*}\n", "\n", "Step 2: A **step** activation function is applied\n", "\n", "$$\n", "f(weighted\\_sum) = \\left\\{\n", " \\begin{array}{ll}\n", " 0 & \\quad weighted\\_sum < threshold \\\\\n", " 1 & \\quad weighted\\_sum \\geq threshold\n", " \\end{array}\n", " \\right.\n", "$$\n", "\n", "You can see that this is also a linear classifier, like the ones we introduced in script 02." 
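, "\n", "\n", "As a small worked example of the two steps (the weights, threshold and inputs are values picked here purely for illustration): take $w_1 = w_2 = 1$ and $threshold = 1.5$.\n", "\n", "$$\n", "\\begin{array}{ll}\n", "(x_1, x_2) = (1, 0): & \\quad weighted\\_sum = 1 \\cdot 1 + 1 \\cdot 0 = 1 < 1.5 \\;\\Rightarrow\\; f = 0 \\\\\n", "(x_1, x_2) = (1, 1): & \\quad weighted\\_sum = 1 \\cdot 1 + 1 \\cdot 1 = 2 \\geq 1.5 \\;\\Rightarrow\\; f = 1\n", "\\end{array}\n", "$$"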
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hidecode" ] }, "outputs": [], "source": [ "# Plotting the step function\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import numpy as np\n", "x = np.arange(-2, 2.1, 0.01)\n", "y = np.zeros(len(x))\n", "threshold = 0.\n", "# f = 1 where weighted_sum >= threshold, as in the definition of the step function\n", "y[x >= threshold] = 1.\n", "step_plot = sns.lineplot(x=x, y=y).set_title('Step function') ;\n", "plt.xlabel('weighted_sum') ;\n", "plt.ylabel('f(weighted_sum)') ;" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "def perceptron(X, w, threshold=1):\n", "    # Computes the weighted sum sum(w_i*x_i) for every sample and\n", "    # applies the step (perceptron) activation.\n", "    # X has shape (num_features, num_samples), w has length num_features.\n", "    linear_sum = np.dot(np.asarray(X).T, w)\n", "    output = np.zeros(len(linear_sum), dtype=np.int8)\n", "    output[linear_sum >= threshold] = 1\n", "    return output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Boolean AND\n", "\n", "| x$_1$ | x$_2$ | output |\n", "| --- | --- | --- |\n", "| 0 | 0 | 0 |\n", "| 1 | 0 | 0 |\n", "| 0 | 1 | 0 |\n", "| 1 | 1 | 1 |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calculating Boolean AND using a perceptron\n", "threshold = 1.5\n", "# (w1, w2)\n", "w = [1, 1]\n", "# (x1, x2) pairs\n", "x1 = [0, 1, 0, 1]\n", "x2 = [0, 0, 1, 1]\n", "# Calling the perceptron function\n", "output = perceptron([x1, x2], w, threshold)\n", "for i in range(len(output)):\n", "    print(\"Perceptron output for x1, x2 = \", x1[i], \",\", x2[i],\n", "          \" is \", output[i])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this simple case we can rewrite our equation to $x_2 = ...... 
$ which describes a line in 2D:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def perceptron_DB(x1, x2, w, threshold):\n", "    # Plotting the decision boundary of the perceptron\n", "    sns.scatterplot(x=x1, y=x2)\n", "    plt.xlim(-1, 2)\n", "    plt.ylim(-1, 2)\n", "    # The decision boundary is a line given by\n", "    # w_1*x_1+w_2*x_2-threshold=0\n", "    x1 = np.arange(-3, 4)\n", "    x2 = (threshold - x1*w[0])/w[1]\n", "    sns.lineplot(x=x1, y=x2, color=\"black\")\n", "    plt.xlabel(\"x$_1$\", fontsize=16)\n", "    plt.ylabel(\"x$_2$\", fontsize=16)\n", "    # Coloring the regions\n", "    pts_tmp = np.arange(-2, 2.1, 0.02)\n", "    points = np.array(np.meshgrid(pts_tmp, pts_tmp)).T.reshape(-1, 2)\n", "    outputs = perceptron(points.T, w, threshold)\n", "    plt.plot(points[:, 0][outputs == 0], points[:, 1][outputs == 0],\n", "             \"o\",\n", "             color=\"steelblue\",\n", "             markersize=1,\n", "             alpha=0.04,\n", "             )\n", "    plt.plot(points[:, 0][outputs == 1], points[:, 1][outputs == 1],\n", "             \"o\",\n", "             color=\"chocolate\",\n", "             markersize=1,\n", "             alpha=0.04,\n", "             )\n", "    plt.title(\"Blue color = 0 and Chocolate = 1\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting the perceptron decision boundary\n", "perceptron_DB(x1, x2, w, threshold)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 1: Compute a Boolean \"OR\" using a perceptron**\n", "\n", "Hint: copy the code from the \"AND\" example and edit the weights and/or threshold" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Boolean OR\n", "\n", "| x$_1$ | x$_2$ | output |\n", "| --- | --- | --- |\n", "| 0 | 0 | 0 |\n", "| 1 | 0 | 1 |\n", "| 0 | 1 | 1 |\n", "| 1 | 1 | 1 |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calculating Boolean OR using a perceptron\n", "# Edit the code below" ] }, { "cell_type": "code", "execution_count": null, "metadata": 
{}, "outputs": [], "source": [ "# Solution\n", "# Calculating Boolean OR using a perceptron\n", "threshold=0.6\n", "# (w1, w2)\n", "w=[1,1]\n", "# (x1, x2) pairs\n", "x1 = [0, 1, 0, 1]\n", "x2 = [0, 0, 1, 1]\n", "output = perceptron([x1, x2], w, threshold)\n", "for i in range(len(output)):\n", " print(\"Perceptron output for x1, x2 = \", x1[i], \",\", x2[i],\n", " \" is \", output[i])\n", "perceptron_DB(x1, x2, w, threshold)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 2 : Create a NAND gate using a perceptron**\n", "\n", "#### Boolean NAND\n", "\n", "| x$_1$ | x$_2$ | output |\n", "| --- | --- | --- |\n", "| 0 | 0 | 1 |\n", "| 1 | 0 | 1 |\n", "| 0 | 1 | 1 |\n", "| 1 | 1 | 0 |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calculating Boolean NAND using a perceptron\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution\n", "# Calculating Boolean NAND using a perceptron\n", "import matplotlib.pyplot as plt\n", "threshold=-1.5\n", "# (w1, w2)\n", "w=[-1,-1]\n", "# (x1, x2) pairs\n", "x1 = [0, 1, 0, 1]\n", "x2 = [0, 0, 1, 1]\n", "output = perceptron([x1, x2], w, threshold)\n", "for i in range(len(output)):\n", " print(\"Perceptron output for x1, x2 = \", x1[i], \",\", x2[i],\n", " \" is \", output[i])\n", "perceptron_DB(x1, x2, w, threshold)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In fact, a single perceptron can compute \"AND\", \"OR\" and \"NOT\" boolean functions.\n", "\n", "However, it cannot compute some other boolean functions such as \"XOR\".\n", "\n", "**WHAT CAN WE DO?**\n", "\n", "\n", "Hint: Think about what is the significance of the NAND gate we have created above?\n", "\n", "Answer: We said a single perceptron can't compute a \"XOR\" function. We didn't say that about **multiple Perceptrons** put together." 
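, "\n", "\n", "One possible construction (a sketch that reuses the gates built above; it is not necessarily the arrangement drawn in the figure) is:\n", "\n", "$$\n", "\\mathrm{XOR}(x_1, x_2) = \\mathrm{AND}\\big(\\mathrm{NAND}(x_1, x_2),\\; \\mathrm{OR}(x_1, x_2)\\big)\n", "$$\n", "\n", "| x$_1$ | x$_2$ | NAND | OR | AND(NAND, OR) |\n", "| --- | --- | --- | --- | --- |\n", "| 0 | 0 | 1 | 0 | 0 |\n", "| 1 | 0 | 1 | 1 | 1 |\n", "| 0 | 1 | 1 | 1 | 1 |\n", "| 1 | 1 | 0 | 1 | 0 |\n", "\n", "The last column is exactly the XOR truth table, so three perceptrons arranged in two layers are enough."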
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**XOR function using multiple perceptrons**\n", "\n", "<center>\n", "<figure>\n", "<img src=\"./images/neuralnets/perceptron_XOR.svg\" width=\"400\"/>\n", "<figcaption>Multiple perceptrons connected together to output an XOR function.</figcaption>\n", "</figure>\n", "</center>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multi-layer perceptrons\n", "\n", "A standard densely connected neural network is sometimes also called a \"multi-layer perceptron\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Google Playground\n", "\n", "A great tool from Google to develop a feeling for neural networks.\n", "\n", "https://playground.tensorflow.org/\n", "\n", "<img src=\"./images/neuralnets/google_playground.png\"/>\n", "\n", "Some concepts to look at:\n", "\n", "* Effect of activation functions\n", "* Effect of network size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning\n", "\n", "Now we know that we can compute complex functions by combining a number of perceptrons.\n", "\n", "In the perceptron examples we set the model parameters (weights and thresholds) by hand.\n", "\n", "This is something we definitely **DO NOT** want to do, and for big networks cannot do.\n", "\n", "We want some algorithm to set the weights for us!\n", "\n", "This is achieved by choosing an appropriate loss function for the problem at hand and solving an optimization problem.\n", "We will explain below what this means.\n", "\n", "\n", "### Loss function\n", "\n", "To learn using an algorithm we need to define a quantity/function which allows us to measure how close the predictions of our network/setup are to reality, i.e. to the supplied labels. 
This is done by choosing a so-called \"loss function\" (as is the case for other machine learning algorithms).\n", "\n", "Once we have this function, we need an algorithm to update the weights of the network such that this loss function decreases. \n", "As one can already imagine, the choice of an appropriate loss function is critical to the success of the model. \n", "\n", "Fortunately, for classification and regression (which cover a large variety of problems) these loss functions are well known. \n", "\n", "Generally **cross-entropy** and **mean squared error** loss functions are used for classification and regression problems, respectively.\n", "\n", "<div class=\"alert alert-block alert-warning\">\n", " <i class=\"fa fa-info-circle\"></i> <strong>mean squared error</strong> is defined as \n", "\n", "\n", "$$\n", "\\frac{1}{n} \\left((y_1 - \\hat{y}_1)^2 + (y_2 - \\hat{y}_2)^2 + ... + (y_n - \\hat{y}_n)^2 \\right)\n", "$$\n", "\n", "where $y_i$ are the supplied labels, $\\hat{y}_i$ the predictions and $n$ the number of samples.\n", "\n", "</div>\n", "\n", "### Gradient based learning\n", "\n", "As mentioned above, once we have chosen a loss function, we want to solve an **optimization problem** which minimizes this loss by updating the weights of the network. This is how the learning takes place in a NN, and the \"knowledge\" is stored in the weights.\n", "\n", "The most popular optimization methods used in Neural Network training are **Gradient-descent (GD)** type methods, such as gradient-descent itself, RMSprop and Adam. \n", "\n", "**Gradient-descent** uses partial derivatives of the loss function with respect to the network weights and a learning rate to update the weights such that the loss function decreases and, after some iterations, approaches a minimum (ideally the global one, although in practice it is often only a local minimum).\n", "\n", "First, the loss function and its derivative are computed at the output node, and this signal is propagated backwards, using the chain rule, in the network to compute the partial derivatives. 
Hence, this method is called **Backpropagation**.\n", "\n", "One way to perform a single GD pass is to compute the partial derivatives using **all the samples** in our data, computing average derivatives and using them to update the weights. This is called **Batch gradient descent**. However, in deep learning we mostly work with massive datasets and using batch gradient descent can make the training very slow!\n", "\n", "The other extreme is to randomly shuffle the dataset and advance a pass of GD with the gradients computed using only **one sample** at a time. This is called **Stochastic gradient descent**.\n", "\n", "<center>\n", "<figure>\n", "<img src=\"stochastic-vs-batch-gradient-descent.png\" width=\"600\"/>\n", "<figcaption>Source: <a href=\"https://wikidocs.net/3413\">https://wikidocs.net/3413</a></figcaption>\n", "</figure>\n", "</center>\n", "\n", "\n", "In practice, an approach in-between these two is used. The entire dataset is divided into **m batches** and these are used one by one to compute the derivatives and apply GD. This technique is called **Mini-batch gradient descent**. 
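\n", "\n", "As a sketch of the update rule shared by all three variants (the symbols $\\eta$ for the learning rate, $B$ for the set of samples used in one update and $L_i$ for the loss on sample $i$ are notation introduced here for illustration):\n", "\n", "$$\n", "w \\leftarrow w - \\eta \\, \\frac{1}{|B|} \\sum_{i \\in B} \\frac{\\partial L_i}{\\partial w}\n", "$$\n", "\n", "With $B$ equal to the whole dataset this is batch gradient descent, with $|B| = 1$ it is stochastic gradient descent, and anything in between is mini-batch gradient descent.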
\n", "\n", "<div class=\"alert alert-block alert-warning\">\n", "<p><i class=\"fa fa-warning\"></i> \n", "One pass through the entire training dataset is called 1 epoch of training.\n", "</p>\n", "</div>" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import numpy as np\n", "\n", "plt.figure(figsize=(10, 4)) ;\n", "\n", "pts=np.arange(-20,20, 0.1) ;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Activation Functions\n", "\n", "In order to train the network we need to move away from the perceptron's **step** activation function. Among other drawbacks, its derivative is zero everywhere except at the threshold (where it is not defined), so gradient-descent with back-propagation gets no signal for updating the weights.\n", "\n", "Non-Linear functions such as:\n", "\n", "* Sigmoid\n", "\n", "\\begin{equation*}\n", "f(z) = \\frac{1}{1+e^{-z}}\n", "\\end{equation*}" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": { "image/png": { "height": 250, "width": 373 }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sns.lineplot(x=pts, y=1/(1+np.exp(-pts))) ;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* tanh\n", "\n", "\\begin{equation*}\n", "f(z) = \\frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}\n", "\\end{equation*}" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": { "image/png": { "height": 250, "width": 388 }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot tanh(z) itself so that the curve matches the formula above\n", "sns.lineplot(x=pts, y=np.tanh(pts)) ;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **ReLU (Rectified linear unit)**\n", "\n", "\\begin{equation*}\n", "f(z) = \\mathrm{max}(0,z)\n", "\\end{equation*}" ] }, { "cell_type": "code", 
"execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": { "image/png": { "height": 250, "width": 380 }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "pts_relu=[max(0,i) for i in pts];\n", "plt.plot(pts, pts_relu) ;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "are some of the commonly used as activation functions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<div class=\"alert alert-block alert-warning\">\n", "<p><i class=\"fa fa-warning\"></i> \n", "ReLU is very popular and is widely used nowadays. There also exist other variations of ReLU, e.g. \"leaky ReLU\".\n", "</p>\n", "</div>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<div class=\"alert alert-block alert-info\">\n", "<p><i class=\"fa fa-warning\"></i> \n", "Why don't we just use a simple linear activation function?\n", " \n", "Linear activations are **NOT** used because it can be mathematically shown that if they are used then the output is just a linear function of the input. So we cannot learn interesting and complex functions by adding any number of hidden layers.\n", "\n", "The only exception when we do want to use a linear activation is for the output layer of a network when solving a regression problem.\n", "\n", "</p>\n", "</div>\n", "\n", "\n", "\n", "Non-linear activation functions allow the network to learn complex representations." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Keras" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What is Keras?\n", "\n", "* It is a high-level API to create and work with neural networks\n", "* Supports multiple backends such as TensorFlow from Google, Theano (although Theano is no longer developed) and CNTK (Microsoft Cognitive Toolkit)\n", "* Very good for creating neural nets quickly; it hides away a lot of tedious work\n", "* Has been incorporated into official TensorFlow (this version obviously only works with TensorFlow) and as of TensorFlow 2.0 this will be the main API to use TensorFlow (check reference)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Say hello to keras\n", "\n", "from keras.models import Sequential\n", "from keras.layers import Dense, Activation\n", "\n", "# Creating a model\n", "model = Sequential()\n", "\n", "# Adding layers to this model\n", "# 1st Hidden layer\n", "# A Dense/fully-connected layer which takes as input a \n", "# feature array of shape (samples, num_features)\n", "# Here input_shape = (8,) means that the layer expects an input with num_features = 8 \n", "# and the sample size could be anything\n", "# Then we specify an activation function\n", "model.add(Dense(units=4, input_shape=(8,)))\n", "model.add(Activation(\"relu\"))\n", "\n", "# 2nd Hidden layer\n", "# This is also a fully-connected layer and we do not need to specify the\n", "# shape of the input anymore (We need to do that only for the first layer)\n", "# NOTE: Now we didn't add the activation separately. Instead we just added it\n", "# while calling Dense(). 
This and the way used for the first layer are Equivalent!\n", "model.add(Dense(units=4, activation=\"relu\"))\n", "\n", " \n", "# The output layer\n", "model.add(Dense(units=1))\n", "model.add(Activation(\"sigmoid\"))\n", "\n", "model.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### XOR using neural networks" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from sklearn.model_selection import train_test_split\n", "from keras.models import Sequential\n", "from keras.layers import Dense\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Creating a network to solve the XOR problem\n", "\n", "# Loading and plotting the data\n", "xor = pd.read_csv(\"xor.csv\")\n", "\n", "# Using x and y coordinates as featues\n", "features = xor.iloc[:, :-1]\n", "# Convert boolean to integer values (True->1 and False->0)\n", "labels = xor.iloc[:, -1].astype(int)\n", "\n", "colors = [[\"steelblue\", \"chocolate\"][i] for i in xor[\"label\"]]\n", "plt.figure(figsize=(5, 5))\n", "plt.xlim([-2, 2])\n", "plt.ylim([-2, 2])\n", "plt.title(\"Blue points are False\")\n", "\n", "\n", "plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\") ;" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Building a Keras model\n", "\n", "def a_simple_NN():\n", " \n", " model = Sequential()\n", "\n", " model.add(Dense(4, input_shape = (2,), activation = \"relu\"))\n", "\n", " model.add(Dense(4, activation = \"relu\"))\n", "\n", " model.add(Dense(1, activation = \"sigmoid\"))\n", "\n", " model.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n", " \n", " return model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Instantiating the model\n", "model = 
a_simple_NN()\n", "\n", "# Splitting the dataset into training (70%) and validation sets (30%)\n", "X_train, X_test, y_train, y_test = train_test_split(\n", "    features, labels, test_size=0.3)\n", "\n", "# Setting the number of passes through the entire training set\n", "num_epochs = 300\n", "\n", "# We can pass validation data while training\n", "model_run = model.fit(X_train, y_train, epochs=num_epochs,\n", "                      validation_data=(X_test, y_test))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Looking at the loss and accuracy on the training and validation sets during the training\n", "# This can be done by using the Keras \"History\" callback which is applied by default\n", "history_model = model_run.history\n", "\n", "print(\"The history has the following data: \", history_model.keys())\n", "\n", "# Older Keras versions store the accuracy under \"acc\", newer ones under \"accuracy\"\n", "acc_key = \"acc\" if \"acc\" in history_model else \"accuracy\"\n", "\n", "# Plotting the training and validation accuracy during the training\n", "plt.plot(np.arange(1, num_epochs+1), history_model[acc_key], \"blue\") ;\n", "\n", "plt.plot(np.arange(1, num_epochs+1), history_model[\"val_\" + acc_key], \"red\") ;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Here we don't really see a big difference between the training and validation data because the function we are trying to fit is quite simple and there is not too much noise. We will come back to these curves in a later example**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the example above we split our dataset into a 70-30 train-validation set. We know from previous chapters that to calculate accuracy more robustly we can use **K-fold cross-validation**.\n", "This is even more important when we have small datasets and cannot afford to reserve a validation set!\n", "\n", "One way to do the cross-validation here would be to write our own function to do this. However, we also know that **SciKit learn** provides several handy functions to evaluate and tune the models. 
So the question is:\n", "\n", "Can we somehow use these **Scikit learn** functions or ones we wrote ourselves for **Scikit learn** models to evaluate and tune our Keras models?\n", "\n", "The answer is **YES!**\n", "\n", "We show how to do this in the following section." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using SciKit learn functions on Keras models\n", "\n", "Keras offers 2 wrappers which allow its Sequential models to be used with SciKit learn. \n", "\n", "These are: **KerasClassifier** and **KerasRegressor**.\n", "\n", "For more information:\n", "https://keras.io/scikit-learn-api/\n", "\n", "**Now let's see how this works!**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We wrap the Keras model we created above with KerasClassifier\n", "from keras.wrappers.scikit_learn import KerasClassifier\n", "from sklearn.model_selection import cross_val_score\n", "# Wrapping Keras model\n", "# NOTE: We pass verbose=0 to suppress the model output\n", "num_epochs = 400\n", "model_scikit = KerasClassifier(\n", "    build_fn=a_simple_NN, epochs=num_epochs, verbose=0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's reuse the function to visualize the decision boundary which we saw in chapter 2 with minimal change\n", "\n", "def list_flatten(list_of_list):\n", "    flattened_list = [i for j in list_of_list for i in j]\n", "    return flattened_list\n", "\n", "def plot_points(plt=plt, marker='o'):\n", "    colors = [[\"steelblue\", \"chocolate\"][i] for i in labels]\n", "    plt.scatter(features.iloc[:, 0], features.iloc[:, 1], color=colors, marker=marker);\n", "\n", "def train_and_plot_decision_surface(\n", "    name, classifier, features_2d, labels, preproc=None, plt=plt, marker='o', N=400\n", "):\n", "\n", "    # Keras predictions come back as a list of lists and must be flattened\n", "    is_neural_net = name == \"Neural Net\"\n", "\n", "    features_2d = np.array(features_2d)\n", "    xmin, ymin = features_2d.min(axis=0)\n", "    xmax, ymax = features_2d.max(axis=0)\n", "\n", "    x = np.linspace(xmin, xmax, N)\n", "    y = np.linspace(ymin, ymax, N)\n", "    points = np.array(np.meshgrid(x, y)).T.reshape(-1, 2)\n", "\n", "    if preproc is not None:\n", "        # Fit the preprocessing on the data only, then apply it to the grid points\n", "        features_2d = preproc.fit_transform(features_2d)\n", "        points_for_classifier = preproc.transform(points)\n", "    else:\n", "        points_for_classifier = points\n", "\n", "    classifier.fit(features_2d, labels, verbose=0)\n", "    predicted = classifier.predict(features_2d)\n", "\n", "    if is_neural_net:\n", "        predicted = list_flatten(predicted)\n", "\n", "    if preproc is not None:\n", "        name += \" (w/ preprocessing)\"\n", "    print(name + \":\\t\", sum(predicted == labels), \"/\", len(labels), \"correct\")\n", "\n", "    if is_neural_net:\n", "        classes = np.array(list_flatten(classifier.predict(points_for_classifier)), dtype=bool)\n", "    else:\n", "        classes = np.array(classifier.predict(points_for_classifier), dtype=bool)\n", "    plt.plot(\n", "        points[~classes][:, 0],\n", "        points[~classes][:, 1],\n", "        \"o\",\n", "        color=\"steelblue\",\n", "        markersize=1,\n", "        alpha=0.01,\n", "    )\n", "    plt.plot(\n", "        points[classes][:, 0],\n", "        points[classes][:, 1],\n", "        \"o\",\n", "        color=\"chocolate\",\n", "        markersize=1,\n", "        alpha=0.04,\n", "    )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_, ax = plt.subplots(figsize=(6, 6))\n", "\n", "train_and_plot_decision_surface(\"Neural Net\", model_scikit, features, labels, plt=ax)\n", "plot_points(plt=ax)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Applying K-fold cross-validation\n", "# Here we pass the whole dataset, i.e. 
features and labels, instead of splitting it.\n", "num_folds = 5\n", "cross_validation = cross_val_score(\n", "    model_scikit, features, labels, cv=num_folds, verbose=0)\n", "\n", "print(\"The accuracy on the \", num_folds, \" validation folds:\", cross_validation)\n", "print(\"The average accuracy on the \", num_folds, \" validation folds:\", np.mean(cross_validation))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NOTE: The above code took quite long even though we used only 5 CV folds and the neural network and data size are very small!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hyperparameter optimization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We know from chapter 6 that there are 2 types of parameters which need to be tuned for a machine learning model.\n", "* Internal model parameters (weights) which can be learned, e.g. by gradient-descent\n", "* Hyperparameters\n", "\n", "In the model which we created above we made some arbitrary choices, such as which optimizer we use, what its learning rate is, the number of hidden units and so on ...\n", "\n", "Now that we have the Keras model wrapped as a scikit-learn model we can use the grid search functions we have seen in chapter 6." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import GridSearchCV\n", "# Just to remember\n", "model_scikit = KerasClassifier(\n", " build_fn=a_simple_NN, **{\"epochs\": num_epochs, \"verbose\": 0})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "HP_grid = {'epochs' : [300, 500, 1000]}\n", "search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid)\n", "search.fit(features, labels)\n", "print(search.best_score_, search.best_params_)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "HP_grid = {'epochs' : [10, 15, 30], \n", " 'batch_size' : [10, 20, 30] }\n", "search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid)\n", "search.fit(features, labels)\n", "print(search.best_score_, search.best_params_)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A more general model for further Hyperparameter optimization\n", "from keras import optimizers\n", "\n", "def a_simple_NN(activation='relu', num_hidden_neurons=[4, 4], learning_rate=0.01):\n", "\n", " model = Sequential()\n", "\n", " model.add(Dense(num_hidden_neurons[0],\n", " input_shape=(2,), activation=activation))\n", "\n", " model.add(Dense(num_hidden_neurons[1], activation=activation))\n", "\n", " model.add(Dense(1, activation=\"sigmoid\"))\n", "\n", " model.compile(loss=\"binary_crossentropy\", optimizer=optimizers.rmsprop(\n", " lr=learning_rate), metrics=[\"accuracy\"])\n", "\n", " return model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise: \n", "* Look at the model above and choose a couple of hyperparameters to optimize. \n", "* **(OPTIONAL:)** What function from SciKit learn other than GridSearchCV can we use for hyperparameter optimization? Use it." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise: Create a neural network to classify the 2d points example from chapter 2 \n", "(Optional: As you create the model read a bit on the different keras commands we have used)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "circle = pd.read_csv(\"2d_points.csv\")\n", "# Using x and y coordinates as features\n", "features = circle.iloc[:, :-1]\n", "# Convert boolean to integer values (True->1 and False->0)\n", "labels = circle.iloc[:, -1].astype(int)\n", "\n", "# Index the color list with the integer labels\n", "colors = [[\"steelblue\", \"chocolate\"][i] for i in labels]\n", "plt.figure(figsize=(5, 5))\n", "plt.xlim([-2, 2])\n", "plt.ylim([-2, 2])\n", "\n", "plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\");\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Insert Code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The examples above are not ideal use cases for neural networks. They are too simple and can be easily solved by classical machine learning algorithms. Below we show examples which are the more common applications of Neural Networks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Handwritten Digits Classification\n", "### MNIST Dataset\n", "\n", "The MNIST dataset is a very common dataset used in machine learning. It is widely used to train and validate models.\n", "\n", "\n", ">The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. 
The digits have been size-normalized and centered in a fixed-size image.\n", ">It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.\n", ">source: http://yann.lecun.com/exdb/mnist/\n", "\n", "The problem we want to solve using this dataset is multi-class classification, which we encounter here for the first time.\n", "This dataset consists of images of handwritten digits from 0 to 9 and their corresponding labels. We want to train a neural network which is able to predict the correct digit on the image. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Loading the dataset in keras\n", "# Later you can explore and play with other datasets which come with Keras\n", "from keras.datasets import mnist\n", "\n", "# Loading the train and test data\n", "\n", "(X_train, y_train), (X_test, y_test) = mnist.load_data()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Looking at the dataset\n", "print(X_train.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can see that the training set consists of 60,000 images of size 28x28 pixels\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "i=np.random.randint(0,X_train.shape[0])\n", "plt.imshow(X_train[i], cmap=\"gray_r\") ;\n", "print(\"This digit is: \" , y_train[i])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Look at the data values for a couple of images\n", "print(X_train[0].min(), X_train[1].max())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data consists of values between 0 and 255 representing the **grayscale level**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The labels are the digits shown in the images\n", "print(y_train.shape)" ] 
}, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Scaling the data\n", "# It is important to normalize the input data to (0-1) before providing it to a neural net\n", "# We could use the previously introduced function from scikit-learn. However, here it is sufficient to\n", "# just divide the input data by 255\n", "X_train_norm = X_train/255.\n", "X_test_norm = X_test/255.\n", "\n", "# Also we need to reshape the input data such that each sample is a vector and not a 2D matrix\n", "X_train_prep = X_train_norm.reshape(X_train_norm.shape[0],28*28)\n", "X_test_prep = X_test_norm.reshape(X_test_norm.shape[0],28*28)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**IMPORTANT: One-Hot encoding**\n", "\n", "In multi-class classification problems the labels are usually provided as **one-hot encodings**: each categorical label is converted to a vector.\n", "\n", "For the MNIST problem, where we have **10 categories**, one-hot encoding creates a vector of length 10 for each label. All entries of this vector are zero **except** the one at the index equal to the integer value of the label, which is one.\n", "\n", "For example:\n", "if the label is 4, the one-hot vector will look like **[0 0 0 0 1 0 0 0 0 0]**\n", "\n", "Fortunately, we don't have to code this ourselves because Keras has a built-in function for this." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from keras.utils.np_utils import to_categorical\n", "\n", "y_train_onehot = to_categorical(y_train, num_classes=10)\n", "y_test_onehot = to_categorical(y_test, num_classes=10)\n", "\n", "print(y_train_onehot.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Building the keras model\n", "from keras.models import Sequential\n", "from keras.layers import Dense\n", "\n", "def mnist_model():\n", " model = Sequential()\n", "\n", " model.add(Dense(64, input_shape=(28*28,), activation=\"relu\"))\n", "\n", " model.add(Dense(64, activation=\"relu\"))\n", "\n", " model.add(Dense(10, activation=\"softmax\"))\n", "\n", " model.compile(loss=\"categorical_crossentropy\",\n", " optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n", " return model\n", "\n", "model = mnist_model()\n", "\n", "model_run = model.fit(X_train_prep, y_train_onehot, epochs=20,\n", " batch_size=512)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"The [loss, accuracy] on test dataset are: \" , model.evaluate(X_test_prep, y_test_onehot))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Optional exercise: Run the model again with validation dataset, plot the accuracy as a function of epochs, play with number of epochs and observe what is happening." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Code here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution:\n", "num_epochs = 20\n", "model_run = model.fit(X_train_prep, y_train_onehot, epochs=num_epochs,\n", " batch_size=512, validation_data=(X_test_prep, y_test_onehot))\n", "# Evaluating the model on test dataset\n", "#print(\"The [loss, accuracy] on test dataset are: \" , model.evaluate(X_test_prep, y_test_onehot))\n", "history_model = model_run.history\n", "print(\"The history has the following data: \", history_model.keys())\n", "\n", "# Plotting the training and validation accuracy during the training\n", "plt.plot(np.arange(1, num_epochs+1), history_model[\"acc\"], \"blue\")\n", "\n", "plt.plot(np.arange(1, num_epochs+1), history_model[\"val_acc\"], \"red\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding regularization" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Adding l2 regularization\n", "# Building the keras model\n", "from keras.models import Sequential\n", "from keras.layers import Dense\n", "from keras.regularizers import l2\n", "\n", "def mnist_model():\n", " \n", " model = Sequential()\n", "\n", " model.add(Dense(64, input_shape=(28*28,), activation=\"relu\", \n", " kernel_regularizer=l2(0.01)))\n", "\n", " model.add(Dense(64, activation=\"relu\", \n", " kernel_regularizer=l2(0.01)))\n", "\n", " model.add(Dense(10, activation=\"softmax\"))\n", "\n", " model.compile(loss=\"categorical_crossentropy\",\n", " optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n", " return model\n", "\n", "model = mnist_model()\n", "\n", "num_epochs = 50\n", "model_run = model.fit(X_train_prep, y_train_onehot, epochs=num_epochs,\n", " batch_size=512)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"The [loss, accuracy] on test 
dataset are: \" , model.evaluate(X_test_prep, y_test_onehot))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Another way to regularize the network and make it more robust is \"Dropout\". When we add dropout to a layer, a specified fraction of the units in that layer is randomly switched off during training. \n", "Dropping units effectively makes the model simpler.\n", "\n", "### Exercise: Add dropout instead of l2 regularization in the network above" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Adding dropout is easy in keras\n", "# We import a layer called Dropout and add it as follows\n", "# model.add(Dropout(0.5)) to randomly drop 50% of the hidden units\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution\n", "# Adding Dropout\n", "# Building the keras model\n", "from keras.models import Sequential\n", "from keras.layers import Dense, Dropout\n", "\n", "def mnist_model():\n", " \n", " model = Sequential()\n", "\n", " model.add(Dense(64, input_shape=(28*28,), activation=\"relu\"))\n", " \n", " model.add(Dropout(0.4))\n", "\n", " model.add(Dense(64, activation=\"relu\"))\n", "\n", " model.add(Dense(10, activation=\"softmax\"))\n", "\n", " model.compile(loss=\"categorical_crossentropy\",\n", " optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n", " \n", " return model\n", "\n", "model = mnist_model()\n", "\n", "num_epochs = 50\n", "model_run = model.fit(X_train_prep, y_train_onehot, epochs=num_epochs,\n", " batch_size=512)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"The [loss, accuracy] on test dataset are: \" , model.evaluate(X_test_prep, y_test_onehot))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Network Architecture\n", "\n", "The neural networks which we have seen so far are the simplest kind of neural networks.\n", "There exist more sophisticated network 
architectures especially designed for specific applications.\n", "Some of them are as follows:\n", "\n", "### Convolutional Neural Networks (CNNs)\n", "\n", "These networks are used mostly for computer vision tasks such as image classification and object detection. \n", "One of the early CNNs is shown below.\n", "\n", "<center>\n", "<figure>\n", "<img src=\"./images/neuralnets/CNN_lecun.png\" width=\"800\"/>\n", "<figcaption>source: LeCun et al., Gradient-based learning applied to document recognition (1998).</figcaption>\n", "</figure>\n", "</center>\n", "\n", "CNNs introduce new types of layers, such as convolution layers and pooling layers.\n", "\n", "### Recurrent Neural Networks (RNNs)\n", "\n", "These are used for time-series data, speech recognition, translation etc.\n", "\n", "IMAGE HERE\n", "\n", "### Generative adversarial networks (GANs)\n", "\n", "GANs consist of 2 parts, a generative network and a discriminative network. The generative network produces data which is then fed to the discriminative network, which judges whether the new data belongs to a specified dataset. Then, via feedback loops, the generative network becomes better and better at creating images similar to the dataset the discriminative network is judging against. At the same time the discriminative network gets better and better at identifying **fake** instances which are not from the reference dataset. \n", "\n", "IMAGE HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## CNN example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this example we will work with a dataset called fashion-MNIST which is quite similar to the MNIST data above.\n", "> Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. 
We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.\n", ">source: https://github.com/zalandoresearch/fashion-mnist\n", "\n", "The 10 classes of this dataset are:\n", "\n", "| Label | Item |\n", "| --- | --- |\n", "| 0 | T-shirt/top |\n", "| 1 | Trouser |\n", "| 2 | Pullover |\n", "| 3 | Dress |\n", "| 4 | Coat |\n", "| 5 | Sandal |\n", "| 6 | Shirt |\n", "| 7 | Sneaker |\n", "| 8 | Bag |\n", "| 9 | Ankle boot |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Loading the dataset in keras\n", "# Later you can explore and play with other datasets which come with Keras\n", "from keras.datasets import fashion_mnist\n", "\n", "# Loading the train and test data\n", "\n", "(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()\n", "\n", "items =['T-shirt/top', 'Trouser', \n", " 'Pullover', 'Dress', \n", " 'Coat', 'Sandal', \n", " 'Shirt', 'Sneaker',\n", " 'Bag', 'Ankle boot']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can see that the training set consists of 60,000 images of size 28x28 pixels\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "i=np.random.randint(0,X_train.shape[0])\n", "plt.imshow(X_train[i], cmap=\"gray_r\") ; \n", "print(\"This item is a: \" , items[y_train[i]])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We need to reshape the input data such that the whole dataset is a 4D array of dimension\n", "# (num_samples, height, width, channels). 
Even though these images are grayscale we need to add\n", "# a channel dimension as this is expected by the Conv2D layer\n", "X_train_prep = X_train.reshape(X_train.shape[0],28,28,1)/255.\n", "X_test_prep = X_test.reshape(X_test.shape[0],28,28,1)/255.\n", "\n", "from keras.utils.np_utils import to_categorical\n", "\n", "y_train_onehot = to_categorical(y_train, num_classes=10)\n", "y_test_onehot = to_categorical(y_test, num_classes=10)\n", "\n", "print(y_train_onehot.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Creating a CNN similar to the one shown in the figure from the LeCun paper\n", "# In the original implementation average pooling was used. However, we will use max pooling as this \n", "# is what is used in more recent architectures and is found to be a better choice\n", "# Convolution -> Pooling -> Convolution -> Pooling -> Flatten -> Dense -> Dense -> Output layer\n", "from keras.models import Sequential\n", "from keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout, BatchNormalization\n", "\n", "def simple_CNN():\n", " \n", " model = Sequential()\n", " \n", " model.add(Conv2D(6, (3,3), input_shape=(28,28,1), activation='relu'))\n", " \n", " model.add(MaxPool2D((2,2)))\n", " \n", " model.add(Conv2D(16, (3,3), activation='relu'))\n", " \n", " model.add(MaxPool2D((2,2)))\n", " \n", " model.add(Flatten())\n", " \n", " model.add(Dense(120, activation='relu'))\n", " \n", " model.add(Dense(84, activation='relu'))\n", " \n", " model.add(Dense(10, activation='softmax'))\n", " \n", " model.compile(loss=\"categorical_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n", " \n", " return model\n", "\n", "model = simple_CNN()\n", "model.summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "num_epochs = 10\n", "model_run = model.fit(X_train_prep, y_train_onehot, epochs=num_epochs, \n", " batch_size=64, validation_data=(X_test_prep, 
y_test_onehot))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise: Use the above model or improve it (change the number of filters, add more layers, etc.) on the MNIST example and see if you can get a better accuracy than what we achieved with a vanilla neural network" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise: Load and explore the CIFAR10 dataset, which is also included with Keras, then build and train a simple CNN on it" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false } }, "nbformat": 4, "nbformat_minor": 2 }