{
     "cells": [
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n",
        "from numpy.random import seed\n",
        "\n",
        "seed(42)\n",
        "import tensorflow as tf\n",
        "\n",
        "tf.random.set_seed(42)\n",
        "import matplotlib as mpl\n",
        "import matplotlib.pyplot as plt\n",
        "import seaborn as sns\n",
        "\n",
        "sns.set(style=\"darkgrid\")\n",
        "mpl.rcParams[\"lines.linewidth\"] = 3\n",
        "%matplotlib inline\n",
        "%config InlineBackend.figure_format = 'retina'\n",
        "%config IPCompleter.greedy=True\n",
        "import warnings\n",
        "\n",
        "warnings.filterwarnings(\"ignore\", category=FutureWarning)\n",
        "from IPython.core.display import HTML\n",
        "\n",
        "HTML(open(\"custom.html\", \"r\").read())"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
    chadhat's avatar
    chadhat committed
        "# Chapter 8c: Convolution Neural Networks"
    
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {
        "jp-MarkdownHeadingCollapsed": true,
        "tags": []
       },
       "source": [
        "## Network Architectures\n",
        "\n",
        "The neural networks which we have seen till now are the simplest kind of neural networks.\n",
        "There exist more sophisticated network architectures especially designed for specific applications.\n",
        "Some of them are as follows:\n",
        "\n",
        "###  Convolution Neural Networks (CNNs)\n",
        "\n",
        "These networks are used mostly for computer vision like tasks such as image classification and object detection. \n",
        "One of the old CNN networks is shown below.\n",
        "\n",
        "<center>\n",
        "<figure>\n",
        "<img src=\"./images/neuralnets/CNN_lecun.png\" width=\"800\"/>\n",
        "<figcaption>source: LeCun et al., Gradient-based learning applied to document recognition (1998).</figcaption>\n",
        "</figure>\n",
        "</center>\n",
        "\n",
        "CNNs consist of new type of layers such as convolution and pooling layers.\n",
        "\n",
        "###  Recurrent Neural Networks (RNNs)\n",
        "\n",
        "RNNs are used for problems such as time-series data, speech recognition and translation.\n",
        "\n",
        "### Generative adversarial networks (GANs)\n",
        "\n",
        "GANs consist of 2 parts, a generative network and a discriminative network. The generative network produces data which is then fed to the discriminative network which judges if the new data belongs to a specified dataset. Then via feedback loops the generative network becomes better and better at creating images similar to the dataset the discriminative network is judging against. At the same time the discriminative network get better and better at identifyig **fake** instances which are not from the reference dataset. "
       ]
      },
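  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is a minimal, illustrative sketch of the GAN setup just described (an addition for illustration, not one of the original examples): a fully-connected generator and discriminator working on flattened 28x28 images, wired together with the Sequential API. The layer sizes and the `latent_dim` value are arbitrary choices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A minimal GAN sketch (illustrative only; layer sizes are arbitrary choices)\n",
    "from tensorflow.keras.layers import Dense, Input\n",
    "from tensorflow.keras.models import Sequential\n",
    "\n",
    "latent_dim = 100  # size of the random noise vector fed to the generator\n",
    "\n",
    "# The generator maps random noise to a flattened 28x28 \"image\"\n",
    "generator = Sequential(\n",
    "    [Input((latent_dim,)), Dense(128, activation=\"relu\"), Dense(784, activation=\"sigmoid\")]\n",
    ")\n",
    "\n",
    "# The discriminator judges whether a flattened image is real or generated\n",
    "discriminator = Sequential(\n",
    "    [Input((784,)), Dense(128, activation=\"relu\"), Dense(1, activation=\"sigmoid\")]\n",
    ")\n",
    "discriminator.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")\n",
    "\n",
    "# Inside the combined model the discriminator is frozen, so that training the\n",
    "# combined model only updates the generator -- this is the feedback loop\n",
    "discriminator.trainable = False\n",
    "gan = Sequential([generator, discriminator])\n",
    "gan.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")\n",
    "gan.summary()"
   ]
  },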
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## CNN in a bit more detail\n",
        "\n",
        "The standard CNN architecture can be seen as 2 parts:\n",
        "\n",
        "* Feature extraction\n",
        "* Classification\n",
        "\n",
        "For the **classification** part we use the densly connected network as shown in the TensorFlow (keras) examples above.\n",
        "\n",
        "However, for the **feature extraction** part we use new types of layers called **convolution** layers\n",
        "\n",
        "### What is a Convolution?\n"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "import seaborn as sns\n",
        "\n",
        "sns.set_style(\"white\")\n",
        "# Loading the train and test data\n",
        "digit = np.genfromtxt(\"data/digit_4_14x14.csv\", delimiter=\",\").astype(np.int16)\n",
        "plt.imshow(digit, \"gray_r\")"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "This image in matrix form"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "def plot_astable(matrix, hw=0.15):\n",
        "    matrix = plt.table(cellText=matrix, loc=(0, 0), cellLoc=\"center\")\n",
        "    matrix.set_fontsize(14)\n",
        "    cells = matrix.get_celld()\n",
        "    for i in cells:\n",
        "        cells[i].set_height(hw)\n",
        "        cells[i].set_width(hw)\n",
        "    plt.axis(\"off\")"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "plot_astable(digit)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Vertical edge detection\n",
        "vertical_edge_kernel = np.array([[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]])\n",
        "plot_astable(vertical_edge_kernel, 0.2)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "import numpy as np\n",
        "\n",
        "\n",
        "def convolution(matrix, kernel):\n",
        "    # This function computes a convolution between a matrix and a kernel/filter without any padding\n",
        "    width_kernel = kernel.shape[0]\n",
        "    height_kernel = kernel.shape[1]\n",
        "    convolution = np.zeros(\n",
        "        (matrix.shape[0] - width_kernel + 1, matrix.shape[1] - height_kernel + 1)\n",
        "    )\n",
        "    for i in range(matrix.shape[0] - width_kernel + 1):\n",
        "        for j in range(matrix.shape[1] - height_kernel + 1):\n",
        "            convolution[i, j] = np.sum(\n",
        "                np.multiply(matrix[i : i + width_kernel, j : j + height_kernel], kernel)\n",
        "            )\n",
        "    return convolution\n",
        "\n",
        "\n",
        "vertical_detect = convolution(digit, vertical_edge_kernel)\n",
        "plt.imshow(vertical_detect, cmap=\"gray_r\");"
       ]
      },
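  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sanity check (an addition, assuming SciPy is available), we can compare our hand-written loop against `scipy.signal.correlate2d`. Note that we use *correlate* rather than *convolve*: like most deep learning libraries, our loop does not flip the kernel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Our loop should match scipy's cross-correlation with mode=\"valid\" (no padding)\n",
    "from scipy.signal import correlate2d\n",
    "\n",
    "scipy_result = correlate2d(digit, vertical_edge_kernel, mode=\"valid\")\n",
    "print(np.allclose(scipy_result, vertical_detect))"
   ]
  },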
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Horizontal edge detection\n",
        "horizontal_edge_kernel = np.array([[-1, -1, -1], [2, 2, 2], [-1, -1, -1]])\n",
        "plot_astable(horizontal_edge_kernel, 0.2)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "horizontal_detect = convolution(digit, horizontal_edge_kernel)\n",
        "plt.imshow(horizontal_detect, cmap=\"gray_r\");"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Maxpooling\n",
        "Taking maximum in n x n sized sliding windows"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "import numpy as np\n",
        "\n",
        "\n",
        "def maxpool_2x2(matrix):\n",
        "    out_dim = np.array([matrix.shape[0] / 2, matrix.shape[1] / 2]).astype(int)\n",
        "    subsample = np.zeros((out_dim))\n",
        "    for i in range(out_dim[0]):\n",
        "        for j in range(out_dim[1]):\n",
        "            subsample[i, j] = np.max(matrix[i * 2 : i * 2 + 2, j * 2 : j * 2 + 2])\n",
        "    return subsample"
       ]
      },
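  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an aside (an addition to the original notebook), the same 2x2 max pooling can be written without explicit loops using a NumPy reshape trick, again assuming even input dimensions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def maxpool_2x2_numpy(matrix):\n",
    "    # Split each axis into (n_windows, 2) blocks, then take the maximum over\n",
    "    # both window axes. Assumes even input dimensions.\n",
    "    h, w = matrix.shape\n",
    "    return matrix.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))\n",
    "\n",
    "\n",
    "# Both implementations should agree\n",
    "print(np.allclose(maxpool_2x2_numpy(vertical_detect), maxpool_2x2(vertical_detect)))"
   ]
  },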
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "import matplotlib.pyplot as plt\n",
        "\n",
        "subsampled_image = maxpool_2x2(vertical_detect)\n",
        "plt.imshow(subsampled_image, cmap=\"gray_r\")\n",
        "plt.title(\"Max Pooled vertical edge detection filter\");"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "subsampled_image = maxpool_2x2(horizontal_detect)\n",
        "plt.imshow(subsampled_image, cmap=\"gray_r\")\n",
        "plt.title(\"Max Pooled horizontal edge detection filter\");"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Let's explore some more of such filters/kernels!!\n",
        "\n",
        "http://setosa.io/ev/image-kernels"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## CNN Examples"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "For this example we will work with a dataset called fashion-MNIST which is quite similar to the MNIST data above.\n",
        "> Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.\n",
        "source: https://github.com/zalandoresearch/fashion-mnist\n",
        "\n",
        "The 10 classes of this dataset are:\n",
        "\n",
        "| Label| Item |\n",
        "| --- | --- |\n",
        "| 0 |\tT-shirt/top |\n",
        "| 1\t| Trouser |\n",
        "|2|\tPullover|\n",
        "|3|\tDress|\n",
        "|4|\tCoat|\n",
        "|5|\tSandal|\n",
        "|6|\tShirt|\n",
        "|7|\tSneaker|\n",
        "|8|\tBag|\n",
        "|9|\tAnkle boot|"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Loading the dataset in tensorflow\n",
        "# Later you can explore and play with other datasets with come with tensorflow\n",
        "from tensorflow.keras.datasets import fashion_mnist\n",
        "\n",
        "# Loading the train and test data\n",
        "\n",
        "(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()\n",
        "\n",
        "items = [\n",
        "    \"T-shirt/top\",\n",
        "    \"Trouser\",\n",
        "    \"Pullover\",\n",
        "    \"Dress\",\n",
        "    \"Coat\",\n",
        "    \"Sandal\",\n",
        "    \"Shirt\",\n",
        "    \"Sneaker\",\n",
        "    \"Bag\",\n",
        "    \"Ankle boot\",\n",
        "]"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# We can see that the training set consists of 60,000 images of size 28x28 pixels\n",
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "\n",
        "i = np.random.randint(0, X_train.shape[0])\n",
        "plt.imshow(X_train[i], cmap=\"gray_r\")\n",
        "print(\"This item is a: \", items[y_train[i]])"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Also we need to reshape the input data such that each sample is a 4D matrix of dimension\n",
        "# (num_samples, width, height, channels). Even though these images are grayscale we need to add\n",
        "# channel dimension as this is expected by the Conv function\n",
        "X_train_prep = X_train.reshape(X_train.shape[0], 28, 28, 1) / 255.0\n",
        "X_test_prep = X_test.reshape(X_test.shape[0], 28, 28, 1) / 255.0\n",
        "\n",
        "from tensorflow.keras.utils import to_categorical\n",
        "\n",
        "y_train_onehot = to_categorical(y_train, num_classes=10)\n",
        "y_test_onehot = to_categorical(y_test, num_classes=10)\n",
        "\n",
        "print(y_train_onehot.shape)"
       ]
      },
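  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick sanity check (added for illustration): each integer label should map to a length-10 vector with a single 1 at the corresponding position."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The first training label, as an integer, as an item name, and one-hot encoded\n",
    "print(y_train[0], \"->\", items[y_train[0]])\n",
    "print(y_train_onehot[0])"
   ]
  },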
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Creating a CNN similar to the one shown in the figure from LeCun paper\n",
        "# In the original implementation Average pooling was used. However, we will use maxpooling as this\n",
        "# is what us used in the more recent architectures and is found to be a better choice\n",
        "# Convolution -> Pooling -> Convolution -> Pooling -> Flatten -> Dense -> Dense -> Output layer\n",
        "from tensorflow.keras.layers import (\n",
        "    BatchNormalization,\n",
        "    Conv2D,\n",
        "    Dense,\n",
        "    Dropout,\n",
        "    Flatten,\n",
        "    MaxPool2D,\n",
        ")\n",
        "from tensorflow.keras.models import Sequential\n",
        "\n",
        "\n",
        "def simple_CNN():\n",
        "\n",
        "    model = Sequential()\n",
        "\n",
        "    model.add(Conv2D(6, (3, 3), input_shape=(28, 28, 1), activation=\"relu\"))\n",
        "\n",
        "    model.add(MaxPool2D((2, 2)))\n",
        "\n",
        "    model.add(Conv2D(16, (3, 3), activation=\"relu\"))\n",
        "\n",
        "    model.add(MaxPool2D((2, 2)))\n",
        "\n",
        "    model.add(Flatten())\n",
        "\n",
        "    model.add(Dense(120, activation=\"relu\"))\n",
        "\n",
        "    model.add(Dense(84, activation=\"relu\"))\n",
        "\n",
        "    model.add(Dense(10, activation=\"softmax\"))\n",
        "\n",
        "    model.compile(\n",
        "        loss=\"categorical_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"]\n",
        "    )\n",
        "\n",
        "    return model\n",
        "\n",
        "\n",
        "model = simple_CNN()\n",
        "model.summary()"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "num_epochs = 5\n",
        "model_run = model.fit(\n",
        "    X_train_prep,\n",
        "    y_train_onehot,\n",
        "    epochs=num_epochs,\n",
        "    batch_size=64,\n",
        "    validation_data=(X_test_prep, y_test_onehot),\n",
        ")"
       ]
      },
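  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to inspect the run (a small addition, not in the original notebook): `model.fit` returns a `History` object whose `history` dictionary holds the per-epoch metrics recorded above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot the training and validation accuracy recorded by model.fit\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plt.plot(model_run.history[\"accuracy\"], label=\"train accuracy\")\n",
    "plt.plot(model_run.history[\"val_accuracy\"], label=\"validation accuracy\")\n",
    "plt.xlabel(\"epoch\")\n",
    "plt.ylabel(\"accuracy\")\n",
    "plt.legend();"
   ]
  },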
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
    chadhat's avatar
    chadhat committed
        "## (optional) Exercise section\n",
    
        "* Use the above model or improve it (change number of filters, add more layers etc. on the MNIST example and see if you can get a better accuracy than what we achieved with a vanilla neural network)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
    chadhat's avatar
    chadhat committed
        "## Exercise section\n",
    
        "* Explore the CIFAR10 (https://www.cs.toronto.edu/~kriz/cifar.html) dataset included with TensorFlow (Keras) and build+train a simple CNN to classify it"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "from tensorflow.keras.datasets import cifar10\n",
        "\n",
        "(X_train, y_train), (X_test, y_test) = cifar10.load_data()"
       ]
      },
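  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before building your model, it helps to take a quick look at the data (a starter snippet, not a solution): CIFAR10 images are 32x32 RGB, so the input shape of your first layer will differ from the Fashion-MNIST examples above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Quick look: array shapes and one sample image\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "print(X_train.shape, y_train.shape)\n",
    "plt.imshow(X_train[0])\n",
    "print(\"label:\", y_train[0])"
   ]
  },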
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
    schmittu's avatar
    schmittu committed
        "Copyright (C) 2019-2022 ETH Zurich, SIS ID"
    
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Functional API\n",
        "\n",
        "The Sequential API of TensorFlow (Keras) is good enough for simple models with a linear topology.\n",
        "However, the functional api is more flexible and allows for more complicated use cases such as:\n",
        "* models with non-linear topology\n",
        "* shared layers\n",
        "* multiple inputs or outputs\n",
        "\n",
        "Examples of such models:\n",
        "\n",
        "* U-Net for image segmentation (https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/)\n",
        "* ResNet https://arxiv.org/pdf/1512.03385.pdf"
       ]
      },
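  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate the \"multiple inputs\" point above, here is a minimal sketch (an addition, with a hypothetical 10-dimensional metadata vector as second input) of a model that the Sequential API cannot express:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A two-input model: an image branch concatenated with a metadata vector\n",
    "from tensorflow.keras.layers import Concatenate, Dense, Flatten, Input\n",
    "from tensorflow.keras.models import Model\n",
    "\n",
    "image_in = Input(shape=(28, 28, 1))\n",
    "meta_in = Input(shape=(10,))  # hypothetical extra features per sample\n",
    "\n",
    "x = Flatten()(image_in)\n",
    "x = Dense(32, activation=\"relu\")(x)\n",
    "x = Concatenate()([x, meta_in])  # merge the two branches\n",
    "out = Dense(10, activation=\"softmax\")(x)\n",
    "\n",
    "two_input_model = Model(inputs=[image_in, meta_in], outputs=out)\n",
    "two_input_model.summary()"
   ]
  },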
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Simple example showing the Fashion MNIST example using the functional api\n",
        "from tensorflow.keras.layers import Conv2D, Dense, Flatten, Input, MaxPool2D\n",
        "from tensorflow.keras.models import Model\n",
        "\n",
        "\n",
        "def simple_CNN_functional():\n",
        "\n",
        "    img_inputs = Input(shape=(28, 28, 1))\n",
        "\n",
        "    x = Conv2D(6, (3, 3), activation=\"relu\")(img_inputs)\n",
        "\n",
        "    x = MaxPool2D((2, 2))(x)\n",
        "\n",
        "    x = Conv2D(16, (3, 3), activation=\"relu\")(x)\n",
        "\n",
        "    x = MaxPool2D((2, 2))(x)\n",
        "\n",
        "    x = Flatten()(x)\n",
        "\n",
        "    x = Dense(120, activation=\"relu\")(x)\n",
        "\n",
        "    x = Dense(84, activation=\"relu\")(x)\n",
        "\n",
        "    output = Dense(10, activation=\"softmax\")(x)\n",
        "\n",
        "    model = Model(inputs=img_inputs, outputs=output, name=\"fashion_mnist_model\")\n",
        "\n",
        "    model.compile(\n",
        "        loss=\"categorical_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"]\n",
        "    )\n",
        "\n",
        "    return model\n",
        "\n",
        "\n",
        "model = simple_CNN_functional()\n",
        "model.summary()"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "num_epochs = 5\n",
        "model_run = model.fit(\n",
        "    X_train_prep,\n",
        "    y_train_onehot,\n",
        "    epochs=num_epochs,\n",
        "    batch_size=64,\n",
        "    validation_data=(X_test_prep, y_test_onehot),\n",
        ")"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Toy ResNet\n",
        "(source: https://keras.io/guides/functional_api/)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "from tensorflow.keras.layers import (\n",
        "    Conv2D,\n",
        "    Dense,\n",
    
        "    Flatten,\n",
        "    GlobalAveragePooling2D,\n",
        "    Input,\n",
        "    MaxPool2D,\n",
        "    add,\n",
        ")\n",
        "from tensorflow.keras.models import Model\n",
        "\n",
        "\n",
        "def toy_ResNet():\n",
        "\n",
        "    inputs = Input(shape=(32, 32, 3), name=\"img\")\n",
        "    x = Conv2D(32, 3, activation=\"relu\")(inputs)\n",
        "    x = Conv2D(64, 3, activation=\"relu\")(x)\n",
        "    block_1_output = MaxPool2D(3)(x)\n",
        "\n",
        "    x = Conv2D(64, 3, activation=\"relu\", padding=\"same\")(block_1_output)\n",
        "    x = Conv2D(64, 3, activation=\"relu\", padding=\"same\")(x)\n",
        "    block_2_output = add([x, block_1_output])\n",
        "\n",
        "    x = Conv2D(64, 3, activation=\"relu\", padding=\"same\")(block_2_output)\n",
        "    x = Conv2D(64, 3, activation=\"relu\", padding=\"same\")(x)\n",
        "    block_3_output = add([x, block_2_output])\n",
        "\n",
        "    x = Conv2D(64, 3, activation=\"relu\")(block_3_output)\n",
        "    x = GlobalAveragePooling2D()(x)\n",
        "    x = Dense(256, activation=\"relu\")(x)\n",
        "    x = Dropout(0.5)(x)\n",
        "    outputs = Dense(10, \"softmax\")(x)\n",
        "\n",
        "    model = Model(inputs, outputs, name=\"toy_resnet\")\n",
        "\n",
        "    return model\n",
        "\n",
        "\n",
        "model = toy_ResNet()\n",
        "model.summary()"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "from tensorflow.keras.utils import plot_model\n",
        "\n",
        "plot_model(model, \"mini_resnet.png\", show_shapes=True)"
   ]
  }
     ],
     "metadata": {
      "kernelspec": {
       "display_name": "Python 3 (ipykernel)",
       "language": "python",
       "name": "python3"
      },
      "language_info": {
       "codemirror_mode": {
        "name": "ipython",
        "version": 3
       },
       "file_extension": ".py",
       "mimetype": "text/x-python",
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython3",
    
    chadhat's avatar
    chadhat committed
       "version": "3.10.5"
    
      },
      "latex_envs": {
       "LaTeX_envs_menu_present": true,
       "autoclose": false,
       "autocomplete": true,
       "bibliofile": "biblio.bib",
       "cite_by": "apalike",
       "current_citInitial": 1,
       "eqLabelWithNumbers": true,
       "eqNumInitial": 1,
       "hotkeys": {
        "equation": "Ctrl-E",
        "itemize": "Ctrl-I"
       },
       "labels_anchors": false,
       "latex_user_defs": false,
       "report_style_numbering": false,
       "user_envs_cfg": false
      },
      "toc": {
       "base_numbering": 1,
       "nav_menu": {},
       "number_sections": true,
       "sideBar": true,
       "skip_h1_title": true,
       "title_cell": "Table of Contents",
       "title_sidebar": "Contents",
       "toc_cell": false,
       "toc_position": {},
       "toc_section_display": true,
       "toc_window_display": true
      }
     },
     "nbformat": 4,
     "nbformat_minor": 4
    }