08_b-neural_networks.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n",
    "from numpy.random import seed\n",
    "\n",
    "seed(42)\n",
    "import tensorflow as tf\n",
    "\n",
    "tf.random.set_seed(46)\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "sns.set(style=\"darkgrid\")\n",
    "mpl.rcParams[\"lines.linewidth\"] = 3\n",
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "%config IPCompleter.greedy=True\n",
    "import warnings\n",
    "\n",
    "warnings.filterwarnings(\"ignore\", category=FutureWarning)\n",
    "from IPython.core.display import HTML\n",
    "\n",
    "HTML(open(\"custom.html\", \"r\").read())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chapter 8b: Introduction to Tensorflow"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction to TensorFlow (keras API)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A bit about Keras?\n",
    "\n",
    "* It is a high level API to create and work with neural networks\n",
    "* Used to support multiple backends such as **TensorFlow** from Google, **Theano** (Theano is dead now) and **CNTK** (Microsoft Cognitive Toolkit), up till release 2.3.0 \n",
    "* Very good for creating neural nets quickly and hides away a lot of tedious work\n",
    "* Has been incorporated into official TensorFlow (which obviously only works with tensorflow) and is its main API as of version 2.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "<figure>\n",
    "<img src=\"./images/neuralnets/neural_net_keras_1.svg\" width=\"700\"/>\n",
    "<figcaption>Building this model in TensorFlow (Keras)</figcaption>\n",
    "</figure>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Say hello to Tensorflow\n",
    "from tensorflow.keras.layers import Activation, Dense\n",
    "from tensorflow.keras.models import Sequential\n",
    "\n",
    "# Creating a model\n",
    "model = Sequential()\n",
    "\n",
    "# Adding layers to this model\n",
    "# 1st Hidden layer\n",
    "# A Dense/fully-connected layer which takes as input a\n",
    "# feature array of shape (samples, num_features)\n",
    "# Here input_shape = (2,) means that the layer expects an input with num_features = 2\n",
    "# and the sample size could be anything\n",
    "# The activation function for this layer is set to \"relu\"\n",
    "model.add(Dense(units=4, input_shape=(2,), activation=\"relu\"))\n",
    "\n",
    "# 2nd Hidden layer\n",
    "# This is also a fully-connected layer and we do not need to specify the\n",
    "# shape of the input anymore (We need to do that only for the first layer)\n",
    "# NOTE: Now we didn't add the activation seperately. Instead we just added it\n",
    "# while calling Dense(). This and the way used for the first layer are Equivalent!\n",
    "model.add(Dense(units=4, activation=\"relu\"))\n",
    "\n",
    "\n",
    "# The output layer\n",
    "model.add(Dense(units=1))\n",
    "model.add(Activation(\"sigmoid\"))\n",
    "\n",
    "model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### XOR using neural networks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "from sklearn.model_selection import train_test_split\n",
    "from tensorflow.keras.layers import Dense\n",
    "from tensorflow.keras.models import Sequential"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating a network to solve the XOR problem\n",
    "\n",
    "# Loading and plotting the data\n",
    "xor = pd.read_csv(\"data/xor.csv\")\n",
    "\n",
    "# Using x and y coordinates as featues\n",
    "features = xor.iloc[:, :-1]\n",
    "# Convert boolean to integer values (True->1 and False->0)\n",
    "labels = 1 - xor.iloc[:, -1].astype(int)\n",
    "\n",
    "colors = [[\"steelblue\", \"chocolate\"][i] for i in labels]\n",
    "plt.figure(figsize=(5, 5))\n",
    "plt.xlim([-2, 2])\n",
    "plt.ylim([-2, 2])\n",
    "plt.title(\"Blue points are False\")\n",
    "plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Building a simple Tensorflow model\n",
    "\n",
    "\n",
    "def a_simple_NN():\n",
    "\n",
    "    model = Sequential()\n",
    "\n",
    "    model.add(Dense(4, input_shape=(2,), activation=\"relu\"))\n",
    "\n",
    "    model.add(Dense(4, activation=\"relu\"))\n",
    "\n",
    "    model.add(Dense(1, activation=\"sigmoid\"))\n",
    "\n",
    "    model.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
    "\n",
    "    return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Instantiating the model\n",
    "model = a_simple_NN()\n",
    "\n",
    "# Splitting the dataset into training (70%) and validation sets (30%)\n",
    "X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.3)\n",
    "\n",
    "# Setting the number of passes through the entire training set\n",
    "num_epochs = 300\n",
    "\n",
    "# model.fit() is used to train the model\n",
    "# We can pass validation data while training\n",
    "model_run = model.fit(\n",
    "    X_train, y_train, epochs=num_epochs, validation_data=(X_test, y_test)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\"><p><i class=\"fa fa-info-circle\"></i>&nbsp;\n",
    "    NOTE: We can pass \"verbose=0\" to model.fit() to suppress the printing of model output on the terminal/notebook.\n",
    "</p></div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plotting the loss and accuracy on the training and validation sets during the training\n",
    "# This can be done by using TensorFlow (Keras) callback \"history\" which is applied by default\n",
    "history_model = model_run.history\n",
    "\n",
    "print(\"The history has the following data: \", history_model.keys())\n",
    "\n",
    "# Plotting the training and validation accuracy during the training\n",
    "sns.lineplot(\n",
    "    x=model_run.epoch, y=history_model[\"accuracy\"], color=\"blue\", label=\"Training set\"\n",
    ")\n",
    "sns.lineplot(\n",
    "    x=model_run.epoch, y=history_model[\"val_accuracy\"], color=\"red\", label=\"Valdation set\"\n",
    ")\n",
    "plt.xlabel(\"epochs\")\n",
    "plt.ylabel(\"accuracy\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
    "The plots such as above are essential for analyzing the behaviour and performance of the network and to tune it in the right direction. However, for the example above we don't expect to derive a lot of insight from this plot as the function we are trying to fit is quite simple and there is not too much noise. We will see the significance of these curves in a later example.\n",
    "</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Before we move on forward we see how to save and load a TensorFlow (keras) model\n",
    "model.save(\"./data/my_first_NN.h5\")\n",
    "model.save(\"./data/my_first_NN\")\n",
    "\n",
    "\n",
    "# Optional: See what is in the hdf5 file we just created above\n",
    "\n",
    "from tensorflow.keras.models import load_model\n",
    "\n",
    "model = load_model(\"./data/my_first_NN.h5\")\n",
    "model_pb = load_model(\"./data/my_first_NN\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the training and validation in the example above we split our dataset into a 70-30 train-validation set. We know from previous chapters that to more robustly estimate the accuracy of our model we can use **K-fold cross-validation**.\n",
    "This is even more important when we have small datasets and cannot afford to reserve a validation set!\n",
    "\n",
    "One way to do the cross-validation here would be to write our own function to do this. However, we also know that **scikit-learn** provides several handy functions to evaluate and tune the models. So the question is:\n",
    "\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
    "    Can we somehow use the scikit-learn functions or the ones we wrote ourselves for scikit-learn models to evaluate and tune our TensorFlow (Keras) models?\n",
    "\n",
    "\n",
    "The Answer is **YES !**\n",
    "</p>\n",
    "</div>\n",
    "\n",
    "\n",
    "\n",
    "We show how to do this in the following section."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using scikit-learn functions on TensorFlow (Keras) models\n",
    "\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
    "TensorFlow (Keras) offers 2 wrappers which allow its Sequential models to be used with scikit-learn. \n",
    "\n",
    "There are: **KerasClassifier** and **KerasRegressor**.\n",
    "\n",
    "For more information:\n",
    "https://keras.io/scikit-learn-api/\n",
    "</p>\n",
    "</div>\n",
    "\n",
    "\n",
    "\n",
    "**Now lets see how this works!**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We wrap the TensorFlow (Keras) model we created above with KerasClassifier\n",
    "from sklearn.model_selection import cross_val_score\n",
    "#from tensorflow.keras.wrappers.scikit_learn import KerasClassifier\n",
    "from scikeras.wrappers import KerasClassifier\n",
    "\n",
    "# Wrapping TensorFlow (Keras) model\n",
    "# NOTE: We pass verbose=0 to suppress the model output\n",
    "num_epochs = 400\n",
    "model_scikit = KerasClassifier(model=a_simple_NN, epochs=num_epochs, verbose=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's reuse the function to visualize the decision boundary which we saw in chapter 2 with minimal change\n",
    "\n",
    "\n",
    "#def list_flatten(list_of_list):\n",
    "#    flattened_list = [i for j in list_of_list for i in j]\n",
    "#    return flattened_list\n",
    "\n",
    "def list_flatten(list_of_list):\n",
    "  for item in list_of_list:\n",
    "    if isinstance(item, list):\n",
    "      for subitem in item: yield subitem\n",
    "    else:\n",
    "      yield item\n",
    "\n",
    "\n",
    "def plot_points(plt=plt, marker=\"o\"):\n",
    "    colors = [[\"steelblue\", \"chocolate\"][i] for i in labels]\n",
    "    plt.scatter(features.iloc[:, 0], features.iloc[:, 1], color=colors, marker=marker)\n",
    "\n",
    "\n",
    "def train_and_plot_decision_surface(\n",
    "    name, classifier, features_2d, labels, preproc=None, plt=plt, marker=\"o\", N=400\n",
    "):\n",
    "\n",
    "    features_2d = np.array(features_2d)\n",
    "    xmin, ymin = features_2d.min(axis=0)\n",
    "    xmax, ymax = features_2d.max(axis=0)\n",
    "\n",
    "    x = np.linspace(xmin, xmax, N)\n",
    "    y = np.linspace(ymin, ymax, N)\n",
    "    points = np.array(np.meshgrid(x, y)).T.reshape(-1, 2)\n",
    "\n",
    "    if preproc is not None:\n",
    "        points_for_classifier = preproc.fit_transform(points)\n",
    "        features_2d = preproc.fit_transform(features_2d)\n",
    "    else:\n",
    "        points_for_classifier = points\n",
    "\n",
    "    classifier.fit(features_2d, labels, verbose=0)\n",
    "\n",
    "    if name == \"Neural Net\":\n",
    "        # predicted = classifier.predict(features_2d)\n",
    "        # predicted = list_flatten(predicted)\n",
    "        predicted = list(list_flatten(\n",
    "            (classifier.predict(features_2d) > 0.5).astype(\"int32\")\n",
    "        ))\n",
    "    # else:\n",
    "    # predicted = classifier.predict(features_2d)\n",
    "\n",
    "    if preproc is not None:\n",
    "        name += \" (w/ preprocessing)\"\n",
    "    print(name + \":\\t\", sum(predicted == labels), \"/\", len(labels), \"correct\")\n",
    "\n",
    "    if name == \"Neural Net\":\n",
    "        # classes = np.array(list_flatten(classifier.predict(points_for_classifier)), dtype=bool)\n",
    "        classes = np.array(\n",
    "            list(list_flatten(\n",
    "                (classifier.predict(points_for_classifier) > 0.5).astype(\"int32\")\n",
    "            )),\n",
    "            dtype=bool,\n",
    "        )\n",
    "    # else:\n",
    "    # classes = np.array(classifier.predict(points_for_classifier), dtype=bool)\n",
    "    plt.plot(\n",
    "        points[~classes][:, 0],\n",
    "        points[~classes][:, 1],\n",
    "        \"o\",\n",
    "        color=\"steelblue\",\n",
    "        markersize=1,\n",
    "        alpha=0.01,\n",
    "    )\n",
    "    plt.plot(\n",
    "        points[classes][:, 0],\n",
    "        points[classes][:, 1],\n",
    "        \"o\",\n",
    "        color=\"chocolate\",\n",
    "        markersize=1,\n",
    "        alpha=0.04,\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "_, ax = plt.subplots(figsize=(6, 6))\n",
    "\n",
    "train_and_plot_decision_surface(\"Neural Net\", model_scikit, features, labels, plt=ax)\n",
    "plot_points(plt=ax)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Applying K-fold cross-validation\n",
    "# Here we pass the whole dataset, i.e. features and labels, instead of splitting it.\n",
    "num_folds = 5\n",
    "cross_validation = cross_val_score(\n",
    "    model_scikit, features, labels, cv=num_folds, verbose=0\n",
    ")\n",
    "\n",
    "print(\"The acuracy on the \", num_folds, \" validation folds:\", cross_validation)\n",
    "print(\n",
    "    \"The Average acuracy on the \",\n",
    "    num_folds,\n",
    "    \" validation folds:\",\n",
    "    np.mean(cross_validation),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
    "The code above took quite long to finish even though we used only 5  CV folds and the neural network and data size are very small! This gives an indication of the enormous compute requirements of training production-grade deep neural networks.\n",
    "</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hyperparameter optimization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We know from chapter 6 that there are 2 types of parameters which need to be tuned for a machine learning model.\n",
    "* Internal model parameters (weights) which can be learned for e.g. by gradient-descent\n",
    "* Hyperparameters\n",
    "\n",
    "In the model created above we made some arbitrary choices such as the choice of the optimizer we used, optimizer's learning rate, number of hidden units and so on ...\n",
    "\n",
    "Now that we have the TensorFlow (keras) model wrapped as a scikit-learn model we can use the grid search functions we have seen in chapter 6."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import GridSearchCV\n",
    "\n",
    "# Just to remember\n",
    "model_scikit = KerasClassifier(\n",
    "    model=a_simple_NN, **{\"epochs\": num_epochs, \"verbose\": 0}\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "HP_grid = {\"epochs\": [30, 50, 100]}\n",
    "search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid)\n",
    "search.fit(features, labels)\n",
    "print(search.best_score_, search.best_params_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "HP_grid = {\"epochs\": [10, 15, 30], \"batch_size\": [10, 20, 30]}\n",
    "search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid)\n",
    "search.fit(features, labels)\n",
    "print(search.best_score_, search.best_params_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A more general model for further Hyperparameter optimization\n",
    "from tensorflow.keras import optimizers\n",
    "\n",
    "\n",
    "def a_simple_NN(activation=\"relu\", num_hidden_neurons=[4, 4], learning_rate=0.01):\n",
    "\n",
    "    model = Sequential()\n",
    "\n",
    "    model.add(Dense(num_hidden_neurons[0], input_shape=(2,), activation=activation))\n",
    "\n",
    "    model.add(Dense(num_hidden_neurons[1], activation=activation))\n",
    "\n",
    "    model.add(Dense(1, activation=\"sigmoid\"))\n",
    "\n",
    "    model.compile(\n",
    "        loss=\"binary_crossentropy\",\n",
    "        optimizer=optimizers.RMSprop(learning_rate=learning_rate),\n",
    "        metrics=[\"accuracy\"],\n",
    "    )\n",
    "\n",
    "    return model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### keras-tuner"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -q -U keras-tuner"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def a_simple_NN(activation=\"relu\", num_hidden_neurons=[4, 4], learning_rate=0.01):\n",
    "\n",
    "    model = Sequential()\n",
    "\n",
    "    model.add(Dense(num_hidden_neurons[0], input_shape=(2,), activation=activation))\n",
    "\n",
    "    model.add(Dense(num_hidden_neurons[1], activation=activation))\n",
    "\n",
    "    model.add(Dense(1, activation=\"sigmoid\"))\n",
    "\n",
    "    model.compile(\n",
    "        loss=\"binary_crossentropy\",\n",
    "        optimizer=optimizers.RMSprop(learning_rate=learning_rate),\n",
    "        metrics=[\"accuracy\"],\n",
    "    )\n",
    "    return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import keras_tuner as kt\n",
    "\n",
    "\n",
    "def model_builder(hp):\n",
    "\n",
    "    # Tune the number of units in the first Dense layer\n",
    "    hp_units = hp.Int(\"units\", min_value=4, max_value=8, step=2)\n",
    "    hp_units_2 = hp.Int(\"units2\", min_value=4, max_value=16, step=2)\n",
    "\n",
    "    # Tune the learning rate for the optimizer\n",
    "    hp_learning_rate = hp.Choice(\"learning_rate\", values=[1e-2, 1e-3, 1e-4])\n",
    "\n",
    "    # Tune the choice of the activation function\n",
    "    activation = hp.Choice(name=\"activation\", values=[\"relu\", \"sigmoid\"])\n",
    "\n",
    "    model = a_simple_NN(activation, [hp_units, hp_units_2], hp_learning_rate)\n",
    "\n",
    "    return model\n",
    "\n",
    "\n",
    "# The argument ‘hp’ is an instance of the class HyperParameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tuner = kt.BayesianOptimization(\n",
    "    model_builder,\n",
    "    objective=\"val_accuracy\",\n",
    "    max_trials=10,\n",
    "    project_name=\"intro_to_kt\",\n",
    "    overwrite=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tuner.search(X_train, y_train, epochs=100, validation_data=(X_test, y_test))\n",
    "best_model = tuner.get_best_models()[0]\n",
    "print(tuner.get_best_hyperparameters()[0].values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise section:  \n",
    "1. Create a neural network to classify the 2d points example from chapter 2 (Optional: As you create the model read a bit on the different TensorFlow (keras) commands we have used)\n",
    "2. Plot the decision boundary\n",
    "3. Choose and optimize a couple of hyperparameters\n",
    "4. **OPTIONAL:** What function from scikit-learn other than GridSearchCV can we use for hyperparameter optimization? Use it (or use the equivalent method from keras-tuner)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split\n",
    "from tensorflow.keras import optimizers\n",
    "from tensorflow.keras.layers import Dense\n",
    "from tensorflow.keras.models import Sequential\n",
    "#from tensorflow.keras.wrappers.scikit_learn import KerasClassifier\n",
    "from scikeras.wrappers import KerasClassifier\n",
    "import keras_tuner as kt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "circle = pd.read_csv(\"data/circle.csv\")\n",
    "# Using x and y coordinates as featues\n",
    "features = circle.iloc[:, :-1]\n",
    "# Convert boolean to integer values (True->1 and False->0)\n",
    "labels = circle.iloc[:, -1].astype(int)\n",
    "\n",
    "colors = [[\"steelblue\", \"chocolate\"][i] for i in circle[\"label\"]]\n",
    "plt.figure(figsize=(5, 5))\n",
    "plt.xlim([-2, 2])\n",
    "plt.ylim([-2, 2])\n",
    "\n",
    "plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Insert Code here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "solution"
    ]
   },
   "outputs": [],
   "source": [
    "# Solution\n",
    "def circle_NN(activation=\"relu\", learning_rate=0.01):\n",
    "\n",
    "    model = Sequential()\n",
    "\n",
    "    model.add(Dense(4, input_shape=(2,), activation=activation))\n",
    "\n",
    "    model.add(Dense(4, activation=activation))\n",
    "\n",
    "    model.add(Dense(1, activation=\"sigmoid\"))\n",
    "\n",
    "    model.compile(\n",
    "        loss=\"binary_crossentropy\",\n",
    "        optimizer=optimizers.RMSprop(learning_rate=learning_rate),\n",
    "        metrics=[\"accuracy\"],\n",
    "    )\n",
    "\n",
    "    return model\n",
    "\n",
    "\n",
    "# Instantiating the model\n",
    "model = circle_NN()\n",
    "\n",
    "# Splitting the dataset into training (70%) and validation sets (30%)\n",
    "X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.3)\n",
    "\n",
    "# Setting the number of passes through the entire training set\n",
    "num_epochs = 400\n",
    "\n",
    "# model.fit() is used to train the model\n",
    "# We can pass validation data while training\n",
    "model_run = model.fit(\n",
    "    X_train, y_train, epochs=num_epochs, validation_data=(X_test, y_test), verbose=0\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "solution"
    ]
   },
   "outputs": [],
   "source": [
    "# solution\n",
    "_, ax = plt.subplots(figsize=(6, 6))\n",
    "\n",
    "num_epochs = 400\n",
    "circle_scikit = KerasClassifier(model=circle_NN, epochs=num_epochs, verbose=0)\n",
    "\n",
    "train_and_plot_decision_surface(\"Neural Net\", circle_scikit, features, labels, plt=ax)\n",
    "plot_points(plt=ax)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "solution"
    ]
   },
   "outputs": [],
   "source": [
    "# solution (older method)\n",
    "\"\"\"\n",
    "HP_grid = {\n",
    "    \"activation\": [\"relu\", \"sigmoid\"],\n",
    "    \"learning_rate\": [0.01, 0.005, 0.001],\n",
    "}\n",
    "search = GridSearchCV(estimator=circle_scikit, param_grid=HP_grid)\n",
    "search.fit(features, labels)\n",
    "print(search.best_score_, search.best_params_)\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "solution"
    ]
   },
   "outputs": [],
   "source": [
    "def model_builder(hp):\n",
    "    # Tune the learning rate for the optimizer\n",
    "    hp_learning_rate = hp.Choice(\"learning_rate\", values=[1e-2, 1e-3, 1e-4])\n",
    "    # Tune the choice of the activation function\n",
    "    activation = hp.Choice(name=\"activation\", values=[\"relu\", \"sigmoid\"])\n",
    "\n",
    "    model = circle_NN(activation, hp_learning_rate)\n",
    "\n",
    "    return model\n",
    "\n",
    "\n",
    "tuner = kt.BayesianOptimization(\n",
    "    model_builder,\n",
    "    objective=\"val_accuracy\",\n",
    "    max_trials=10,\n",
    "    project_name=\"circle_exercise\",\n",
    "    overwrite=True,\n",
    ")\n",
    "tuner.search(X_train, y_train, epochs=400, validation_data=(X_test, y_test))\n",
    "best_model = tuner.get_best_models()[0]\n",
    "print(tuner.get_best_hyperparameters()[0].values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
    "Another library which you should definitely look at for doing hyperparameter optimization with keras models is the <a href=\"https://github.com/maxpumperla/hyperas\">Hyperas library</a> which is a wrapper around the <a href=\"https://github.com/hyperopt/hyperopt\">Hyperopt library</a>. \n",
    "\n",
    "</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The examples we saw above are really nice to show various features of the TensorFlow (Keras) library and to understand how we build and train a model. However, they are not the ideal problems one should solve using neural networks. They are too simple and can be solved easily by classical machine learning algorithms. \n",
    "\n",
    "Now we show examples where Neural Networks really shine over classical machine learning algorithms."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Handwritten Digits Classification (multi-class classification)\n",
    "**MNIST Dataset**\n",
    "\n",
    "MNIST datasets is a very common dataset used in machine learning. It is widely used to train and validate models.\n",
    "\n",
    "\n",
    "> The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a > test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.\n",
    "> It is a good database for people who want to try learning techniques and pattern recognition methods on real-world \n",
    "> data while spending minimal efforts on preprocessing and formatting.\n",
    "> source: http://yann.lecun.com/exdb/mnist/\n",
    "\n",
    "This dataset consists of images of handwritten digits between 0-9 and their corresponsing labels. We want to train a neural network which is able to predict the correct digit on the image. \n",
    "This is a multi-class classification problem. Unlike binary classification which we have seen till now we will classify data into 10 different classes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Loading the dataset in TensorFlow (keras)\n",
    "# Later you can explore and play with other datasets with come with TensorFlow (Keras)\n",
    "from tensorflow.keras.datasets import mnist\n",
    "\n",
    "# Loading the train and test data\n",
    "\n",
    "(X_train, y_train), (X_test, y_test) = mnist.load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Looking at the dataset\n",
    "print(X_train.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We can see that the training set consists of 60,000 images of size 28x28 pixels\n",
    "i = np.random.randint(0, X_train.shape[0])\n",
    "sns.set_style(\"white\")\n",
    "plt.imshow(X_train[i], cmap=\"gray_r\")\n",
    "sns.set(style=\"darkgrid\")\n",
    "print(\"This digit is: \", y_train[i])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Look at the data values for a couple of images\n",
    "print(X_train[0].min(), X_train[1].max())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data consists of values between 0-255 representing the **grayscale level**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The labels are the digit on the image\n",
    "print(y_train.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Scaling the data\n",
    "# It is important to normalize the input data to (0-1) before providing it to a neural net\n",
    "# We could use the previously introduced function from scikit-learn. However, here it is sufficient to\n",
    "# just divide the input data by 255\n",
    "X_train_norm = X_train / 255.0\n",
    "X_test_norm = X_test / 255.0\n",
    "\n",
    "# Also we need to reshape the input data such that each sample is a vector and not a 2D matrix\n",
    "X_train_prep = X_train_norm.reshape(X_train_norm.shape[0], 28 * 28)\n",
    "X_test_prep = X_test_norm.reshape(X_test_norm.shape[0], 28 * 28)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
    "One-Hot encoding\n",
    "\n",
    "In multi-class classification problems the labels are provided to the neural network as something called **One-hot encodings**. The categorical labels (0-9 here) are converted to vectors.\n",
    "\n",
    "For the MNIST problem where the data has **10 categories** we will convert every label to a vector of length 10. \n",
    "All the entries of this vector will be zero **except** for the index which is equal to the (integer) value of the label.\n",
    "\n",
    "For example:\n",
    "if label is 4. The one-hot vector will look like **[0 0 0 0 1 0 0 0 0 0]**\n",
    "\n",
    "Fortunately, TensorFlow (Keras) has a built-in function to achieve this and we do not have to write a code for this ourselves.\n",
    "</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras import utils\n",
    "\n",
    "y_train_onehot = utils.to_categorical(y_train, num_classes=10)\n",
    "y_test_onehot = utils.to_categorical(y_test, num_classes=10)\n",
    "\n",
    "print(y_train_onehot.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Building the tensorflow model\n",