neural_nets_intro.ipynb

   "source": [
    "HP_grid = {'epochs' : [300, 500, 1000]}\n",
    "search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid, n_jobs=3)\n",
    "search.fit(features, labels)\n",
    "print(search.best_score_, search.best_params_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/tarunchadha/anaconda3/envs/mlw-2/lib/python3.6/site-packages/sklearn/model_selection/_search.py:841: DeprecationWarning: The default of the `iid` parameter will change from True to False in version 0.22 and will be removed in 0.24. This will change numeric results when test-set sizes are unequal.\n",
      "  DeprecationWarning)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.8119999953508377 {'batch_size': 10, 'epochs': 30}\n"
     ]
    }
   ],
   "source": [
    "HP_grid = {'epochs' : [10, 15, 30], \n",
    "           'batch_size' : [10, 20, 30] }\n",
    "search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid, n_jobs=4)\n",
    "search.fit(features, labels)\n",
    "print(search.best_score_, search.best_params_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A more general model for further Hyperparameter optimization\n",
    "from keras import optimizers\n",
    "\n",
    "def a_simple_NN(activation='relu', num_hidden_neurons=[4, 4], learning_rate=0.01):\n",
    "\n",
    "    model = Sequential()\n",
    "\n",
    "    model.add(Dense(num_hidden_neurons[0],\n",
    "                    input_shape=(2,), activation=activation))\n",
    "\n",
    "    model.add(Dense(num_hidden_neurons[1], activation=activation))\n",
    "\n",
    "    model.add(Dense(1, activation=\"sigmoid\"))\n",
    "\n",
    "    model.compile(loss=\"binary_crossentropy\", optimizer=optimizers.rmsprop(\n",
    "        lr=learning_rate), metrics=[\"accuracy\"])\n",
    "\n",
    "    return model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise: \n",
    "* Look at the model above and choose a couple of hyperparameters to optimize. \n",
    "* **(OPTIONAL:)** What function from SciKit learn other than GridSearchCV can we use for hyperparameter optimization? Use it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise: Create a neural network to classify the 2d points example from chapter 2 learned \n",
    "(Optional: As you create the model read a bit on the different keras commands we have used)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split, cross_val_score\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense\n",
    "from keras import optimizers\n",
    "from keras.wrappers.scikit_learn import KerasClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "circle = pd.read_csv(\"2d_points.csv\")\n",
    "# Using x and y coordinates as featues\n",
    "features = circle.iloc[:, :-1]\n",
    "# Convert boolean to integer values (True->1 and False->0)\n",
    "labels = circle.iloc[:, -1].astype(int)\n",
    "\n",
    "colors = [[\"steelblue\", \"chocolate\"][i] for i in circle[\"label\"]]\n",
    "plt.figure(figsize=(5, 5))\n",
    "plt.xlim([-2, 2])\n",
    "plt.ylim([-2, 2])\n",
    "\n",
    "plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\");\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Insert Code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The examples above are not the ideal use problems one should use neural networks for. They are too simple and can be easily solved by classical machine learning algorithms. Below we show examples which are the more common applications of Neural Networks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Handwritten Digits Classification\n",
    "### MNIST Dataset\n",
    "\n",
    "MNIST datasets is a very common dataset used in machine learning. It is widely used to train and validate models.\n",
    "\n",
    "\n",
    ">The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a >test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size->normalized and centered in a fixed-size image.\n",
    ">It is a good database for people who want to try learning techniques and pattern recognition methods on real-world >data while spending minimal efforts on preprocessing and formatting.\n",
    ">source: http://yann.lecun.com/exdb/mnist/\n",
    "\n",
    "The problem we want to solve using this dataset is: multi-class classification (FIRST TIME)\n",
    "This dataset consists of images of handwritten digits between 0-9 and their corresponsing labels. We want to train a neural network which is able to predict the correct digit on the image. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Loading the dataset in keras\n",
    "# Later you can explore and play with other datasets with come with Keras\n",
    "from keras.datasets import mnist\n",
    "\n",
    "# Loading the train and test data\n",
    "\n",
    "(X_train, y_train), (X_test, y_test) = mnist.load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Looking at the dataset\n",
    "print(X_train.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We can see that the training set consists of 60,000 images of size 28x28 pixels\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "i=np.random.randint(0,X_train.shape[0])\n",
    "plt.imshow(X_train[i], cmap=\"gray_r\") ;\n",
    "print(\"This digit is: \" , y_train[i])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Look at the data values for a couple of images\n",
    "print(X_train[0].min(), X_train[1].max())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data consists of values between 0-255 representing the **grayscale level**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The labels are the digit on the image\n",
    "print(y_train.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Scaling the data\n",
    "# It is important to normalize the input data to (0-1) before providing it to a neural net\n",
    "# We could use the previously introduced function from SciKit learn. However, here it is sufficient to\n",
    "# just divide the input data by 255\n",
    "X_train_norm = X_train/255.\n",
    "X_test_norm = X_test/255.\n",
    "\n",
    "# Also we need to reshape the input data such that each sample is a vector and not a 2D matrix\n",
    "X_train_prep = X_train_norm.reshape(X_train_norm.shape[0],28*28)\n",
    "X_test_prep = X_test_norm.reshape(X_test_norm.shape[0],28*28)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**IMPORTANT: One-Hot encoding**\n",
    "\n",
    "**TODO: Better frame the explaination**\n",
    "\n",
    "In such problems the labels are provided as something called **One-hot encodings**. What this does is to convert a categorical label to a vector.\n",
    "\n",
    "For the MNIST problem where we have **10 categories** one-hot encoding will create a vector of length 10 for each of the labels. All the entries of this vector will be zero **except** for the index which is equal to the integer value of the label.\n",
    "\n",
    "For example:\n",
    "if label is 4. The one-hot vector will look like **[0 0 0 0 1 0 0 0 0 0]**\n",
    "\n",
    "Fortunately, we don't have to code this ourselves because Keras has a built-in function for this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from keras.utils.np_utils import to_categorical\n",
    "\n",
    "y_train_onehot = to_categorical(y_train, num_classes=10)\n",
    "y_test_onehot = to_categorical(y_test, num_classes=10)\n",
    "\n",
    "print(y_train_onehot.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Building the keras model\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense\n",
    "\n",
    "def mnist_model():\n",
    "    model = Sequential()\n",
    "\n",
    "    model.add(Dense(64, input_shape=(28*28,), activation=\"relu\"))\n",
    "\n",
    "    model.add(Dense(64, activation=\"relu\"))\n",
    "\n",
    "    model.add(Dense(10, activation=\"softmax\"))\n",
    "\n",
    "    model.compile(loss=\"categorical_crossentropy\",\n",
    "                  optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
    "    return model\n",
    "\n",
    "model = mnist_model()\n",
    "\n",
    "model_run = model.fit(X_train_prep, y_train_onehot, epochs=20,\n",
    "                      batch_size=512)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"The [loss, accuracy] on test dataset are: \" , model.evaluate(X_test_prep, y_test_onehot))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Optional exercise: Run the model again with validation dataset, plot the accuracy as a function of epochs, play with number of epochs and observe what is happening."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Code here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution:\n",
    "num_epochs = 20\n",
    "model_run = model.fit(X_train_prep, y_train_onehot, epochs=num_epochs,\n",
    "                      batch_size=512, validation_data=(X_test_prep, y_test_onehot))\n",
    "# Evaluating the model on test dataset\n",
    "#print(\"The [loss, accuracy] on test dataset are: \" , model.evaluate(X_test_prep, y_test_onehot))\n",
    "history_model = model_run.history\n",
    "print(\"The history has the following data: \", history_model.keys())\n",
    "\n",
    "# Plotting the training and validation accuracy during the training\n",
    "sns.lineplot(np.arange(1, num_epochs+1), history_model[\"acc\"], color = \"blue\", label=\"Training set\") ;\n",
    "sns.lineplot(np.arange(1, num_epochs+1), history_model[\"val_acc\"], color = \"red\", label=\"Valdation set\") ;\n",
    "plt.xlabel(\"epochs\") ;\n",
    "plt.ylabel(\"accuracy\") ;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding regularization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Adding l2 regularization\n",
    "# Building the keras model\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense\n",
    "from keras.regularizers import l2\n",
    "\n",
    "def mnist_model():\n",
    "    \n",
    "    model = Sequential()\n",
    "\n",
    "    model.add(Dense(64, input_shape=(28*28,), activation=\"relu\", \n",
    "                   kernel_regularizer=l2(0.01)))\n",
    "\n",
    "    model.add(Dense(64, activation=\"relu\", \n",
    "                   kernel_regularizer=l2(0.01)))\n",
    "\n",
    "    model.add(Dense(10, activation=\"softmax\"))\n",
    "\n",
    "    model.compile(loss=\"categorical_crossentropy\",\n",
    "                  optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
    "    return model\n",
    "\n",
    "model = mnist_model()\n",
    "\n",
    "num_epochs = 50\n",
    "model_run = model.fit(X_train_prep, y_train_onehot, epochs=num_epochs,\n",
    "                      batch_size=512)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"The [loss, accuracy] on test dataset are: \" , model.evaluate(X_test_prep, y_test_onehot))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Another way to add regularization and to make the network more robust we can add something called \"Dropout\". When we add dropout to a layer a specified percentage of units in that layer are switched off. \n",
    "(MAKING MODEL SIMPLER)\n",
    "\n",
    "### Exercise: Add dropout instead of l2 regularization in the network above"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Adding dropout is easy in keras\n",
    "# We import a layer called Dropout and add as follows\n",
    "# model.add(Dropout(0.5)) to randomly drop 50% of the hidden units\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution\n",
    "# Adding Dropout\n",
    "# Building the keras model\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense, Dropout\n",
    "\n",
    "def mnist_model():\n",
    "    \n",
    "    model = Sequential()\n",
    "\n",
    "    model.add(Dense(64, input_shape=(28*28,), activation=\"relu\"))\n",
    "              \n",
    "    model.add(Dropout(0.4))\n",
    "\n",
    "    model.add(Dense(64, activation=\"relu\"))\n",
    "\n",
    "    model.add(Dense(10, activation=\"softmax\"))\n",
    "\n",
    "    model.compile(loss=\"categorical_crossentropy\",\n",
    "                  optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
    "              \n",
    "    return model\n",
    "\n",
    "model = mnist_model()\n",
    "\n",
    "num_epochs = 50\n",
    "model_run = model.fit(X_train_prep, y_train_onehot, epochs=num_epochs,\n",
    "                      batch_size=512)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"The [loss, accuracy] on test dataset are: \" , model.evaluate(X_test_prep, y_test_onehot))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Network Architecture\n",
    "\n",
    "The neural networks which we have seen till now are the simplest kind of neural networks.\n",
    "There exist more sophisticated network architectures especially designed for specific applications.\n",
    "Some of them are as follows:\n",
    "\n",
    "###  Convolution Neural Networks (CNNs)\n",
    "\n",
    "These networks are used mostly for computer vision (EXAMPLES) like tasks. \n",
    "One of the old CNN networks is shown below.\n",
    "\n",
    "<center>\n",
    "<figure>\n",
    "<img src=\"./images/neuralnets/CNN_lecun.png\" width=\"800\"/>\n",
    "<figcaption>source: LeCun et al., Gradient-based learning applied to document recognition (1998).</figcaption>\n",
    "</figure>\n",
    "</center>\n",
    "\n",
    "CNNs consist of new type of layers like convolution layer and pooling layers.\n",
    "\n",
    "###  Recurrent Neural Networks (RNNs)\n",
    "\n",
    "These are used for time-series data, speech recognition, translation etc.\n",
    "\n",
    "IMAGE HERE\n",
    "\n",
    "### Generative adversarial networks (GANs)\n",
    "\n",
    "GANs consist of 2 parts, a generative network and a discriminative network. The generative network produces data which is then fed to the discriminative network which judges if the new data belongs to a specified dataset. Then via feedback loops the generative network becomes better and better at creating images similar to the dataset the discriminative network is judging against. At the same time the discriminative network get better and better at identifyig **fake** instances which are not from the reference dataset. \n",
    "\n",
    "IMAGE HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## CNN example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this example we will work with a dataset called fashion-MNIST which is quite similar to the MNIST data above.\n",
    "> Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.\n",
    "source: https://github.com/zalandoresearch/fashion-mnist\n",
    "\n",
    "The 10 classes of this dataset are:\n",
    "\n",
    "| Label| Item |\n",
    "| --- | --- |\n",
    "| 0 |\tT-shirt/top |\n",
    "| 1\t| Trouser |\n",
    "|2|\tPullover|\n",
    "|3|\tDress|\n",
    "|4|\tCoat|\n",
    "|5|\tSandal|\n",
    "|6|\tShirt|\n",
    "|7|\tSneaker|\n",
    "|8|\tBag|\n",
    "|9|\tAnkle boot|"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Loading the dataset in keras\n",
    "# Later you can explore and play with other datasets with come with Keras\n",
    "from keras.datasets import fashion_mnist\n",
    "\n",
    "# Loading the train and test data\n",
    "\n",
    "(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()\n",
    "\n",
    "items =['T-shirt/top', 'Trouser', \n",
    "        'Pullover', 'Dress', \n",
    "        'Coat', 'Sandal', \n",
    "        'Shirt', 'Sneaker',\n",
    "        'Bag', 'Ankle boot']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We can see that the training set consists of 60,000 images of size 28x28 pixels\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "i=np.random.randint(0,X_train.shape[0])\n",
    "plt.imshow(X_train[i], cmap=\"gray_r\") ; \n",
    "print(\"This item is a: \" , items[y_train[i]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Also we need to reshape the input data such that each sample is a 4D matrix of dimension\n",
    "# (num_samples, width, height, channels). Even though these images are grayscale we need to add\n",
    "# channel dimension as this is expected by the Conv function\n",
    "X_train_prep = X_train.reshape(X_train.shape[0],28,28,1)/255.\n",
    "X_test_prep = X_test.reshape(X_test.shape[0],28,28,1)/255.\n",
    "\n",
    "from keras.utils.np_utils import to_categorical\n",
    "\n",
    "y_train_onehot = to_categorical(y_train, num_classes=10)\n",
    "y_test_onehot = to_categorical(y_test, num_classes=10)\n",
    "\n",
    "print(y_train_onehot.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating a CNN similar to the one shown in the figure from LeCun paper\n",
    "# In the original implementation Average pooling was used. However, we will use maxpooling as this \n",
    "# is what us used in the more recent architectures and is found to be a better choice\n",
    "# Convolution -> Pooling -> Convolution -> Pooling -> Flatten -> Dense -> Dense -> Output layer\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout, BatchNormalization\n",
    "\n",
    "def simple_CNN():\n",
    "    \n",
    "    model = Sequential()\n",
    "    \n",
    "    model.add(Conv2D(6, (3,3), input_shape=(28,28,1), activation='relu'))\n",
    "    \n",
    "    model.add(MaxPool2D((2,2)))\n",
    "    \n",
    "    model.add(Conv2D(16, (3,3), activation='relu'))\n",
    "    \n",
    "    model.add(MaxPool2D((2,2)))\n",
    "    \n",
    "    model.add(Flatten())\n",
    "    \n",
    "    model.add(Dense(120, activation='relu'))\n",
    "    \n",
    "    model.add(Dense(84, activation='relu'))\n",
    "    \n",
    "    model.add(Dense(10, activation='softmax'))\n",
    "    \n",
    "    model.compile(loss=\"categorical_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
    "    \n",
    "    return model\n",
    "\n",
    "model = simple_CNN()\n",
    "model.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_epochs = 10\n",
    "model_run = model.fit(X_train_prep, y_train_onehot, epochs=num_epochs, \n",
    "                      batch_size=64, validation_data=(X_test_prep, y_test_onehot))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise: Use the above model or improve it (change number of filters, add more layers etc. on the MNIST example and see if you can get a better accuracy than what we achieved with a vanilla neural network)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise: Load and play with the CIFAR10 dataset also included with Keras and build+train a simple CNN using it"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": false,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}