Skip to content
Snippets Groups Projects
08_d-neural_networks.ipynb 13.5 KiB
Newer Older
  • Learn to ignore specific revisions
  • {
     "cells": [
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
       "metadata": {},
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n",
        "from numpy.random import seed\n",
        "\n",
        "seed(42)\n",
        "import tensorflow as tf\n",
        "\n",
        "tf.random.set_seed(42)\n",
        "import matplotlib as mpl\n",
        "import matplotlib.pyplot as plt\n",
        "import seaborn as sns\n",
        "\n",
        "sns.set(style=\"darkgrid\")\n",
        "mpl.rcParams[\"lines.linewidth\"] = 3\n",
        "%matplotlib inline\n",
        "%config InlineBackend.figure_format = 'retina'\n",
        "%config IPCompleter.greedy=True\n",
        "import warnings\n",
        "\n",
        "warnings.filterwarnings(\"ignore\", category=FutureWarning)\n",
        "from IPython.core.display import HTML\n",
        "\n",
        "HTML(open(\"custom.html\", \"r\").read())"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "# Chapter 8d: Introduction to Neural Networks\n",
        "## Using pre-defined models in TensorFlow"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "from tensorflow.keras import applications\n",
        "\n",
        "help(applications)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### ImageNet \n",
        "[ImageNet](http://image-net.org/) is a very large (> 14 million!! images) and easily accessible image database. More than 14 million annotated images indicating the object in the image and more than 1 million images with bounding box information.\n",
        "\n",
        "Summary and statistics: http://image-net.org/about-stats\n"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
       "metadata": {},
       "outputs": [],
       "source": [
        "from tensorflow.keras.applications import VGG16"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "?VGG16"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
       "metadata": {},
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "model = VGG16(weights=\"imagenet\")"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "model.summary()"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "from IPython.display import Image as Img\n",
        "\n",
        "Img(filename=\"./images/cutepanda.jpg\", width=600)"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "from tensorflow.keras.applications.vgg16 import decode_predictions, preprocess_input\n",
        "from tensorflow.keras.preprocessing.image import img_to_array, load_img\n",
        "\n",
        "image = load_img(\"./images/cutepanda.jpg\", target_size=(224, 224))\n",
        "# convert the image pixels to a numpy array\n",
        "image = img_to_array(image)\n",
        "# Prepare data for the model\n",
        "image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))\n",
        "image = preprocess_input(image)\n",
        "# prediction of probability of belonging to the output classes\n",
        "prediction = model.predict(image)\n",
        "# converting the probabilities to class labels\n",
        "label = decode_predictions(prediction)\n",
        "# Top 5 classes\n",
        "label = label[0]\n",
        "for pred in label:\n",
        "    # print the classification\n",
        "    print(\"It is: {} with probability {:.4f}%\".format(pred[1], pred[2] * 100))"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Transfering knowledge"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Recap: Convolutional Neural Networks can be seen as being comprised of 2 parts:\n",
        "**A feature extractor (convolution , Maxpooling layers) and a classifier part (Dense layers)**\n",
        "\n",
        "Different possibilities to work with pre-trained/pre-existing models trained on a very large datasets such as Imagenet:\n",
        "\n",
        "* Freezing the convolution part and throwing away the classifer part. Adding your own dense layers and training them.\n",
        "* Freezing only some layers in the convolution part and throwing away the classifer part. Adding your own dense layers and training the unfreezed and the dense layers.\n",
        "* Only using the architecture and training the whole network again."
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Realistic example\n",
        "\n",
        "### Histopathological Cancer Detection\n",
        "\n",
        "https://www.kaggle.com/c/histopathologic-cancer-detection/overview\n",
        "\n",
    
        "**Download data**: https://polybox.ethz.ch/index.php/s/ADUFBnaxbNXAxX4\n",
        "\n",
    
        "Identification of metastatic cancer in small image patches taken from larger digital pathology scans."
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "%matplotlib inline\n",
        "# Plotting a few images from this dataset\n",
        "import os\n",
        "\n",
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "from numpy import random\n",
        "from PIL import Image\n",
        "\n",
        "random.seed(42)\n",
        "import tensorflow as tf\n",
        "\n",
        "tf.random.set_seed(42)\n",
        "\n",
        "\n",
        "def plot_data(samples, top_dir):\n",
        "    sub_directories = [\"benign\", \"malign\"]\n",
        "    fig, ax = plt.subplots(\n",
        "        len(sub_directories),\n",
        "        samples,\n",
        "        sharex=True,\n",
        "        sharey=True,\n",
        "        figsize=(3 * samples, 3 * len(sub_directories)),\n",
        "    )\n",
        "    labels = [\"0\", \"1\"]\n",
        "    assert len(sub_directories) == 2\n",
        "    for i in range(samples):\n",
        "        for j, k in enumerate(sub_directories):\n",
    
        "            tmp = os.path.join(top_dir, k)\n",
        "            tmp_img = Image.open(os.path.join(tmp, random.choice(os.listdir(tmp))))\n",
    
        "            ax[j, i].imshow(np.asarray(tmp_img))\n",
        "            ax[j, i].set_title(\"{}: label={}\".format(k, j))\n",
    
        "            ax[j, i].grid(False)\n",
    
    chadhat's avatar
    chadhat committed
        "#data_dir = \"PATH_TO_histopathologic_cancer_detection_FOLDER\"\n",
        "data_dir = \"./data/histopathologic_cancer_detection/\"\n",
    
        "plot_data(4, os.path.join(data_dir, \"train\"))"
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "# Data preprocessing\n",
        "from tensorflow.keras.preprocessing.image import ImageDataGenerator\n",
        "\n",
        "train_data = ImageDataGenerator(rescale=1 / 255.0)\n",
        "\n",
    
        "train_directory = os.path.join(data_dir, \"train\")\n",
    
        "train_data_generator = train_data.flow_from_directory(\n",
        "    train_directory, target_size=(96, 96), batch_size=256, class_mode=\"binary\"\n",
        ")\n",
        "\n",
        "validation_data = ImageDataGenerator(rescale=1 / 255.0)\n",
    
        "validation_directory = os.path.join(data_dir, \"validation\")\n",
    
        "validation_data_generator = validation_data.flow_from_directory(\n",
        "    validation_directory, target_size=(96, 96), batch_size=256, class_mode=\"binary\"\n",
        ")"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
       "metadata": {},
       "outputs": [],
       "source": [
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "from tensorflow.keras import layers, models, optimizers\n",
        "from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
       "metadata": {},
       "outputs": [],
       "source": [
        "from tensorflow.keras.applications import VGG16"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "feature_extractor = VGG16(weights=None, include_top=False, input_shape=(96, 96, 3))\n",
        "# feature_extractor = MobileNetV2(weights=None, include_top=False, input_shape=(96,96,3))\n",
        "feature_extractor.summary()"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
       "metadata": {},
       "outputs": [],
       "source": [
        "model = models.Sequential()\n",
        "model.add(feature_extractor)\n",
        "model.add(layers.Flatten())\n",
        "model.add(layers.Dropout(0.2))\n",
        "model.add(layers.Dense(512, activation=\"relu\"))\n",
        "model.add(layers.Dense(1, activation=\"sigmoid\"))"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "model.summary()"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
       "metadata": {},
       "outputs": [],
       "source": [
        "model.compile(\n",
    
    chadhat's avatar
    chadhat committed
        "    optimizer=optimizers.RMSprop(learning_rate=0.0001),\n",
    
        "    loss=\"binary_crossentropy\",\n",
        "    metrics=[\"accuracy\"],\n",
        ")"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
       "metadata": {},
       "outputs": [],
       "source": [
        "num_epochs = 10\n",
        "reduce_lr = ReduceLROnPlateau(\n",
        "    monitor=\"val_loss\", factor=0.2, patience=2, min_lr=0.000001\n",
        ")\n",
        "mcp_save = ModelCheckpoint(\"./test/\", save_freq=\"epoch\")"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": null,
       "metadata": {},
       "outputs": [],
    
       "source": [
        "# CPU times: user 1h 21min 11s, sys: 17min 41s, total: 1h 38min 53s\n",
        "# Wall time: 1h 58min 20s wo dropout\n",
    
        "model_run = model.fit(\n",
    
        "    train_data_generator,\n",
        "    steps_per_epoch=len(train_data_generator),\n",
        "    epochs=num_epochs,\n",
        "    validation_data=validation_data_generator,\n",
        "    validation_steps=len(validation_data_generator),\n",
        "    callbacks=[reduce_lr, mcp_save],\n",
        ")"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
       "metadata": {},
       "outputs": [],
       "source": [
        "import pickle\n",
    
        "# with open(\"./data/histopathology_run_history\", \"wb\") as filehandler:\n",
        "#    pickle.dump(model_run.history, filehandler)"
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
        "history_file = open(\"./data/histopathology_run_history\", \"rb\")\n",
        "history = pickle.load(history_file)\n",
    
        "num_epochs = 10\n",
        "plt.plot(\n",
        "    np.arange(0, num_epochs),\n",
    
        "    history[\"val_accuracy\"],\n",
    
        "    label=\"Validation accuracy\",\n",
        ")\n",
    
        "plt.plot(np.arange(0, num_epochs), history[\"accuracy\"], label=\"Train accuracy\")\n",
    
        "plt.xlabel(\"epoch\")\n",
        "plt.ylabel(\"Accuracy\")\n",
        "plt.legend()\n",
        "plt.ylim([0.6, 1])\n",
        "plt.grid()"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "# Data Augmentation\n",
        "train_data = ImageDataGenerator(\n",
        "    rescale=1 / 255.0,\n",
        "    rotation_range=90,\n",
        "    width_shift_range=0.0,\n",
        "    height_shift_range=0.0,\n",
        "    shear_range=0.1,\n",
        "    horizontal_flip=True,\n",
        "    fill_mode=\"nearest\",\n",
        ")\n",
        "# Visualizing what our data generator is doing\n",
        "# Choosing an image randomly\n",
        "from numpy import random\n",
        "\n",
        "pic_malignant = np.asarray(\n",
        "    Image.open(\n",
        "        train_directory\n",
        "        + \"/malign/\"\n",
        "        + random.choice(os.listdir(train_directory + \"/malign/\"))\n",
        "    )\n",
        ")\n",
        "fig, ax = plt.subplots(1, 8, sharex=True, sharey=True, figsize=(3 * 8, 3))\n",
        "ax = ax.flatten()\n",
        "ax[0].imshow(pic_malignant)\n",
    
        "pic_malignant = pic_malignant[np.newaxis, :]\n",
        "for i, img in enumerate(train_data.flow(pic_malignant)):\n",
        "    ax[i + 1].imshow(img[0])\n",
    
        "    ax[i + 1].grid(False)\n",
    
        "    if i == 6:\n",
        "        break"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## TensorFlow Hub\n",
        "\n",
        "A great repository of trained machine learning models!\n",
        "\n",
        "The models can be downloaded and used with just a few lines of code.\n",
        "\n",
        "Find models here: https://tfhub.dev/"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
       "metadata": {},
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "!pip install --upgrade tensorflow_hub"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
       "metadata": {},
       "outputs": [],
       "source": [
        "import tensorflow_hub as hub"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
       "metadata": {},
       "outputs": [],
       "source": [
        "layer = hub.KerasLayer(\n",
        "    \"https://tfhub.dev/google/imagenet/resnet_v2_50/classification/4\", trainable=True\n",
        ")"
       ]
      },
      {
       "cell_type": "code",
    
    chadhat's avatar
    chadhat committed
       "execution_count": null,
    
       "metadata": {},
    
    chadhat's avatar
    chadhat committed
       "outputs": [],
    
       "source": [
        "from tensorflow.keras.models import Sequential\n",
        "\n",
        "model = Sequential([layer])\n",
        "model.build([None, 224, 224, 3])\n",
        "model.summary()"
       ]
    
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": []
    
      }
     ],
     "metadata": {
      "kernelspec": {
    
    chadhat's avatar
    chadhat committed
       "display_name": "machine-learning-ws-2023",
    
       "language": "python",
    
    chadhat's avatar
    chadhat committed
       "name": "machine-learning-ws-2023"
    
      },
      "language_info": {
       "codemirror_mode": {
        "name": "ipython",
        "version": 3
       },
       "file_extension": ".py",
       "mimetype": "text/x-python",
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython3",
    
    chadhat's avatar
    chadhat committed
       "version": "3.10.4"
    
      }
     },
     "nbformat": 4,
     "nbformat_minor": 4
    }