Newer
Older
{
"cells": [
{
"cell_type": "code",
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
"source": [
"# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n",
"from numpy.random import seed\n",
"\n",
"seed(42)\n",
"import tensorflow as tf\n",
"\n",
"tf.random.set_seed(42)\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"sns.set(style=\"darkgrid\")\n",
"mpl.rcParams[\"lines.linewidth\"] = 3\n",
"%matplotlib inline\n",
"%config InlineBackend.figure_format = 'retina'\n",
"%config IPCompleter.greedy=True\n",
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\", category=FutureWarning)\n",
"from IPython.core.display import HTML\n",
"\n",
"HTML(open(\"custom.html\", \"r\").read())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chapter 8d: Introduction to Neural Networks\n",
"## Using pre-defined models in TensorFlow"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"from tensorflow.keras import applications\n",
"\n",
"help(applications)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ImageNet \n",
"[ImageNet](http://image-net.org/) is a very large (> 14 million!! images) and easily accessible image database. More than 14 million annotated images indicating the object in the image and more than 1 million images with bounding box information.\n",
"\n",
"Summary and statistics: http://image-net.org/about-stats\n"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"from tensorflow.keras.applications import VGG16"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"?VGG16"
]
},
{
"cell_type": "code",
"source": [
"model = VGG16(weights=\"imagenet\")"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"model.summary()"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"from IPython.display import Image as Img\n",
"\n",
"Img(filename=\"./images/cutepanda.jpg\", width=600)"
]
},
{
"cell_type": "code",
"metadata": {},
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
"source": [
"from tensorflow.keras.applications.vgg16 import decode_predictions, preprocess_input\n",
"from tensorflow.keras.preprocessing.image import img_to_array, load_img\n",
"\n",
"image = load_img(\"./images/cutepanda.jpg\", target_size=(224, 224))\n",
"# convert the image pixels to a numpy array\n",
"image = img_to_array(image)\n",
"# Prepare data for the model\n",
"image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))\n",
"image = preprocess_input(image)\n",
"# prediction of probability of belonging to the output classes\n",
"prediction = model.predict(image)\n",
"# converting the probabilities to class labels\n",
"label = decode_predictions(prediction)\n",
"# Top 5 classes\n",
"label = label[0]\n",
"for pred in label:\n",
" # print the classification\n",
" print(\"It is: {} with probability {:.4f}%\".format(pred[1], pred[2] * 100))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Transfering knowledge"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Recap: Convolutional Neural Networks can be seen as being comprised of 2 parts:\n",
"**A feature extractor (convolution , Maxpooling layers) and a classifier part (Dense layers)**\n",
"\n",
"Different possibilities to work with pre-trained/pre-existing models trained on a very large datasets such as Imagenet:\n",
"\n",
"* Freezing the convolution part and throwing away the classifer part. Adding your own dense layers and training them.\n",
"* Freezing only some layers in the convolution part and throwing away the classifer part. Adding your own dense layers and training the unfreezed and the dense layers.\n",
"* Only using the architecture and training the whole network again."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Realistic example\n",
"\n",
"### Histopathological Cancer Detection\n",
"\n",
"https://www.kaggle.com/c/histopathologic-cancer-detection/overview\n",
"\n",
"**Download data**: https://polybox.ethz.ch/index.php/s/ADUFBnaxbNXAxX4\n",
"\n",
"Identification of metastatic cancer in small image patches taken from larger digital pathology scans."
]
},
{
"cell_type": "code",
"metadata": {},
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
"source": [
"%matplotlib inline\n",
"# Plotting a few images from this dataset\n",
"import os\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from numpy import random\n",
"from PIL import Image\n",
"\n",
"random.seed(42)\n",
"import tensorflow as tf\n",
"\n",
"tf.random.set_seed(42)\n",
"\n",
"\n",
"def plot_data(samples, top_dir):\n",
" sub_directories = [\"benign\", \"malign\"]\n",
" fig, ax = plt.subplots(\n",
" len(sub_directories),\n",
" samples,\n",
" sharex=True,\n",
" sharey=True,\n",
" figsize=(3 * samples, 3 * len(sub_directories)),\n",
" )\n",
" labels = [\"0\", \"1\"]\n",
" assert len(sub_directories) == 2\n",
" for i in range(samples):\n",
" for j, k in enumerate(sub_directories):\n",
" tmp = os.path.join(top_dir, k)\n",
" tmp_img = Image.open(os.path.join(tmp, random.choice(os.listdir(tmp))))\n",
" ax[j, i].imshow(np.asarray(tmp_img))\n",
" ax[j, i].set_title(\"{}: label={}\".format(k, j))\n",
" ax[j, i].grid(False)\n",
"#data_dir = \"PATH_TO_histopathologic_cancer_detection_FOLDER\"\n",
"data_dir = \"./data/histopathologic_cancer_detection/\"\n",
"plot_data(4, os.path.join(data_dir, \"train\"))"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Data preprocessing\n",
"from tensorflow.keras.preprocessing.image import ImageDataGenerator\n",
"\n",
"train_data = ImageDataGenerator(rescale=1 / 255.0)\n",
"\n",
"train_directory = os.path.join(data_dir, \"train\")\n",
"train_data_generator = train_data.flow_from_directory(\n",
" train_directory, target_size=(96, 96), batch_size=256, class_mode=\"binary\"\n",
")\n",
"\n",
"validation_data = ImageDataGenerator(rescale=1 / 255.0)\n",
"validation_directory = os.path.join(data_dir, \"validation\")\n",
"validation_data_generator = validation_data.flow_from_directory(\n",
" validation_directory, target_size=(96, 96), batch_size=256, class_mode=\"binary\"\n",
")"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from tensorflow.keras import layers, models, optimizers\n",
"from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"from tensorflow.keras.applications import VGG16"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"feature_extractor = VGG16(weights=None, include_top=False, input_shape=(96, 96, 3))\n",
"# feature_extractor = MobileNetV2(weights=None, include_top=False, input_shape=(96,96,3))\n",
"feature_extractor.summary()"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"model = models.Sequential()\n",
"model.add(feature_extractor)\n",
"model.add(layers.Flatten())\n",
"model.add(layers.Dropout(0.2))\n",
"model.add(layers.Dense(512, activation=\"relu\"))\n",
"model.add(layers.Dense(1, activation=\"sigmoid\"))"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"model.summary()"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"model.compile(\n",
" optimizer=optimizers.RMSprop(learning_rate=0.0001),\n",
" loss=\"binary_crossentropy\",\n",
" metrics=[\"accuracy\"],\n",
")"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"num_epochs = 10\n",
"reduce_lr = ReduceLROnPlateau(\n",
" monitor=\"val_loss\", factor=0.2, patience=2, min_lr=0.000001\n",
")\n",
"mcp_save = ModelCheckpoint(\"./test/\", save_freq=\"epoch\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# CPU times: user 1h 21min 11s, sys: 17min 41s, total: 1h 38min 53s\n",
"# Wall time: 1h 58min 20s wo dropout\n",
"model_run = model.fit(\n",
" train_data_generator,\n",
" steps_per_epoch=len(train_data_generator),\n",
" epochs=num_epochs,\n",
" validation_data=validation_data_generator,\n",
" validation_steps=len(validation_data_generator),\n",
" callbacks=[reduce_lr, mcp_save],\n",
")"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"import pickle\n",
"# with open(\"./data/histopathology_run_history\", \"wb\") as filehandler:\n",
"# pickle.dump(model_run.history, filehandler)"
]
},
{
"cell_type": "code",
"metadata": {},
"history_file = open(\"./data/histopathology_run_history\", \"rb\")\n",
"history = pickle.load(history_file)\n",
"num_epochs = 10\n",
"plt.plot(\n",
" np.arange(0, num_epochs),\n",
" history[\"val_accuracy\"],\n",
" label=\"Validation accuracy\",\n",
")\n",
"plt.plot(np.arange(0, num_epochs), history[\"accuracy\"], label=\"Train accuracy\")\n",
"plt.xlabel(\"epoch\")\n",
"plt.ylabel(\"Accuracy\")\n",
"plt.legend()\n",
"plt.ylim([0.6, 1])\n",
"plt.grid()"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Data Augmentation\n",
"train_data = ImageDataGenerator(\n",
" rescale=1 / 255.0,\n",
" rotation_range=90,\n",
" width_shift_range=0.0,\n",
" height_shift_range=0.0,\n",
" shear_range=0.1,\n",
" horizontal_flip=True,\n",
" fill_mode=\"nearest\",\n",
")\n",
"# Visualizing what our data generator is doing\n",
"# Choosing an image randomly\n",
"from numpy import random\n",
"\n",
"pic_malignant = np.asarray(\n",
" Image.open(\n",
" train_directory\n",
" + \"/malign/\"\n",
" + random.choice(os.listdir(train_directory + \"/malign/\"))\n",
" )\n",
")\n",
"fig, ax = plt.subplots(1, 8, sharex=True, sharey=True, figsize=(3 * 8, 3))\n",
"ax = ax.flatten()\n",
"ax[0].imshow(pic_malignant)\n",
"ax[0].grid(False)\n",
"pic_malignant = pic_malignant[np.newaxis, :]\n",
"for i, img in enumerate(train_data.flow(pic_malignant)):\n",
" ax[i + 1].imshow(img[0])\n",
" ax[i + 1].grid(False)\n",
" if i == 6:\n",
" break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TensorFlow Hub\n",
"\n",
"A great repository of trained machine learning models!\n",
"\n",
"The models can be downloaded and used with just a few lines of code.\n",
"\n",
"Find models here: https://tfhub.dev/"
]
},
{
"cell_type": "code",
"source": [
"!pip install --upgrade tensorflow_hub"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"import tensorflow_hub as hub"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"layer = hub.KerasLayer(\n",
" \"https://tfhub.dev/google/imagenet/resnet_v2_50/classification/4\", trainable=True\n",
")"
]
},
{
"cell_type": "code",
"source": [
"from tensorflow.keras.models import Sequential\n",
"\n",
"model = Sequential([layer])\n",
"model.build([None, 224, 224, 3])\n",
"model.summary()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",