{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n",
"from numpy.random import seed\n",
"\n",
"seed(42)\n",
"import tensorflow as tf\n",
"\n",
"tf.random.set_seed(42)\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"sns.set(style=\"darkgrid\")\n",
"mpl.rcParams[\"lines.linewidth\"] = 3\n",
"%matplotlib inline\n",
"%config InlineBackend.figure_format = 'retina'\n",
"%config IPCompleter.greedy=True\n",
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\", category=FutureWarning)\n",
"from IPython.core.display import HTML\n",
"\n",
"HTML(open(\"custom.html\", \"r\").read())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chapter 8b: Introduction to Neural Networks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction to TensorFlow (keras API)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A bit about Keras?\n",
"\n",
"* It is a high level API to create and work with neural networks\n",
"* Used to support multiple backends such as **TensorFlow** from Google, **Theano** (Theano is dead now) and **CNTK** (Microsoft Cognitive Toolkit), up till release 2.3.0 \n",
"* Very good for creating neural nets quickly and hides away a lot of tedious work\n",
"* Has been incorporated into official TensorFlow (which obviously only works with tensorflow) and is its main API as of version 2.0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center>\n",
"<figure>\n",
"<img src=\"./images/neuralnets/neural_net_keras_1.svg\" width=\"700\"/>\n",
"<figcaption>Building this model in TensorFlow (Keras)</figcaption>\n",
"</figure>\n",
"</center>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Say hello to Tensorflow\n",
"from tensorflow.keras.layers import Activation, Dense\n",
"from tensorflow.keras.models import Sequential\n",
"\n",
"# Creating a model\n",
"model = Sequential()\n",
"\n",
"# Adding layers to this model\n",
"# 1st Hidden layer\n",
"# A Dense/fully-connected layer which takes as input a\n",
"# feature array of shape (samples, num_features)\n",
"# Here input_shape = (2,) means that the layer expects an input with num_features = 2\n",
"# and the sample size could be anything\n",
"# The activation function for this layer is set to \"relu\"\n",
"model.add(Dense(units=4, input_shape=(2,), activation=\"relu\"))\n",
"\n",
"# 2nd Hidden layer\n",
"# This is also a fully-connected layer and we do not need to specify the\n",
"# shape of the input anymore (We need to do that only for the first layer)\n",
"# NOTE: Now we didn't add the activation seperately. Instead we just added it\n",
"# while calling Dense(). This and the way used for the first layer are Equivalent!\n",
"model.add(Dense(units=4, activation=\"relu\"))\n",
"\n",
"\n",
"# The output layer\n",
"model.add(Dense(units=1))\n",
"model.add(Activation(\"sigmoid\"))\n",
"\n",
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### XOR using neural networks"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"from sklearn.model_selection import train_test_split\n",
"from tensorflow.keras.layers import Dense\n",
"from tensorflow.keras.models import Sequential"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Creating a network to solve the XOR problem\n",
"\n",
"# Loading and plotting the data\n",
"xor = pd.read_csv(\"data/xor.csv\")\n",
"\n",
"# Using x and y coordinates as featues\n",
"features = xor.iloc[:, :-1]\n",
"# Convert boolean to integer values (True->1 and False->0)\n",
"labels = 1 - xor.iloc[:, -1].astype(int)\n",
"\n",
"colors = [[\"steelblue\", \"chocolate\"][i] for i in labels]\n",
"plt.figure(figsize=(5, 5))\n",
"plt.xlim([-2, 2])\n",
"plt.ylim([-2, 2])\n",
"plt.title(\"Blue points are False\")\n",
"plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\");"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Building a simple Tensorflow model\n",
"\n",
"def a_simple_NN():\n",
"\n",
" model = Sequential()\n",
"\n",
" model.add(Dense(4, input_shape=(2,), activation=\"relu\"))\n",
"\n",
" model.add(Dense(4, activation=\"relu\"))\n",
"\n",
" model.add(Dense(1, activation=\"sigmoid\"))\n",
"\n",
" model.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
"\n",
" return model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Instantiating the model\n",
"model = a_simple_NN()\n",
"\n",
"# Splitting the dataset into training (70%) and validation sets (30%)\n",
"X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.3)\n",
"\n",
"# Setting the number of passes through the entire training set\n",
"num_epochs = 300\n",
"\n",
"# model.fit() is used to train the model\n",
"# We can pass validation data while training\n",
"model_run = model.fit(\n",
" X_train, y_train, epochs=num_epochs, validation_data=(X_test, y_test)\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\"><p><i class=\"fa fa-info-circle\"></i> \n",
" NOTE: We can pass \"verbose=0\" to model.fit() to suppress the printing of model output on the terminal/notebook.\n",
"</p></div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plotting the loss and accuracy on the training and validation sets during the training\n",
"# This can be done by using TensorFlow (Keras) callback \"history\" which is applied by default\n",
"history_model = model_run.history\n",
"\n",
"print(\"The history has the following data: \", history_model.keys())\n",
"\n",
"# Plotting the training and validation accuracy during the training\n",
"sns.lineplot(\n",
" model_run.epoch, history_model[\"accuracy\"], color=\"blue\", label=\"Training set\"\n",
")\n",
"sns.lineplot(\n",
" model_run.epoch, history_model[\"val_accuracy\"], color=\"red\", label=\"Valdation set\"\n",
")\n",
"plt.xlabel(\"epochs\")\n",
"plt.ylabel(\"accuracy\");"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"The plots such as above are essential for analyzing the behaviour and performance of the network and to tune it in the right direction. However, for the example above we don't expect to derive a lot of insight from this plot as the function we are trying to fit is quite simple and there is not too much noise. We will see the significance of these curves in a later example.\n",
"</p>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Before we move on forward we see how to save and load a TensorFlow (keras) model\n",
"model.save(\"./data/my_first_NN.h5\")\n",
"model.save(\"./data/my_first_NN\")\n",
"\n",
"\n",
"# Optional: See what is in the hdf5 file we just created above\n",
"\n",
"from tensorflow.keras.models import load_model\n",
"\n",
"model = load_model(\"./data/my_first_NN.h5\")\n",
"model_pb = load_model(\"./data/my_first_NN\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the training and validation in the example above we split our dataset into a 70-30 train-validation set. We know from previous chapters that to more robustly estimate the accuracy of our model we can use **K-fold cross-validation**.\n",
"This is even more important when we have small datasets and cannot afford to reserve a validation set!\n",
"\n",
"One way to do the cross-validation here would be to write our own function to do this. However, we also know that **scikit-learn** provides several handy functions to evaluate and tune the models. So the question is:\n",
"\n",
"\n",
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
" Can we somehow use the scikit-learn functions or the ones we wrote ourselves for scikit-learn models to evaluate and tune our TensorFlow (Keras) models?\n",
"\n",
"\n",
"The Answer is **YES !**\n",
"</p>\n",
"</div>\n",
"\n",
"\n",
"\n",
"We show how to do this in the following section."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using scikit-learn functions on TensorFlow (Keras) models\n",
"\n",
"\n",
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"TensorFlow (Keras) offers 2 wrappers which allow its Sequential models to be used with scikit-learn. \n",
"\n",
"There are: **KerasClassifier** and **KerasRegressor**.\n",
"\n",
"For more information:\n",
"https://keras.io/scikit-learn-api/\n",
"</p>\n",
"</div>\n",
"\n",
"\n",
"\n",
"**Now lets see how this works!**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# We wrap the TensorFlow (Keras) model we created above with KerasClassifier\n",
"from sklearn.model_selection import cross_val_score\n",
"from tensorflow.keras.wrappers.scikit_learn import KerasClassifier\n",
"\n",
"# Wrapping TensorFlow (Keras) model\n",
"# NOTE: We pass verbose=0 to suppress the model output\n",
"num_epochs = 400\n",
"model_scikit = KerasClassifier(build_fn=a_simple_NN, epochs=num_epochs, verbose=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Let's reuse the function to visualize the decision boundary which we saw in chapter 2 with minimal change\n",
"\n",
"\n",
"def list_flatten(list_of_list):\n",
" flattened_list = [i for j in list_of_list for i in j]\n",
" return flattened_list\n",
"\n",
"\n",
"def plot_points(plt=plt, marker=\"o\"):\n",
" colors = [[\"steelblue\", \"chocolate\"][i] for i in labels]\n",
" plt.scatter(features.iloc[:, 0], features.iloc[:, 1], color=colors, marker=marker)\n",
"\n",
"\n",
"def train_and_plot_decision_surface(\n",
" name, classifier, features_2d, labels, preproc=None, plt=plt, marker=\"o\", N=400\n",
"):\n",
"\n",
" features_2d = np.array(features_2d)\n",
" xmin, ymin = features_2d.min(axis=0)\n",
" xmax, ymax = features_2d.max(axis=0)\n",
"\n",
" x = np.linspace(xmin, xmax, N)\n",
" y = np.linspace(ymin, ymax, N)\n",
" points = np.array(np.meshgrid(x, y)).T.reshape(-1, 2)\n",
"\n",
" if preproc is not None:\n",
" points_for_classifier = preproc.fit_transform(points)\n",
" features_2d = preproc.fit_transform(features_2d)\n",
" else:\n",
" points_for_classifier = points\n",
"\n",
" classifier.fit(features_2d, labels, verbose=0)\n",
"\n",
" if name == \"Neural Net\":\n",
" # predicted = classifier.predict(features_2d)\n",
" # predicted = list_flatten(predicted)\n",
" predicted = list_flatten(\n",
" (classifier.predict(features_2d) > 0.5).astype(\"int32\")\n",
" )\n",
" # else:\n",
" # predicted = classifier.predict(features_2d)\n",
"\n",
" if preproc is not None:\n",
" name += \" (w/ preprocessing)\"\n",
" print(name + \":\\t\", sum(predicted == labels), \"/\", len(labels), \"correct\")\n",
"\n",
" if name == \"Neural Net\":\n",
" # classes = np.array(list_flatten(classifier.predict(points_for_classifier)), dtype=bool)\n",
" classes = np.array(\n",
" list_flatten(\n",
" (classifier.predict(points_for_classifier) > 0.5).astype(\"int32\")\n",
" ),\n",
" dtype=bool,\n",
" )\n",
" # else:\n",
" # classes = np.array(classifier.predict(points_for_classifier), dtype=bool)\n",
" plt.plot(\n",
" points[~classes][:, 0],\n",
" points[~classes][:, 1],\n",
" \"o\",\n",
" color=\"steelblue\",\n",
" markersize=1,\n",
" alpha=0.01,\n",
" )\n",
" plt.plot(\n",
" points[classes][:, 0],\n",
" points[classes][:, 1],\n",
" \"o\",\n",
" color=\"chocolate\",\n",
" markersize=1,\n",
" alpha=0.04,\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"_, ax = plt.subplots(figsize=(6, 6))\n",
"\n",
"train_and_plot_decision_surface(\"Neural Net\", model_scikit, features, labels, plt=ax)\n",
"plot_points(plt=ax)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Applying K-fold cross-validation\n",
"# Here we pass the whole dataset, i.e. features and labels, instead of splitting it.\n",
"num_folds = 5\n",
"cross_validation = cross_val_score(\n",
" model_scikit, features, labels, cv=num_folds, verbose=0\n",
")\n",
"\n",
"print(\"The acuracy on the \", num_folds, \" validation folds:\", cross_validation)\n",
"print(\n",
" \"The Average acuracy on the \",\n",
" num_folds,\n",
" \" validation folds:\",\n",
" np.mean(cross_validation),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"The code above took quite long to finish even though we used only 5 CV folds and the neural network and data size are very small! This gives an indication of the enormous compute requirements of training production-grade deep neural networks.\n",
"</p>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hyperparameter optimization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We know from chapter 6 that there are 2 types of parameters which need to be tuned for a machine learning model.\n",
"* Internal model parameters (weights) which can be learned for e.g. by gradient-descent\n",
"* Hyperparameters\n",
"\n",
"In the model created above we made some arbitrary choices such as the choice of the optimizer we used, optimizer's learning rate, number of hidden units and so on ...\n",
"\n",
"Now that we have the TensorFlow (keras) model wrapped as a scikit-learn model we can use the grid search functions we have seen in chapter 6."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import GridSearchCV\n",
"\n",
"# Just to remember\n",
"model_scikit = KerasClassifier(\n",
" build_fn=a_simple_NN, **{\"epochs\": num_epochs, \"verbose\": 0}\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"HP_grid = {\"epochs\": [30, 50, 100]}\n",
"search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid)\n",
"search.fit(features, labels)\n",
"print(search.best_score_, search.best_params_)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"HP_grid = {\"epochs\": [10, 15, 30], \"batch_size\": [10, 20, 30]}\n",
"search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid)\n",
"search.fit(features, labels)\n",
"print(search.best_score_, search.best_params_)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A more general model for further Hyperparameter optimization\n",
"from tensorflow.keras import optimizers\n",
"\n",
"def a_simple_NN(activation=\"relu\", num_hidden_neurons=[4, 4], learning_rate=0.01):\n",
"\n",
" model = Sequential()\n",
"\n",
" model.add(Dense(num_hidden_neurons[0], input_shape=(2,), activation=activation))\n",
"\n",
" model.add(Dense(num_hidden_neurons[1], activation=activation))\n",
"\n",
" model.add(Dense(1, activation=\"sigmoid\"))\n",
"\n",
" model.compile(\n",
" loss=\"binary_crossentropy\",\n",
" optimizer=optimizers.RMSprop(learning_rate=learning_rate),\n",
" metrics=[\"accuracy\"],\n",
" )\n",
"\n",
" return model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### keras-tuner"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install -q -U keras-tuner"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import keras_tuner as kt\n",
"\n",
"def model_builder(hp):\n",
"\n",
" # Tune the number of units in the first Dense layer\n",
" hp_units = hp.Int(\"units\", min_value=4, max_value=8, step=2)\n",
" hp_units_2 = hp.Int(\"units2\", min_value=4, max_value=16, step=2)\n",
"\n",
" # Tune the learning rate for the optimizer\n",
" hp_learning_rate = hp.Choice(\"learning_rate\", values=[1e-2, 1e-3, 1e-4])\n",
" # Tune the choice of the activation function\n",
" activation = hp.Choice(name=\"activation\", values=[\"relu\", \"sigmoid\"])\n",
"\n",
" model = a_simple_NN(activation, [hp_units, hp_units_2], hp_learning_rate)\n",
"\n",
"# The argument ‘hp’ is an instance of the class HyperParameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tuner = kt.BayesianOptimization(\n",
" model_builder,\n",
" objective=\"val_accuracy\",\n",
" max_trials=10,\n",
" project_name=\"intro_to_kt\",\n",
" overwrite=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tuner.search(X_train, y_train, epochs=100, validation_data=(X_test, y_test))\n",
"best_model = tuner.get_best_models()[0]\n",
"print(tuner.get_best_hyperparameters()[0].values)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise section: \n",
"1. Create a neural network to classify the 2d points example from chapter 2 (Optional: As you create the model read a bit on the different TensorFlow (keras) commands we have used)\n",
"2. Plot the decision boundary\n",
"3. Choose and optimize a couple of hyperparameters\n",
"4. **OPTIONAL:** What function from scikit-learn other than GridSearchCV can we use for hyperparameter optimization? Use it (or use the equivalent method from keras-tuner)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split\n",
"from tensorflow.keras import optimizers\n",
"from tensorflow.keras.layers import Dense\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.wrappers.scikit_learn import KerasClassifier\n",
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"circle = pd.read_csv(\"data/circle.csv\")\n",
"# Using x and y coordinates as featues\n",
"features = circle.iloc[:, :-1]\n",
"# Convert boolean to integer values (True->1 and False->0)\n",
"labels = circle.iloc[:, -1].astype(int)\n",
"\n",
"colors = [[\"steelblue\", \"chocolate\"][i] for i in circle[\"label\"]]\n",
"plt.figure(figsize=(5, 5))\n",
"plt.xlim([-2, 2])\n",
"plt.ylim([-2, 2])\n",
"\n",
"plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\");"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"solution"
]
},
"outputs": [],
"source": [
"# Solution\n",
"def circle_NN(activation=\"relu\", learning_rate=0.01):\n",
"\n",
" model = Sequential()\n",
"\n",
" model.add(Dense(4, input_shape=(2,), activation=activation))\n",
" model.add(Dense(4, activation=activation))\n",
"\n",
" model.add(Dense(1, activation=\"sigmoid\"))\n",
"\n",
" model.compile(\n",
" loss=\"binary_crossentropy\",\n",
" optimizer=optimizers.RMSprop(learning_rate=learning_rate),\n",
" metrics=[\"accuracy\"],\n",
" )\n",
"\n",
" return model\n",
"\n",
"# Instantiating the model\n",
"model = circle_NN()\n",
"\n",
"# Splitting the dataset into training (70%) and validation sets (30%)\n",
"X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.3)\n",
"\n",
"# Setting the number of passes through the entire training set\n",
"num_epochs = 400\n",
"\n",
"# model.fit() is used to train the model\n",
"# We can pass validation data while training\n",
"model_run = model.fit(\n",
" X_train, y_train, epochs=num_epochs, validation_data=(X_test, y_test), verbose=0\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"solution"
]
},
"outputs": [],
"source": [
"# solution\n",
"_, ax = plt.subplots(figsize=(6, 6))\n",
"\n",
"num_epochs = 400\n",
"circle_scikit = KerasClassifier(build_fn=circle_NN, epochs=num_epochs, verbose=0)\n",
"\n",
"train_and_plot_decision_surface(\"Neural Net\", circle_scikit, features, labels, plt=ax)\n",
"execution_count": null,
"metadata": {
"tags": [
"solution"
]
},
"outputs": [],
"source": [
"# solution (older method)\n",
"\"\"\"\n",
"HP_grid = {\n",
" \"activation\": [\"relu\", \"sigmoid\"],\n",
" \"learning_rate\": [0.01, 0.005, 0.001],\n",
"}\n",
"search = GridSearchCV(estimator=circle_scikit, param_grid=HP_grid)\n",
"search.fit(features, labels)\n",
"print(search.best_score_, search.best_params_)\n",
"\"\"\""
]
},
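{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch answering the optional question: scikit-learn's **RandomizedSearchCV** samples a fixed number of parameter combinations instead of exhaustively trying the whole grid. The search space below reuses `circle_scikit` and is only one reasonable choice."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"solution"
]
},
"outputs": [],
"source": [
"# solution (sketch): RandomizedSearchCV tries n_iter randomly sampled\n",
"# parameter settings instead of the full grid\n",
"from sklearn.model_selection import RandomizedSearchCV\n",
"\n",
"HP_space = {\n",
"    \"activation\": [\"relu\", \"sigmoid\"],\n",
"    \"learning_rate\": [0.01, 0.005, 0.001],\n",
"}\n",
"random_search = RandomizedSearchCV(\n",
"    estimator=circle_scikit, param_distributions=HP_space, n_iter=4\n",
")\n",
"random_search.fit(features, labels)\n",
"print(random_search.best_score_, random_search.best_params_)"
]
},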
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"solution"
]
},
"outputs": [],
"source": [
"def model_builder(hp):\n",
" # Tune the learning rate for the optimizer\n",
" hp_learning_rate = hp.Choice(\"learning_rate\", values=[1e-2, 1e-3, 1e-4])\n",
" # Tune the choice of the activation function\n",
" activation = hp.Choice(name=\"activation\", values=[\"relu\", \"sigmoid\"])\n",
"\n",
" model = circle_NN(activation, hp_learning_rate)\n",
"\n",
" return model\n",
"\n",
"\n",
"tuner = kt.BayesianOptimization(\n",
" model_builder,\n",
" objective=\"val_accuracy\",\n",
" max_trials=10,\n",
" project_name=\"circle_exercise\",\n",
" overwrite=True,\n",
")\n",
"tuner.search(X_train, y_train, epochs=400, validation_data=(X_test, y_test))\n",
"best_model = tuner.get_best_models()[0]\n",
"print(tuner.get_best_hyperparameters()[0].values)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"Another library which you should definitely look at for doing hyperparameter optimization with keras models is the <a href=\"https://github.com/maxpumperla/hyperas\">Hyperas library</a> which is a wrapper around the <a href=\"https://github.com/hyperopt/hyperopt\">Hyperopt library</a>. \n",
"\n",
"</p>\n",
"</div>"
]
},
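{
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration, below is a minimal sketch of how the underlying Hyperopt library could tune our `circle_NN` (the search space and the number of evaluations are arbitrary choices; Hyperas adds a more convenient templating layer on top of this)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A minimal Hyperopt sketch (assuming the hyperopt package is installed)\n",
"from hyperopt import fmin, hp, tpe\n",
"\n",
"\n",
"def objective(params):\n",
"    model = circle_NN(params[\"activation\"], params[\"learning_rate\"])\n",
"    run = model.fit(\n",
"        X_train, y_train, epochs=100, validation_data=(X_test, y_test), verbose=0\n",
"    )\n",
"    # Hyperopt minimizes the returned value, so we return the final validation loss\n",
"    return run.history[\"val_loss\"][-1]\n",
"\n",
"\n",
"space = {\n",
"    \"activation\": hp.choice(\"activation\", [\"relu\", \"sigmoid\"]),\n",
"    \"learning_rate\": hp.choice(\"learning_rate\", [0.01, 0.005, 0.001]),\n",
"}\n",
"best = fmin(objective, space, algo=tpe.suggest, max_evals=10)\n",
"print(best)  # indices into the hp.choice lists"
]
},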
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The examples we saw above are really nice to show various features of the TensorFlow (Keras) library and to understand how we build and train a model. However, they are not the ideal problems one should solve using neural networks. They are too simple and can be solved easily by classical machine learning algorithms. \n",
"Now we show examples where Neural Networks really shine over classical machine learning algorithms."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Handwritten Digits Classification (multi-class classification)\n",
"**MNIST Dataset**\n",
"MNIST datasets is a very common dataset used in machine learning. It is widely used to train and validate models.\n",
"> The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a > test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.\n",
"> It is a good database for people who want to try learning techniques and pattern recognition methods on real-world \n",
"> data while spending minimal efforts on preprocessing and formatting.\n",
"> source: http://yann.lecun.com/exdb/mnist/\n",
"\n",
"This dataset consists of images of handwritten digits between 0-9 and their corresponsing labels. We want to train a neural network which is able to predict the correct digit on the image. \n",
"This is a multi-class classification problem. Unlike binary classification which we have seen till now we will classify data into 10 different classes."
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"# Loading the dataset in TensorFlow (keras)\n",
"# Later you can explore and play with other datasets with come with TensorFlow (Keras)\n",
"from tensorflow.keras.datasets import mnist\n",
"# Loading the train and test data\n",
"(X_train, y_train), (X_test, y_test) = mnist.load_data()"
"metadata": {},
"outputs": [],
"source": [
"# Looking at the dataset\n",
"print(X_train.shape)"
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# We can see that the training set consists of 60,000 images of size 28x28 pixels\n",
"i = np.random.randint(0, X_train.shape[0])\n",
"sns.set_style(\"white\")\n",
"plt.imshow(X_train[i], cmap=\"gray_r\")\n",
"sns.set(style=\"darkgrid\")\n",
"print(\"This digit is: \", y_train[i])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Look at the data values for a couple of images\n",
"print(X_train[0].min(), X_train[1].max())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data consists of values between 0-255 representing the **grayscale level**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The labels are the digit on the image\n",
"print(y_train.shape)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Scaling the data\n",
"# It is important to normalize the input data to (0-1) before providing it to a neural net\n",
"# We could use the previously introduced function from scikit-learn. However, here it is sufficient to\n",
"# just divide the input data by 255\n",
"X_train_norm = X_train / 255.0\n",
"X_test_norm = X_test / 255.0\n",
"\n",
"# Also we need to reshape the input data such that each sample is a vector and not a 2D matrix\n",
"X_train_prep = X_train_norm.reshape(X_train_norm.shape[0], 28 * 28)\n",
"X_test_prep = X_test_norm.reshape(X_test_norm.shape[0], 28 * 28)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"One-Hot encoding\n",
"\n",
"In multi-class classification problems the labels are provided to the neural network as something called **One-hot encodings**. The categorical labels (0-9 here) are converted to vectors.\n",
"\n",
"For the MNIST problem where the data has **10 categories** we will convert every label to a vector of length 10. \n",
"All the entries of this vector will be zero **except** for the index which is equal to the (integer) value of the label.\n",
"\n",
"For example:\n",
"if label is 4. The one-hot vector will look like **[0 0 0 0 1 0 0 0 0 0]**\n",
"\n",
"Fortunately, TensorFlow (Keras) has a built-in function to achieve this and we do not have to write a code for this ourselves.\n",
"</p>\n",
"</div>"
"metadata": {},
"outputs": [],
"source": [
"from tensorflow.keras import utils\n",
"\n",
"y_train_onehot = utils.to_categorical(y_train, num_classes=10)\n",
"y_test_onehot = utils.to_categorical(y_test, num_classes=10)\n",
"\n",
"print(y_train_onehot.shape)"
"metadata": {},
"outputs": [],
"source": [
"# Building the tensorflow model\n",
"from tensorflow.keras.layers import Dense\n",
"from tensorflow.keras.models import Sequential\n",