"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n",
"from numpy.random import seed\n",
"seed(42)\n",
"import tensorflow as tf\n",
"tf.random.set_seed(36)\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib as mpl\n",
"import seaborn as sns\n",
"sns.set(style=\"darkgrid\")\n",
"mpl.rcParams['lines.linewidth'] = 3\n",
"%matplotlib inline\n",
"%config InlineBackend.figure_format = 'retina'\n",
"%config IPCompleter.greedy=True\n",
"import warnings\n",
"warnings.filterwarnings('ignore', category=FutureWarning)\n",
"from IPython.core.display import HTML; HTML(open(\"custom.html\", \"r\").read())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chapter 8: Introduction to Neural Networks\n",
"\n",
"\n",
"\n",
"<img src=\"./images/3042en.jpg\" title=\"made at imgflip.com\" width=35%/>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## History of Neural networks\n",
"\n",
"\n",
"1943 - Threshold Logic\n",
"\n",
"1940s - Hebbian Learning\n",
"\n",
"1958 - Perceptron\n",
"\n",
"1980s - Neocognitron\n",
"\n",
"1982 - Hopfield Network\n",
"\n",
"1989 - Convolutional neural network (CNN) kernels trained via backpropagation\n",
"\n",
"1997 - Long-short term memory (LSTM) model\n",
"\n",
"1998 - LeNet-5\n",
"\n",
"2014 - Gated Recurrent Units (GRU), Generative Adversarial Networks (GAN)\n",
"\n",
"2015 - ResNet"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Why the boom now?\n",
"* Data\n",
"* Data\n",
"* Data\n",
"* Availability of GPUs\n",
"* Algorithmic developments which allow for efficient training and making networks networks\n",
"* Development of high-level libraries/APIs have made the field much more accessible than it was a decade ago"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feed-Forward neural network\n",
"<center>\n",
"<figure>\n",
"<img src=\"./images/neuralnets/neural_net_ex.svg\" width=\"700\"/>\n",
"<figcaption>A 3 layer densely connected Neural Network (By convention the input layer is not counted).</figcaption>\n",
"</figure>\n",
"</center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building blocks\n",
"### Perceptron\n",
"\n",
"The smallest unit of a neural network is a **perceptron** like node.\n",
"\n",
"**What is a Perceptron?**\n",
"\n",
"It is a simple function which can have multiple inputs and has a single output.\n",
"\n",
"<center>\n",
"<figure>\n",
"<img src=\"./images/neuralnets/perceptron_ex.svg\" width=\"400\"/>\n",
"<figcaption>A simple perceptron with 3 inputs and 1 output.</figcaption>\n",
"</figure>\n",
"</center>\n",
"\n",
"\n",
"It works as follows: \n",
"\n",
"Step 1: A **weighted sum** of the inputs is calculated\n",
"\n",
"\\begin{equation*}\n",
"weighted\\_sum = w_{1} x_{1} + w_{2} x_{2} + w_{3} x_{3} + ...\n",
"\\end{equation*}\n",
"\n",
"Step 2: A **step** activation function is applied\n",
"\n",
"$$\n",
"f = \\left\\{\n",
" \\begin{array}{ll}\n",
" 0 & \\quad weighted\\_sum < threshold \\\\\n",
" 1 & \\quad weighted\\_sum \\geq threshold\n",
" \\end{array}\n",
" \\right.\n",
"$$\n",
"\n",
"You can see that this is also a linear classifier as the ones we introduced in script 02."
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"source": [
"# Plotting the step function\n",
"x = np.arange(-2,2.1,0.01)\n",
"y = np.zeros(len(x))\n",
"threshold = 0.\n",
"y[x>threshold] = 1.\n",
"step_plot = sns.lineplot(x, y).set_title('Step function') ;\n",
"plt.xlabel('weighted_sum') ;\n",
"plt.ylabel('f(weighted_sum)') ;"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"def perceptron(X, w, threshold=1):\n",
" # This function computes sum(w_i*x_i) and\n",
" # applies a perceptron activation\n",
" linear_sum = np.dot(np.asarray(X).T, w)\n",
" output = np.zeros(len(linear_sum), dtype=np.int8)\n",
" output[linear_sum >= threshold] = 1\n",
" return output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Boolean AND\n",
"\n",
"| x$_1$ | x$_2$ | output |\n",
"| --- | --- | --- |\n",
"| 0 | 0 | 0 |\n",
"| 1 | 0 | 0 |\n",
"| 0 | 1 | 0 |\n",
"| 1 | 1 | 1 |"
]
},
{
"cell_type": "code",
"# (x1, x2) pairs\n",
"x1 = [0, 1, 0, 1]\n",
"x2 = [0, 0, 1, 1]\n",
"# Calling the perceptron function\n",
"output = perceptron([x1, x2], w, threshold)\n",
"for i in range(len(output)):\n",
" print(\"Perceptron output for x1, x2 = \", x1[i], \",\", x2[i],\n",
" \" is \", output[i])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this simple case we can rewrite our equation to $x_2 = ...... $ which describes a line in 2D:"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"def perceptron_DB(x1, x2, w, threshold):\n",
" # Plotting the decision boundary of the perceptron\n",
" plt.scatter(x1, x2, color=\"black\")\n",
" plt.xlim(-1,2)\n",
" plt.ylim(-1,2)\n",
" # The decision boundary is a line given by\n",
" # w_1*x_1+w_2*x_2-threshold=0\n",
" x1 = np.arange(-3, 4)\n",
" x2 = (threshold - x1*w[0])/w[1]\n",
" sns.lineplot(x1, x2, **{\"color\": \"black\"})\n",
" plt.xlabel(\"x$_1$\", fontsize=16)\n",
" plt.ylabel(\"x$_2$\", fontsize=16)\n",
" # Coloring the regions\n",
" pts_tmp = np.arange(-2, 2.1, 0.02)\n",
" points = np.array(np.meshgrid(pts_tmp, pts_tmp)).T.reshape(-1, 2)\n",
" outputs = perceptron(points.T, w, threshold)\n",
" plt.plot(points[:, 0][outputs == 0], points[:, 1][outputs == 0],\n",
" \"o\",\n",
" color=\"steelblue\",\n",
" markersize=1,\n",
" alpha=0.04,\n",
" )\n",
" plt.plot(points[:, 0][outputs == 1], points[:, 1][outputs == 1],\n",
" \"o\",\n",
" color=\"chocolate\",\n",
" markersize=1,\n",
" alpha=0.04,\n",
" )\n",
" plt.title(\"Blue color = 0 and Chocolate = 1\")"
]
},
{
"cell_type": "code",
"source": [
"# Plotting the perceptron decision boundary\n",
"perceptron_DB(x1, x2, w, threshold)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise section\n",
"* Compute a Boolean \"OR\" using a perceptron\n",
"\n",
"Hint: copy the code from the \"AND\" example and edit the weights and/or threshold"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Boolean OR\n",
"\n",
"| x$_1$ | x$_2$ | output |\n",
"| --- | --- | --- |\n",
"| 0 | 0 | 0 |\n",
"| 1 | 0 | 1 |\n",
"| 0 | 1 | 1 |\n",
"| 1 | 1 | 1 |"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Calculating Boolean OR using a perceptron\n",
"# Enter code here"
]
},
{
"cell_type": "code",
"metadata": {
"scrolled": true,
"tags": [
"solution"
]
},
"source": [
"# Solution\n",
"# Calculating Boolean OR using a perceptron\n",
"threshold=0.6\n",
"# (w1, w2)\n",
"w=[1,1]\n",
"# (x1, x2) pairs\n",
"x1 = [0, 1, 0, 1]\n",
"x2 = [0, 0, 1, 1]\n",
"output = perceptron([x1, x2], w, threshold)\n",
"for i in range(len(output)):\n",
" print(\"Perceptron output for x1, x2 = \", x1[i], \",\", x2[i],\n",
" \" is \", output[i])\n",
"perceptron_DB(x1, x2, w, threshold)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise section\n",
"* Create a NAND gate using a perceptron\n",
"\n",
"Boolean NAND\n",
"\n",
"| x$_1$ | x$_2$ | output |\n",
"| --- | --- | --- |\n",
"| 0 | 0 | 1 |\n",
"| 1 | 0 | 1 |\n",
"| 0 | 1 | 1 |\n",
"| 1 | 1 | 0 |"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Calculating Boolean NAND using a perceptron\n",
"# Enter code here"
]
},
{
"cell_type": "code",
"metadata": {
"tags": [
"solution"
]
},
"source": [
"# Solution\n",
"# Calculating Boolean NAND using a perceptron\n",
"import matplotlib.pyplot as plt\n",
"threshold=-1.5\n",
"# (w1, w2)\n",
"w=[-1,-1]\n",
"# (x1, x2) pairs\n",
"x1 = [0, 1, 0, 1]\n",
"x2 = [0, 0, 1, 1]\n",
"output = perceptron([x1, x2], w, threshold)\n",
"for i in range(len(output)):\n",
" print(\"Perceptron output for x1, x2 = \", x1[i], \",\", x2[i],\n",
" \" is \", output[i])\n",
"perceptron_DB(x1, x2, w, threshold)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In fact, a single perceptron can compute \"AND\", \"OR\" and \"NOT\" boolean functions.\n",
"\n",
"However, it cannot compute some other boolean functions such as \"XOR\".\n",
"\n",
"**WHAT CAN WE DO?**\n",
"\n",
"\n",
"Hint: Think about what is the significance of the NAND gate we have created above?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multi-layer perceptrons\n",
"\n",
"\n",
"Answer: We said a single perceptron can't compute a \"XOR\" function. We didn't say that about **multiple Perceptrons** put together.\n",
"\n",
"The normal densely connected neural network is sometimes also called \"Multi-layer\" perceptron.\n",
"\n",
"**XOR function using multiple perceptrons**\n",
"\n",
"<center>\n",
"<figure>\n",
"<img src=\"./images/neuralnets/perceptron_XOR.svg\" width=\"400\"/>\n",
"<figcaption>Multiple perceptrons connected together to output a XOR function.</figcaption>\n",
"</figure>\n",
"</center>"
]
},
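{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch of this idea, reusing the `perceptron` function and the hand-set gate parameters from the examples above (OR: w=[1,1], threshold=0.6; NAND: w=[-1,-1], threshold=-1.5; AND: w=[1,1], threshold=1.5): XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2))."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A sketch: XOR built from the OR, NAND and AND perceptrons defined above\n",
"x1 = [0, 1, 0, 1]\n",
"x2 = [0, 0, 1, 1]\n",
"# First layer: OR and NAND gates\n",
"or_out = perceptron([x1, x2], w=[1, 1], threshold=0.6)\n",
"nand_out = perceptron([x1, x2], w=[-1, -1], threshold=-1.5)\n",
"# Second layer: an AND gate combining the two first-layer outputs\n",
"xor_out = perceptron([or_out, nand_out], w=[1, 1], threshold=1.5)\n",
"print(\"XOR output:\", xor_out)  # expected: [0 1 1 0]"
]
},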
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning\n",
"\n",
"We know that we can compute complicated functions by combining a number of perceptrons.\n",
"\n",
"In the perceptron examples we had set the model parameters (weights and threshold) by hand.\n",
"\n",
"This is something we definitely **DO NOT** want to do or even can do for big networks.\n",
"\n",
"We want some algorithm to set/learn the model parameters for us!\n",
"\n",
"<div class=\"alert alert-block alert-warning\">\n",
" <i class=\"fa fa-info-circle\"></i> <strong>Threshold -> bias</strong> \n",
" \n",
"Before we go further we need to introduce one change. The threshold which we saw in the step activation function above is moved to the left side of the equation and is called **bias**.\n",
"\n",
"$$\n",
"f = \\left\\{\n",
" \\begin{array}{ll}\n",
" 0 & \\quad weighted\\_sum + bias < 0 \\\\\n",
" 1 & \\quad weighted\\_sum + bias \\geq 0\n",
" \\end{array}\n",
" \\quad \\quad \\mathrm{where}, bias = -threshold\n",
" \\right.\n",
"$$\n",
"\n",
"</div>"
]
},
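{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sketch of the bias formulation (using the hand-set AND parameters from above, with bias = -threshold):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# The AND perceptron in the bias form: output 1 if w.x + bias >= 0\n",
"w = np.array([1, 1])\n",
"bias = -1.5  # bias = -threshold\n",
"X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])\n",
"print((X @ w + bias >= 0).astype(int))  # expected: [0 0 0 1]"
]
},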
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to algorithmically set/learn the weights and bias we need to choose an appropriate loss function for the problem at hand and solve an optimization problem.\n",
"We will explain below what this means.\n",
"\n",
"\n",
"### Loss function\n",
"\n",
"To learn using an algorithm we need to define a quantity/function which allows us to measure how close or far are the predictions of our network/setup from reality or the supplied labels. This is done by choosing a so-called \"Loss function\" (as in the case for other machine learning algorithms).\n",
"\n",
"Once we have this function, we need an algorithm to update the weights of the network such that this loss function decreases. \n",
"As one can already imagine the choice of an appropriate loss function is critical to the success of the model. \n",
"\n",
"Fortunately, for classification and regression (which cover a large variety of problems) these loss functions are well known. \n",
"\n",
"**Crossentropy** and **mean squared error** loss functions are often used for standard classification and regression problems, respectively.\n",
"\n",
"<div class=\"alert alert-block alert-warning\">\n",
" <i class=\"fa fa-info-circle\"></i> As we have seen before, <strong>mean squared error</strong> is defined as \n",
"\n",
"\n",
"$$\n",
"\\frac{1}{n} \\left((y_1 - \\hat{y}_1)^2 + (y_2 - \\hat{y}_2)^2 + ... + (y_n - \\hat{y}_n)^2 \\right)\n",
"$$\n",
"\n",
"\n",
"</div>\n",
"\n",
"### Gradient based learning\n",
"\n",
"As mentioned above, once we have chosen a loss function, we want to solve an **optimization problem** which minimizes this loss by updating the parameters (weights and biases) of the network. This is how the learning takes in a NN, and the \"knowledge\" is stored as the weights and biases.\n",
"\n",
"The most popular optimization methods used in Neural Network training are **Gradient-descent (GD)** type methods, such as gradient-descent itself, RMSprop and Adam. \n",
"\n",
"**Gradient-descent** uses partial derivatives of the loss function with respect to the network weights and a learning rate to updates the weights such that the loss function decreases and after some iterations reaches its (Global) minimum value.\n",
"\n",
"First, the loss function and its derivative are computed at the output node, and this signal is propagated backwards, using the chain rule, in the network to compute the partial derivatives. Hence, this method is called **Backpropagation**.\n",
"\n",
"One way to perform a single GD pass is to compute the partial derivatives using **all the samples** in our data, computing average derivatives and using them to update the weights. This is called **Batch gradient descent**. However, in deep learning we mostly work with massive datasets and using batch gradient descent can make the training very slow!\n",
"\n",
"The other extreme is to randomly shuffle the dataset and advance a pass of GD with the gradients computed using only **one sample** at a time. This is called **Stochastic gradient descent**.\n",
"\n",
"<center>\n",
"<figure>\n",
"<img src=\"./images/stochastic-vs-batch-gradient-descent.png\" width=\"600\"/>\n",
"<figcaption>Source: <a href=\"https://wikidocs.net/3413\">https://wikidocs.net/3413</a></figcaption>\n",
"</figure>\n",
"</center>\n",
"\n",
"\n",
"In practice, an approach in-between these two is used. The entire dataset is divided into **m batches** and these are used one by one to compute the derivatives and apply GD. This technique is called **Mini-batch gradient descent**. \n",
"\n",
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"One pass through the entire training dataset is called 1 epoch of training.\n",
"</p>\n",
"</div>"
]
},
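{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make these updates concrete, here is a minimal plain-NumPy sketch (not Keras) of mini-batch gradient descent fitting a single weight and bias with the mean squared error loss:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Mini-batch gradient descent for y = w*x + b with MSE loss (a toy sketch)\n",
"rng = np.random.RandomState(0)\n",
"x = rng.uniform(-1, 1, 200)\n",
"y = 3 * x + 1 + 0.1 * rng.randn(200)  # noisy line with true w=3, b=1\n",
"\n",
"w, b, lr, batch_size = 0.0, 0.0, 0.1, 20\n",
"for epoch in range(50):\n",
"    idx = rng.permutation(len(x))  # shuffle the data each epoch\n",
"    for start in range(0, len(x), batch_size):\n",
"        batch = idx[start:start + batch_size]\n",
"        y_hat = w * x[batch] + b\n",
"        # Partial derivatives of the MSE loss w.r.t. w and b, averaged over the batch\n",
"        grad_w = np.mean(2 * (y_hat - y[batch]) * x[batch])\n",
"        grad_b = np.mean(2 * (y_hat - y[batch]))\n",
"        w -= lr * grad_w\n",
"        b -= lr * grad_b\n",
"print(\"learned w, b:\", w, b)  # should end up close to 3 and 1"
]
},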
{
"cell_type": "code",
"source": [
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import numpy as np\n",
"\n",
"plt.figure(figsize=(10, 4)) ;\n",
"\n",
"pts=np.arange(-20,20, 0.1) ;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Activation Functions\n",
"\n",
"In order to train the network we need to move away from Perceptron's **step** activation function because it can not be used for training using the gradient-descent and back-propagation algorithms among other drawbacks.\n",
"\n",
"Non-Linear functions such as:\n",
"\n",
"* Sigmoid\n",
"\n",
"\\begin{equation*}\n",
"f(z) = \\frac{1}{1+e^{-z}} \\quad \\quad \\mathrm{where}, z = weighted\\_sum + bias\n",
"\\end{equation*}"
]
},
{
"cell_type": "code",
"source": [
"sns.lineplot(pts, 1/(1+np.exp(-pts))) ;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* tanh\n",
"\n",
"\\begin{equation*}\n",
"f(z) = \\frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}\\quad \\quad \\mathrm{where}, z = weighted\\_sum + bias\n",
"\\end{equation*}\n"
]
},
{
"cell_type": "code",
"source": [
"sns.lineplot(pts, np.tanh(pts*np.pi)) ;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* **ReLU (Rectified linear unit)**\n",
"\n",
"\\begin{equation*}\n",
"f(z) = \\mathrm{max}(0,z) \\quad \\quad \\mathrm{where}, z = weighted\\_sum + bias\n",
"\\end{equation*}"
]
},
{
"cell_type": "code",
"source": [
"pts_relu=[max(0,i) for i in pts];\n",
"plt.plot(pts, pts_relu) ;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"are some of the commonly used as activation functions. Such non-linear activation functions allow the network to learn complex representations of data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"ReLU is very popular and is widely used nowadays. There also exist other variations of ReLU, e.g. \"leaky ReLU\".\n",
"</p>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"Why don't we just use a simple linear activation function?\n",
" \n",
"Linear activations are **NOT** used because it can be mathematically shown that if they are used then the output is just a linear function of the input. So we cannot learn interesting and complex functions by adding any number of hidden layers.\n",
"\n",
"The only exception when we do want to use a linear activation is for the output layer of a network when solving a regression problem.\n",
"\n",
"</p>\n",
"</div>"
]
},
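{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick numerical illustration of this point (a NumPy sketch, not a Keras model): stacking two layers with linear activations collapses into a single linear map."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Two \"layers\" with linear activations: h = W1 @ x, out = W2 @ h\n",
"rng = np.random.RandomState(42)\n",
"W1 = rng.randn(4, 2)\n",
"W2 = rng.randn(1, 4)\n",
"x = rng.randn(2)\n",
"\n",
"two_layers = W2 @ (W1 @ x)\n",
"one_layer = (W2 @ W1) @ x  # the single equivalent linear layer\n",
"print(np.allclose(two_layers, one_layer))  # True: no extra expressive power"
]
},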
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise section - Google Playground\n",
"\n",
"A great tool from Google to develop a feeling for the workings of neural networks.\n",
"\n",
"https://playground.tensorflow.org/\n",
"\n",
"<img src=\"./images/neuralnets/google_playground.png\"/>\n",
"\n",
"**Walkthrough by instructor**\n",
"\n",
"Some concepts to look at:\n",
"\n",
"* Simple vs Complex models (Effect of network size)\n",
"* Optimization results\n",
"* Effect of activation functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction to Keras"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What is Keras?\n",
"\n",
"* It is a high level API to create and work with neural networks\n",
"* Supports multiple backends such as **TensorFlow** from Google, **Theano** (Although Theano is dead now) and **CNTK** (Microsoft Cognitive Toolkit)\n",
"* Very good for creating neural nets quickly and hides away a lot of tedious work\n",
"* Has been incorporated into official TensorFlow (which obviously only works with tensforflow) and as of TensorFlow 2.0 this will the main api to use it\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center>\n",
"<figure>\n",
"<img src=\"./images/neuralnets/neural_net_keras_1.svg\" width=\"700\"/>\n",
"<figcaption>Building this model in Keras</figcaption>\n",
"</figure>\n",
"</center>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"# Say hello to Tensorflow\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Dense, Activation\n",
"\n",
"# Creating a model\n",
"model = Sequential()\n",
"\n",
"# Adding layers to this model\n",
"# 1st Hidden layer\n",
"# A Dense/fully-connected layer which takes as input a \n",
"# feature array of shape (samples, num_features)\n",
"# Here input_shape = (2,) means that the layer expects an input with num_features = 2\n",
"# and the sample size could be anything\n",
"# The activation function for this layer is set to \"relu\"\n",
"model.add(Dense(units=4, input_shape=(2,), activation=\"relu\"))\n",
"\n",
"# 2nd Hidden layer\n",
"# This is also a fully-connected layer and we do not need to specify the\n",
"# shape of the input anymore (We need to do that only for the first layer)\n",
"# NOTE: Now we didn't add the activation seperately. Instead we just added it\n",
"# while calling Dense(). This and the way used for the first layer are Equivalent!\n",
"model.add(Dense(units=4, activation=\"relu\"))\n",
"\n",
" \n",
"# The output layer\n",
"model.add(Dense(units=1))\n",
"model.add(Activation(\"sigmoid\"))\n",
"\n",
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### XOR using neural networks"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from sklearn.model_selection import train_test_split\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Dense\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"source": [
"# Creating a network to solve the XOR problem\n",
"\n",
"# Loading and plotting the data\n",
"xor = pd.read_csv(\"data/xor.csv\")\n",
"\n",
"# Using x and y coordinates as featues\n",
"features = xor.iloc[:, :-1]\n",
"# Convert boolean to integer values (True->1 and False->0)\n",
"labels = (1-xor.iloc[:, -1].astype(int))\n",
"\n",
"colors = [[\"steelblue\", \"chocolate\"][i] for i in labels]\n",
"plt.figure(figsize=(5, 5))\n",
"plt.xlim([-2, 2])\n",
"plt.ylim([-2, 2])\n",
"plt.title(\"Blue points are False\")\n",
"plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\") ;"
]
},
{
"cell_type": "code",
"\n",
"def a_simple_NN():\n",
" \n",
" model = Sequential()\n",
"\n",
" model.add(Dense(4, input_shape = (2,), activation = \"relu\"))\n",
"\n",
" model.add(Dense(4, activation = \"relu\"))\n",
"\n",
" model.add(Dense(1, activation = \"sigmoid\"))\n",
"\n",
" model.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
" \n",
" return model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Instantiating the model\n",
"model = a_simple_NN()\n",
"\n",
"# Splitting the dataset into training (70%) and validation sets (30%)\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
" features, labels, test_size=0.3)\n",
"\n",
"# Setting the number of passes through the entire training set\n",
"num_epochs = 300\n",
"\n",
"# model.fit() is used to train the model\n",
"# We can pass validation data while training\n",
"model_run = model.fit(X_train, y_train, epochs=num_epochs,\n",
" validation_data=(X_test, y_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\"><p><i class=\"fa fa-info-circle\"></i> \n",
" NOTE: We can pass \"verbose=0\" to model.fit() to suppress the printing of model output on the terminal/notebook.\n",
"</p></div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plotting the loss and accuracy on the training and validation sets during the training\n",
"# This can be done by using Keras callback \"history\" which is applied by default\n",
"history_model = model_run.history\n",
"\n",
"print(\"The history has the following data: \", history_model.keys())\n",
"\n",
"# Plotting the training and validation accuracy during the training\n",
"sns.lineplot(np.arange(1, num_epochs+1), history_model[\"accuracy\"], color = \"blue\", label=\"Training set\") ;\n",
"sns.lineplot(np.arange(1, num_epochs+1), history_model[\"val_accuracy\"], color = \"red\", label=\"Valdation set\") ;\n",
"plt.xlabel(\"epochs\") ;\n",
"plt.ylabel(\"accuracy\") ;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"The plots such as above are essential for analyzing the behaviour and performance of the network and to tune it in the right direction. However, for the example above we don't expect to derive a lot of insight from this plot as the function we are trying to fit is quite simple and there is not too much noise. We will see the significance of these curves in a later example.\n",
"</p>\n",
"</div>"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Before we move on forward we see how to save and load a keras model\n",
"model.save(\"./data/my_first_NN.h5\")\n",
"\n",
"# Optional: See what is in the hdf5 file we just created above\n",
"\n",
"from tensorflow.keras.models import load_model\n",
"model = load_model(\"./data/my_first_NN.h5\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the training and validation in the example above we split our dataset into a 70-30 train-validation set. We know from previous chapters that to more robustly estimate the accuracy of our model we can use **K-fold cross-validation**.\n",
"This is even more important when we have small datasets and cannot afford to reserve a validation set!\n",
"\n",
"One way to do the cross-validation here would be to write our own function to do this. However, we also know that **scikit-learn** provides several handy functions to evaluate and tune the models. So the question is:\n",
"\n",
"\n",
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
" Can we somehow use the scikit-learn functions or the ones we wrote ourselves for scikit-learn models to evaluate and tune our Keras models?\n",
"\n",
"\n",
"The Answer is **YES !**\n",
"</p>\n",
"</div>\n",
"\n",
"\n",
"\n",
"We show how to do this in the following section."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using scikit-learn functions on keras models\n",
"\n",
"\n",
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"Keras offers 2 wrappers which allow its Sequential models to be used with scikit-learn. \n",
"\n",
"There are: **KerasClassifier** and **KerasRegressor**.\n",
"\n",
"For more information:\n",
"https://keras.io/scikit-learn-api/\n",
"</p>\n",
"</div>\n",
"\n",
"\n",
"\n",
"**Now lets see how this works!**"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# We wrap the Keras model we created above with KerasClassifier\n",
"from tensorflow.keras.wrappers.scikit_learn import KerasClassifier\n",
"from sklearn.model_selection import cross_val_score\n",
"# Wrapping Keras model\n",
"# NOTE: We pass verbose=0 to suppress the model output\n",
"num_epochs = 400\n",
"model_scikit = KerasClassifier(\n",
" build_fn=a_simple_NN, epochs=num_epochs, verbose=0)"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Let's reuse the function to visualize the decision boundary which we saw in chapter 2 with minimal change\n",
"\n",
"def list_flatten(list_of_list):\n",
" flattened_list = [i for j in list_of_list for i in j]\n",
" return flattened_list\n",
"\n",
"def plot_points(plt=plt, marker='o'):\n",
" colors = [[\"steelblue\", \"chocolate\"][i] for i in labels]\n",
" plt.scatter(features.iloc[:, 0], features.iloc[:, 1], color=colors, marker=marker);\n",
"\n",
"def train_and_plot_decision_surface(\n",
" name, classifier, features_2d, labels, preproc=None, plt=plt, marker='o', N=400\n",
"):\n",
"\n",
" features_2d = np.array(features_2d)\n",
" xmin, ymin = features_2d.min(axis=0)\n",
" xmax, ymax = features_2d.max(axis=0)\n",
"\n",
" x = np.linspace(xmin, xmax, N)\n",
" y = np.linspace(ymin, ymax, N)\n",
" points = np.array(np.meshgrid(x, y)).T.reshape(-1, 2)\n",
"\n",
" if preproc is not None:\n",
" points_for_classifier = preproc.fit_transform(points)\n",
" features_2d = preproc.fit_transform(features_2d)\n",
" else:\n",
" points_for_classifier = points\n",
"\n",
" classifier.fit(features_2d, labels, verbose=0)\n",
" predicted = classifier.predict(features_2d)\n",
" \n",
" if name == \"Neural Net\":\n",
" predicted = list_flatten(predicted)\n",
" \n",
" \n",
" if preproc is not None:\n",
" name += \" (w/ preprocessing)\"\n",
" print(name + \":\\t\", sum(predicted == labels), \"/\", len(labels), \"correct\")\n",
" \n",
" if name == \"Neural Net\":\n",
" classes = np.array(list_flatten(classifier.predict(points_for_classifier)), dtype=bool)\n",
" else:\n",
" classes = np.array(classifier.predict(points_for_classifier), dtype=bool)\n",
" plt.plot(\n",
" points[~classes][:, 0],\n",
" points[~classes][:, 1],\n",
" \"o\",\n",
" color=\"steelblue\",\n",
" markersize=1,\n",
" alpha=0.01,\n",
" )\n",
" plt.plot(\n",
" points[classes][:, 0],\n",
" points[classes][:, 1],\n",
" \"o\",\n",
" color=\"chocolate\",\n",
" markersize=1,\n",
" alpha=0.04,\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"_, ax = plt.subplots(figsize=(6, 6))\n",
"\n",
"train_and_plot_decision_surface(\"Neural Net\", model_scikit, features, labels, plt=ax)\n",
"plot_points(plt=ax)"
]
},
{
"cell_type": "code",
"source": [
"# Applying K-fold cross-validation\n",
"# Here we pass the whole dataset, i.e. features and labels, instead of splitting it.\n",
"num_folds = 5\n",
"cross_validation = cross_val_score(\n",
" model_scikit, features, labels, cv=num_folds, verbose=0)\n",
"\n",
"print(\"The acuracy on the \", num_folds, \" validation folds:\", cross_validation)\n",
"print(\"The Average acuracy on the \", num_folds, \" validation folds:\", np.mean(cross_validation))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"The code above took quite long to finish even though we used only 5 CV folds and the neural network and data size are very small! This gives an indication of the enormous compute requirements of training production-grade deep neural networks.\n",
"</p>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hyperparameter optimization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We know from chapter 6 that there are 2 types of parameters which need to be tuned for a machine learning model.\n",
"* Internal model parameters (weights) which can be learned for e.g. by gradient-descent\n",
"* Hyperparameters\n",
"\n",
"In the model created above we made some arbitrary choices such as the choice of the optimizer we used, optimizer's learning rate, number of hidden units and so on ...\n",
"\n",
"Now that we have the keras model wrapped as a scikit-learn model we can use the grid search functions we have seen in chapter 6."
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import GridSearchCV\n",
"# Just to remember\n",
"model_scikit = KerasClassifier(\n",
" build_fn=a_simple_NN, **{\"epochs\": num_epochs, \"verbose\": 0})"
]
},
{
"cell_type": "code",
"source": [
"HP_grid = {'epochs' : [30, 50, 100]}\n",
"search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid)\n",
"search.fit(features, labels)\n",
"print(search.best_score_, search.best_params_)"
]
},
{
"cell_type": "code",
"source": [
"HP_grid = {'epochs' : [10, 15, 30], \n",
" 'batch_size' : [10, 20, 30] }\n",
"search = GridSearchCV(estimator=model_scikit, param_grid=HP_grid)\n",
"search.fit(features, labels)\n",
"print(search.best_score_, search.best_params_)"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# A more general model for further Hyperparameter optimization\n",
"from tensorflow.keras import optimizers\n",
"\n",
"def a_simple_NN(activation='relu', num_hidden_neurons=[4, 4], learning_rate=0.01):\n",
"\n",
" model = Sequential()\n",
"\n",
" model.add(Dense(num_hidden_neurons[0],\n",
" input_shape=(2,), activation=activation))\n",
"\n",
" model.add(Dense(num_hidden_neurons[1], activation=activation))\n",
"\n",
" model.add(Dense(1, activation=\"sigmoid\"))\n",
"\n",
" model.compile(loss=\"binary_crossentropy\", optimizer=optimizers.rmsprop(\n",
" lr=learning_rate), metrics=[\"accuracy\"])\n",
"\n",
" return model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise section: \n",
"* Look at the model above and choose a couple of hyperparameters to optimize. \n",
"* **OPTIONAL:** What function from scikit-learn other than GridSearchCV can we use for hyperparameter optimization? Use it."
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"Another library which you should definitely look at for doing hyperparameter optimization with keras models is the <a href=\"https://github.com/maxpumperla/hyperas\">Hyperas library</a> which is a wrapper around the <a href=\"https://github.com/hyperopt/hyperopt\">Hyperopt library</a>. \n",
"\n",
"</p>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise section: \n",
"* Create a neural network to classify the 2d points example from chapter 2 learned (Optional: As you create the model read a bit on the different keras commands we have used)."
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import numpy as np\n",
"from sklearn.model_selection import train_test_split, cross_val_score\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Dense\n",
"from tensorflow.keras import optimizers\n",
"from tensorflow.keras.wrappers.scikit_learn import KerasClassifier"
"source": [
"circle = pd.read_csv(\"data/circle.csv\")\n",
"# Using x and y coordinates as featues\n",
"features = circle.iloc[:, :-1]\n",
"# Convert boolean to integer values (True->1 and False->0)\n",
"labels = circle.iloc[:, -1].astype(int)\n",
"\n",
"colors = [[\"steelblue\", \"chocolate\"][i] for i in circle[\"label\"]]\n",
"plt.figure(figsize=(5, 5))\n",
"plt.xlim([-2, 2])\n",
"plt.ylim([-2, 2])\n",
"\n",
"plt.scatter(features[\"x\"], features[\"y\"], color=colors, marker=\"o\");\n"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Insert Code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The examples we saw above are really nice to show various features of the Keras library and to understand how we build and train a model. However, they are not the ideal problems one should solve using neural networks. They are too simple and can be solved easily by classical machine learning algorithms. \n",
"\n",
"Now we show examples where Neural Networks really shine over classical machine learning algorithms."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Handwritten Digits Classification (multi-class classification)\n",
"**MNIST Dataset**\n",
"\n",
"MNIST datasets is a very common dataset used in machine learning. It is widely used to train and validate models.\n",
"\n",
"\n",
">The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a >test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size->normalized and centered in a fixed-size image.\n",
">It is a good database for people who want to try learning techniques and pattern recognition methods on real-world >data while spending minimal efforts on preprocessing and formatting.\n",
">source: http://yann.lecun.com/exdb/mnist/\n",
"\n",
"This dataset consists of images of handwritten digits between 0-9 and their corresponsing labels. We want to train a neural network which is able to predict the correct digit on the image. \n",
"This is a multi-class classification problem. Unlike binary classification which we have seen till now we will classify data into 10 different classes."
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import seaborn as sns"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Loading the dataset in keras\n",
"# Later you can explore and play with other datasets with come with Keras\n",
"\n",
"# Loading the train and test data\n",
"\n",
"(X_train, y_train), (X_test, y_test) = mnist.load_data()"
]
},
{
"cell_type": "code",
"source": [
"# Looking at the dataset\n",
"print(X_train.shape)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# We can see that the training set consists of 60,000 images of size 28x28 pixels\n",
"i=np.random.randint(0,X_train.shape[0])\n",
"sns.set_style(\"white\")\n",
"plt.imshow(X_train[i], cmap=\"gray_r\") ;\n",
"sns.set(style=\"darkgrid\")\n",
"print(\"This digit is: \" , y_train[i])"
]
},
{
"cell_type": "code",
"source": [
"# Look at the data values for a couple of images\n",
"print(X_train[0].min(), X_train[1].max())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data consists of values between 0-255 representing the **grayscale level**"
]
},
{
"cell_type": "code",
"source": [
"# The labels are the digit on the image\n",
"print(y_train.shape)"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Scaling the data\n",
"# It is important to normalize the input data to (0-1) before providing it to a neural net\n",
"# We could use the previously introduced function from scikit-learn. However, here it is sufficient to\n",
"# just divide the input data by 255\n",
"X_train_norm = X_train/255.\n",
"X_test_norm = X_test/255.\n",
"\n",
"# Also we need to reshape the input data such that each sample is a vector and not a 2D matrix\n",
"X_train_prep = X_train_norm.reshape(X_train_norm.shape[0],28*28)\n",
"X_test_prep = X_test_norm.reshape(X_test_norm.shape[0],28*28)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"One-Hot encoding\n",
"\n",
"In multi-class classification problems the labels are provided to the neural network as something called **One-hot encodings**. The categorical labels (0-9 here) are converted to vectors.\n",
"\n",
"For the MNIST problem where the data has **10 categories** we will convert every label to a vector of length 10. \n",
"All the entries of this vector will be zero **except** for the index which is equal to the (integer) value of the label.\n",
"\n",
"For example:\n",
"if label is 4. The one-hot vector will look like **[0 0 0 0 1 0 0 0 0 0]**\n",
"\n",
"Fortunately, Keras has a built-in function to achieve this and we do not have to write a code for this ourselves.\n",
"</p>\n",
"</div>"
]
},
{
"cell_type": "code",
"y_train_onehot = utils.to_categorical(y_train, num_classes=10)\n",
"y_test_onehot = utils.to_categorical(y_test, num_classes=10)\n",
"\n",
"print(y_train_onehot.shape)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"# Building the tensorflow model\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Dense\n",
"\n",
"def mnist_model():\n",
" model = Sequential()\n",
"\n",
" model.add(Dense(64, input_shape=(28*28,), activation=\"relu\"))\n",
"\n",
" model.add(Dense(64, activation=\"relu\"))\n",
"\n",
" model.add(Dense(10, activation=\"softmax\"))\n",
"\n",
" model.compile(loss=\"categorical_crossentropy\",\n",
" optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
" return model\n",
"\n",
"model = mnist_model()\n",
"\n",
"model_run = model.fit(X_train_prep, y_train_onehot, epochs=20,\n",
" batch_size=512)"
]
},
{
"cell_type": "code",
"source": [
"print(\"The [loss, accuracy] on test dataset are: \" , model.evaluate(X_test_prep, y_test_onehot))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise section\n",
"* Reinitialize and run the model again with validation dataset, plot the accuracy as a function of epochs, play with number of epochs and observe what is happening."
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Code here"
]
},
{
"cell_type": "code",
"metadata": {
"tags": [
"solution"
]
},
"source": [
"# Solution:\n",
"num_epochs = 20\n",
"model = mnist_model()\n",
"model_run = model.fit(X_train_prep, y_train_onehot, epochs=num_epochs,\n",
" batch_size=512, validation_data=(X_test_prep, y_test_onehot))\n",
"# Evaluating the model on test dataset\n",
"#print(\"The [loss, accuracy] on test dataset are: \" , model.evaluate(X_test_prep, y_test_onehot))\n",
"history_model = model_run.history\n",
"print(\"The history has the following data: \", history_model.keys())\n",
"\n",
"# Plotting the training and validation accuracy during the training\n",
"sns.lineplot(np.arange(1, num_epochs+1), history_model[\"accuracy\"], color = \"blue\", label=\"Training set\") ;\n",
"sns.lineplot(np.arange(1, num_epochs+1), history_model[\"val_accuracy\"], color = \"red\", label=\"Valdation set\") ;\n",
"plt.xlabel(\"epochs\") ;\n",
"plt.ylabel(\"accuracy\") ;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What we see here is **overfitting**. After the first few epochs the training and validation datasets show a similar accuracy but thereafter the network starts to over fit to the training set."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"Keep in mind that neural networks are quite prone to overfitting so always check for it.\n",
"</p>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding regularization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Adding l2 regularization\n",
"# Building the keras model\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Dense\n",
"from tensorflow.keras.regularizers import l2\n",
"\n",
"def mnist_model():\n",
" \n",
" model = Sequential()\n",
"\n",
" model.add(Dense(64, input_shape=(28*28,), activation=\"relu\", \n",
" kernel_regularizer=l2(0.01)))\n",
"\n",
" model.add(Dense(64, activation=\"relu\", \n",
" kernel_regularizer=l2(0.01)))\n",
"\n",
" model.add(Dense(10, activation=\"softmax\"))\n",
"\n",
" model.compile(loss=\"categorical_crossentropy\",\n",
" optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
" return model\n",
"\n",
"model = mnist_model()\n",
"\n",
"num_epochs = 20\n",
"model_run = model.fit(X_train_prep, y_train_onehot, epochs=num_epochs,\n",
" batch_size=512, validation_data=(X_test_prep, y_test_onehot))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Evaluating the model on test dataset\n",
"history_model = model_run.history\n",
"print(\"The history has the following data: \", history_model.keys())\n",
"\n",
"# Plotting the training and validation accuracy during the training\n",
"sns.lineplot(np.arange(1, num_epochs+1), history_model[\"accuracy\"], color = \"blue\", label=\"Training set\") ;\n",
"sns.lineplot(np.arange(1, num_epochs+1), history_model[\"val_accuracy\"], color = \"red\", label=\"Valdation set\") ;\n",
"plt.xlabel(\"epochs\") ;\n",
"plt.ylabel(\"accuracy\") ;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<p><i class=\"fa fa-warning\"></i> \n",
"Another way to add regularization and to make the network more robust is by applying Dropout. When we add dropout to a layer a specified percentage of units in that layer are switched off. \n",
" \n",
"Both L2 regularization and Dropout make the model simpler and thus reducing overfitting.\n",
"</p>\n",
"</div>\n",
"\n",
"### Exercise section\n",
"* Add dropout instead of L2 regularization in the network above"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Adding dropout is easy in keras\n",
"# We import a layer called Dropout and add as follows\n",
"# model.add(Dropout(0.2)) to randomly drop 20% of the hidden units\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"tags": [
"solution"
]
},
"source": [
"# Solution\n",
"# Adding Dropout\n",
"# Building the tensorflow model\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Dense, Dropout\n",
"\n",
"def mnist_model():\n",
" \n",
" model = Sequential()\n",
"\n",
" model.add(Dense(64, input_shape=(28*28,), activation=\"relu\"))\n",
" \n",
" model.add(Dropout(0.15))\n",
"\n",
" model.add(Dense(64, activation=\"relu\"))\n",
" \n",
" model.add(Dense(10, activation=\"softmax\"))\n",
"\n",
" model.compile(loss=\"categorical_crossentropy\",\n",
" optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
" \n",
" return model\n",
"\n",
"model = mnist_model()\n",
"\n",
"num_epochs = 20\n",
"model_run = model.fit(X_train_prep, y_train_onehot, epochs=num_epochs,\n",
" batch_size=512, validation_data=(X_test_prep, y_test_onehot))\n",
"\n",
"# Evaluating the model on test dataset\n",
"history_model = model_run.history\n",
"print(\"The history has the following data: \", history_model.keys())\n",
"\n",
"# Plotting the training and validation accuracy during the training\n",
"sns.lineplot(np.arange(1, num_epochs+1), history_model[\"accuracy\"], color = \"blue\", label=\"Training set\") ;\n",
"sns.lineplot(np.arange(1, num_epochs+1), history_model[\"val_accuracy\"], color = \"red\", label=\"Valdation set\") ;\n",
"plt.xlabel(\"epochs\") ;\n",
"plt.ylabel(\"accuracy\") ;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Network Architectures\n",
"\n",
"The neural networks which we have seen till now are the simplest kind of neural networks.\n",
"There exist more sophisticated network architectures especially designed for specific applications.\n",
"Some of them are as follows:\n",
"\n",
"### Convolution Neural Networks (CNNs)\n",
"\n",
"These networks are used mostly for computer vision like tasks such as image classification and object detection. \n",
"One of the old CNN networks is shown below.\n",
"\n",
"<center>\n",
"<figure>\n",
"<img src=\"./images/neuralnets/CNN_lecun.png\" width=\"800\"/>\n",
"<figcaption>source: LeCun et al., Gradient-based learning applied to document recognition (1998).</figcaption>\n",
"</figure>\n",
"</center>\n",
"\n",
"CNNs consist of new type of layers such as convolution and pooling layers.\n",
"\n",
"### Recurrent Neural Networks (RNNs)\n",
"\n",
"RNNs are used for problems such as time-series data, speech recognition and translation.\n",
"\n",
"### Generative adversarial networks (GANs)\n",
"\n",
"GANs consist of 2 parts, a generative network and a discriminative network. The generative network produces data which is then fed to the discriminative network which judges if the new data belongs to a specified dataset. Then via feedback loops the generative network becomes better and better at creating images similar to the dataset the discriminative network is judging against. At the same time the discriminative network get better and better at identifyig **fake** instances which are not from the reference dataset. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## CNN in a bit more detail\n",
"\n",
"The standard CNN architecture can be seen as 2 parts:\n",
"\n",
"* Feature extraction\n",
"* Classification\n",
"\n",
"For the **classification** part we use the densly connected network as shown in the keras examples above.\n",
"\n",
"However, for the **feature extraction** part we use new types of layers called **convolution** layers\n",
"\n",
"### What is a Convolution?\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"sns.set_style(\"white\")\n",
"# Loading the train and test data\n",
"digit = np.genfromtxt(\"data/digit_4_14x14.csv\", delimiter=\",\").astype(np.int16) ;\n",
"plt.imshow(digit, \"gray_r\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This image in matrix form"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"def plot_astable(matrix, hw=0.15):\n",
" matrix = plt.table(cellText=matrix, loc=(0,0), cellLoc='center') ;\n",
" matrix.set_fontsize(14)\n",
" cells=matrix.get_celld() ;\n",
" for i in cells:\n",
" cells[i].set_height(hw) ;\n",
" cells[i].set_width(hw) ;\n",
" plt.axis(\"off\")"
]
},
{
"cell_type": "code",
"source": [
"plot_astable(digit)"
]
},
{
"cell_type": "code",
"source": [
"# Vertical edge detection\n",
"vertical_edge_kernel = np.array([[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]])\n",
"plot_astable(vertical_edge_kernel, 0.2)"
]
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"def convolution(matrix, kernel):\n",
" # This function computes a convolution between a matrix and a kernel/filter without any padding\n",
" width_kernel = kernel.shape[0]\n",
" height_kernel = kernel.shape[1]\n",
" convolution = np.zeros((matrix.shape[0] - width_kernel + 1,\n",
" matrix.shape[1] - height_kernel + 1))\n",
" for i in range(matrix.shape[0] - width_kernel + 1):\n",
" for j in range(matrix.shape[1] - height_kernel + 1):\n",
" convolution[i, j] = np.sum(np.multiply(\n",
" matrix[i:i+width_kernel, j:j+height_kernel], kernel))\n",
" return convolution\n",
"\n",
"\n",
"vertical_detect = convolution(digit, vertical_edge_kernel)\n",
"plt.imshow(vertical_detect, cmap=\"gray_r\") ;"
]
},
{
"cell_type": "code",
"source": [
"# Horizontal edge detection\n",
"horizontal_edge_kernel = np.array([[-1, -1, -1], [2, 2, 2], [-1, -1, -1]])\n",
"plot_astable(horizontal_edge_kernel, 0.2)"
]
},
{
"cell_type": "code",
"source": [
"horizontal_detect = convolution(digit, horizontal_edge_kernel)\n",
"plt.imshow(horizontal_detect, cmap=\"gray_r\") ;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Maxpooling\n",
"Taking maximum in n x n sized sliding windows"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"def maxpool_2x2(matrix):\n",
" out_dim = np.array([matrix.shape[0]/2, matrix.shape[1]/2]).astype(int)\n",
" subsample = np.zeros((out_dim))\n",
" for i in range(out_dim[0]):\n",
" for j in range(out_dim[1]):\n",
" subsample[i,j] = np.max(matrix[i*2:i*2+2, j*2:j*2+2])\n",
" return subsample"
]
},
{
"cell_type": "code",
"source": [
"import matplotlib.pyplot as plt\n",
"subsampled_image = maxpool_2x2(vertical_detect)\n",
"plt.imshow(subsampled_image, cmap=\"gray_r\")\n",
"plt.title(\"Max Pooled vertical edge detection filter\") ;"
]
},
{
"cell_type": "code",
"source": [
"subsampled_image = maxpool_2x2(horizontal_detect)\n",
"plt.imshow(subsampled_image, cmap=\"gray_r\") ;\n",
"plt.title(\"Max Pooled horizontal edge detection filter\") ;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Let's explore some more of such filters/kernels!!\n",
"\n",
"http://setosa.io/ev/image-kernels"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## CNN Examples"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this example we will work with a dataset called fashion-MNIST which is quite similar to the MNIST data above.\n",
"> Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.\n",
"source: https://github.com/zalandoresearch/fashion-mnist\n",
"\n",
"The 10 classes of this dataset are:\n",
"\n",
"| Label| Item |\n",
"| --- | --- |\n",
"| 0 |\tT-shirt/top |\n",
"| 1\t| Trouser |\n",
"|2|\tPullover|\n",
"|3|\tDress|\n",
"|4|\tCoat|\n",
"|5|\tSandal|\n",
"|6|\tShirt|\n",
"|7|\tSneaker|\n",
"|8|\tBag|\n",
"|9|\tAnkle boot|"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Loading the dataset in tensorflow\n",
"# Later you can explore and play with other datasets with come with tensorflow\n",
"from tensorflow.keras.datasets import fashion_mnist\n",
"\n",
"# Loading the train and test data\n",
"\n",
"(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()\n",
"\n",
"items =['T-shirt/top', 'Trouser', \n",
" 'Pullover', 'Dress', \n",
" 'Coat', 'Sandal', \n",
" 'Shirt', 'Sneaker',\n",
" 'Bag', 'Ankle boot']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# We can see that the training set consists of 60,000 images of size 28x28 pixels\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"i=np.random.randint(0,X_train.shape[0])\n",
"plt.imshow(X_train[i], cmap=\"gray_r\") ; \n",
"print(\"This item is a: \" , items[y_train[i]])"
]
},
{
"cell_type": "code",
"source": [
"# Also we need to reshape the input data such that each sample is a 4D matrix of dimension\n",
"# (num_samples, width, height, channels). Even though these images are grayscale we need to add\n",
"# channel dimension as this is expected by the Conv function\n",
"X_train_prep = X_train.reshape(X_train.shape[0],28,28,1)/255.\n",
"X_test_prep = X_test.reshape(X_test.shape[0],28,28,1)/255.\n",
"\n",
"from tensorflow.keras.utils import to_categorical\n",
"\n",
"y_train_onehot = to_categorical(y_train, num_classes=10)\n",
"y_test_onehot = to_categorical(y_test, num_classes=10)\n",
"\n",
"print(y_train_onehot.shape)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Creating a CNN similar to the one shown in the figure from LeCun paper\n",
"# In the original implementation Average pooling was used. However, we will use maxpooling as this \n",
"# is what us used in the more recent architectures and is found to be a better choice\n",
"# Convolution -> Pooling -> Convolution -> Pooling -> Flatten -> Dense -> Dense -> Output layer\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout, BatchNormalization\n",
1896
1897
1898
1899
1900
1901
1902
1903
1904
1905
1906
1907
1908
1909
1910
1911
1912
1913
1914
1915
1916
1917
1918
1919
1920
1921
1922
1923
1924
1925
1926
"\n",
"def simple_CNN():\n",
" \n",
" model = Sequential()\n",
" \n",
" model.add(Conv2D(6, (3,3), input_shape=(28,28,1), activation='relu'))\n",
" \n",
" model.add(MaxPool2D((2,2)))\n",
" \n",
" model.add(Conv2D(16, (3,3), activation='relu'))\n",
" \n",
" model.add(MaxPool2D((2,2)))\n",
" \n",
" model.add(Flatten())\n",
" \n",
" model.add(Dense(120, activation='relu'))\n",
" \n",
" model.add(Dense(84, activation='relu'))\n",
" \n",
" model.add(Dense(10, activation='softmax'))\n",
" \n",
" model.compile(loss=\"categorical_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
" \n",
" return model\n",
"\n",
"model = simple_CNN()\n",
"model.summary()"
]
},
{
"cell_type": "code",
"source": [
"num_epochs = 5\n",
"model_run = model.fit(X_train_prep, y_train_onehot, epochs=num_epochs, \n",
" batch_size=64, validation_data=(X_test_prep, y_test_onehot))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise section\n",
"* Use the above model or improve it (change number of filters, add more layers etc. on the MNIST example and see if you can get a better accuracy than what we achieved with a vanilla neural network)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise section\n",
"* Explore the CIFAR10 (https://www.cs.toronto.edu/~kriz/cifar.html) dataset included with Keras and build+train a simple CNN to classify it"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"(X_train, y_train), (X_test, y_test) = cifar10.load_data()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (C) 2019-2021 ETH Zurich, SIS ID"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
1987
1988
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autoclose": false,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": true
}
},
"nbformat": 4,