{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Neural Networks\n",
"\n",
"## TO DO: Almost all the figures and schematics will be replaced or improved slowly\n",
"\n",
"<img src=\"./images/neuralnets/Colored_neural_network.svg\"/>\n",
"source: https://en.wikipedia.org/wiki/Artificial_neural_network\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## History of Neural networks\n",
"\n",
"**TODO: Make it more complete and format properly**\n",
"\n",
"1943: Threshold Logic\n",
"\n",
"1940s: Hebbian Learning\n",
"\n",
"1958: Perceptron\n",
"\n",
"1975: Backpropagation\n",
"\n",
"1980s: Neocognitron\n",
"\n",
"1982: Hopfield Network\n",
"\n",
"1986: Convolutional Neural Networks\n",
"\n",
"1997: Long short-term memory (LSTM) model\n",
"\n",
"2014: Gated Recurrent Units, Generative Adversarial Networks (Check)?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Why the boom now?\n",
"* Data\n",
"* Data\n",
"* Data\n",
"* Availability of GPUs\n",
"* Algorithmic developments which allow efficient training, also of deeper networks\n",
"* Much easier access than a decade ago"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building blocks\n",
"### Perceptron\n",
"\n",
"The smallest unit of a neural network is a **perceptron**-like node.\n",
"\n",
"**What is a Perceptron?**\n",
"\n",
"It is a simple function which has multiple inputs and a single output.\n",
"\n",
"Step 1: Weighted sum of the inputs is calculated\n",
"\n",
"\\begin{equation*}\n",
"weighted\\_sum = \\sum_{i=1}^{num\\_inputs} w_{i} x_{i}\n",
"\\end{equation*}\n",
"\n",
"Step 2: The following activation function is applied\n",
"\n",
"$$\n",
"f(weighted\\_sum) = \\left\\{\n",
"  \\begin{array}{ll}\n",
"    0 & \\quad weighted\\_sum < threshold \\\\\n",
"    1 & \\quad weighted\\_sum \\geq threshold\n",
"  \\end{array}\n",
"\\right.\n",
"$$\n",
"\n",
"You can see that this is also a linear classifier as we introduced in script 02."
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"%config InlineBackend.figure_format = 'retina'\n",
"import matplotlib as mpl\n",
"mpl.rcParams['lines.linewidth'] = 3"
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"def perceptron(X, w, threshold=1):\n",
" # This function computes sum(w_i*x_i) and \n",
" # applies a perceptron activation\n",
" linear_sum = np.dot(X,w)\n",
" output=0\n",
" if linear_sum >= threshold:\n",
" output = 1\n",
" # print(\"The perceptron has peaked\")\n",
" return output\n",
"X = [1,0]\n",
"w = [1,1]\n",
"perceptron(X,w)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Boolean AND\n",
"\n",
"| x$_1$ | x$_2$ | output |\n",
"| --- | --- | --- |\n",
"| 0 | 0 | 0 |\n",
"| 1 | 0 | 0 |\n",
"| 0 | 1 | 0 |\n",
"| 1 | 1 | 1 |"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Perceptron output for x1, x2 = [0, 0] is 0\n",
"Perceptron output for x1, x2 = [1, 0] is 0\n",
"Perceptron output for x1, x2 = [0, 1] is 0\n",
"Perceptron output for x1, x2 = [1, 1] is 1\n"
]
}
],
"source": [
"# Calculating Boolean AND using a perceptron\n",
"import matplotlib.pyplot as plt\n",
"# Any threshold in (1, 2] works for AND with these weights; 1.5 is one such choice\n",
"threshold=1.5\n",
"w=[1,1]\n",
"X=[[0,0],[1,0],[0,1],[1,1]]\n",
"for i in X:\n",
"    print(\"Perceptron output for x1, x2 = \" , i , \" is \" , perceptron(i,w,threshold))"
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this simple case we can rewrite our equation to $x_2 = ...... $ which describes a line in 2D:"
]
},
{
"cell_type": "code",
"image/png": "\n",
"image/png": {
"height": 252,
"width": 388
},
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Plotting the input points (x1 on the horizontal axis, x2 on the vertical axis)\n",
"plt.xlim(-1,2)\n",
"plt.ylim(-1,2)\n",
"for i in X:\n",
"    plt.plot(i[0], i[1], \"o\", color=\"b\");\n",
"# Plotting the decision boundary\n",
"# that is a line given by w_1*x_1+w_2*x_2-threshold=0\n",
"x1 = np.arange(-3,4)\n",
"x2 = (threshold - w[0]*x1) / w[1]\n",
"plt.plot(x1, x2 , \"--\" ,color=\"black\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise: Can you compute a Boolean \"OR\" using a perceptron?**\n",
"\n",
"Hint: copy the code from the \"AND\" example and edit the weights and/or threshold"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Boolean OR\n",
"\n",
"| x$_1$ | x$_2$ | output |\n",
"| --- | --- | --- |\n",
"| 0 | 0 | 0 |\n",
"| 1 | 0 | 1 |\n",
"| 0 | 1 | 1 |\n",
"| 1 | 1 | 1 |"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Calculating Boolean OR using a perceptron\n",
"# Edit the code below"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Perceptron output for x1, x2 = [0, 0] is 0\n",
"Perceptron output for x1, x2 = [1, 0] is 1\n",
"Perceptron output for x1, x2 = [0, 1] is 1\n",
"Perceptron output for x1, x2 = [1, 1] is 1\n"
"image/png": "\n",
"image/png": {
"height": 252,
"width": 388
},
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Solution\n",
"# Calculating Boolean OR using a perceptron\n",
"import matplotlib.pyplot as plt\n",
"threshold=0.6\n",
"w=[1,1]\n",
"X=[[0,0],[1,0],[0,1],[1,1]]\n",
"for i in X:\n",
" print(\"Perceptron output for x1, x2 = \" , i , \" is \" , perceptron(i,w,threshold))\n",
"# Plotting the input points (x1 on the horizontal axis, x2 on the vertical axis)\n",
"plt.xlim(-1,2)\n",
"plt.ylim(-1,2)\n",
"for i in X:\n",
"    plt.plot(i[0], i[1], \"o\", color=\"b\");\n",
"# Plotting the decision boundary\n",
"# that is a line given by w_1*x_1+w_2*x_2-threshold=0\n",
"x1 = np.arange(-3,4)\n",
"x2 = (threshold - w[0]*x1) / w[1]\n",
"plt.plot(x1, x2 , \"--\" ,color=\"black\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Optional exercise: Create a NAND gate using a perceptron**\n",
"\n",
"#### Boolean NAND\n",
"\n",
"| x$_1$ | x$_2$ | output |\n",
"| --- | --- | --- |\n",
"| 0 | 0 | 1 |\n",
"| 1 | 0 | 1 |\n",
"| 0 | 1 | 1 |\n",
"| 1 | 1 | 0 |"
"metadata": {},
"outputs": [],
"source": [
"# Calculating Boolean NAND using a perceptron\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In fact, a single perceptron can compute the \"AND\", \"OR\" and \"NOT\" boolean functions.\n",
"However, it cannot compute some other boolean functions such as \"XOR\".\n",
"\n",
"WHAT CAN WE DO?\n",
"Hint: What is the significance of the NAND gate we created above?\n",
"\n",
"We said a single perceptron can't compute these functions. We didn't say that about **multiple perceptrons**."
]
},
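{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of this idea, the next cell wires three perceptrons together to compute XOR as AND(OR(x1, x2), NAND(x1, x2)). The weights and thresholds are illustrative choices, not the only valid ones, and `perceptron` is re-defined so that the cell runs on its own."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def perceptron(X, w, threshold=1):\n",
"    # Same step-activation perceptron as defined earlier\n",
"    return 1 if np.dot(X, w) >= threshold else 0\n",
"\n",
"def xor(x1, x2):\n",
"    # Hidden layer: OR and NAND of the inputs (illustrative weights and thresholds)\n",
"    h_or = perceptron([x1, x2], [1, 1], 0.5)\n",
"    h_nand = perceptron([x1, x2], [-1, -1], -1.5)\n",
"    # Output layer: AND of the two hidden outputs\n",
"    return perceptron([h_or, h_nand], [1, 1], 1.5)\n",
"\n",
"for x in [[0, 0], [1, 0], [0, 1], [1, 1]]:\n",
"    print(\"XOR output for x1, x2 =\", x, \"is\", xor(*x))"
]
},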
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**XOR function**\n",
"\n",
"**TO DO: INSERT IMAGE HERE!!!!!!!!!!!!!!**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Google Playground\n",
"\n",
"UWE: move up before discussing gradient stuff etc\n",
"\n",
"https://playground.tensorflow.org/\n",
"\n",
"<img src=\"./images/neuralnets/google_playground.png\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning\n",
"\n",
"Now we know that we can compute complex functions if we stack together a number of perceptrons.\n",
"\n",
"However, we do NOT want to set the weights and thresholds by hand as we did in the examples above.\n",
"\n",
"We want some algorithm to do this for us!\n",
"\n",
"In order to achieve this we first need to choose a loss function for the problem at hand.\n",
"\n",
"\n",
"### Loss function\n",
"As in the case of other machine learning algorithms, we need to define a so-called \"loss function\". In simple words, this function measures how close the predictions of our network are to the supplied labels. Once we have this function, we need an algorithm to update the weights of the network such that the loss decreases. As one can already imagine, the choice of an appropriate loss function is very important to the success of the trained model. Fortunately, for classification and regression (which cover a large range of problems) these loss functions are well known. Generally, **cross-entropy** and **mean squared error** loss functions are chosen for classification and regression problems, respectively.\n",
"\n",
"### Gradient based learning\n",
"Once we have a loss function, we want to solve an **optimization problem** which minimizes this loss by updating the weights of the network. This is how the learning actually happens.\n",
"\n",
"One of the most popular optimization methods used in machine learning is **gradient descent**.\n",
"\n",
"INSERT MORE EXPLANATIONS HERE\n",
"\n",
"In order to train the network we need to replace the perceptron's **step** activation function: its derivative is zero everywhere except at the threshold, where it is undefined, so it does not allow training with the back-propagation algorithm, among other drawbacks.\n",
"\n",
"Non-Linear functions such as:\n",
"\n",
"* ReLU (Rectified linear unit)\n",
"\n",
"\\begin{equation*}\n",
"f(z) = \\mathrm{max}(0,z)\n",
"\\end{equation*}\n",
"\n",
"* Sigmoid\n",
"\n",
"\\begin{equation*}\n",
"f(z) = \\frac{1}{1+e^{-z}}\n",
"\\end{equation*}\n",
"\n",
"* tanh\n",
"\n",
"\\begin{equation*}\n",
"f(z) = \\frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}\n",
"\\end{equation*}\n",
"\n",
"\n",
"are some of the most popular choices used as activation functions.\n",
"\n",
"Linear activations are **NOT** used because a composition of linear functions is itself linear: with linear activations the output of the network is just a linear function of its input, no matter how many hidden layers are added. Adding hidden layers therefore does not help to learn interesting functions.\n",
"\n",
"Non-linear activation functions allow the network to learn more complex representations."
"image/png": "\n",
"image/png": {
"height": 250,
"width": 597
},
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
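"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"# Sample points; this range is an assumption, the original definition of pts is missing\n",
"pts = np.linspace(-5, 5, 200)\n",
"\n",
"# Sigmoid\n",
"plt.subplot(1, 3, 1)\n",
"plt.plot(pts, 1/(1+np.exp(-pts))) ;\n",
"\n",
"# tanh\n",
"plt.subplot(1, 3, 2)\n",
"plt.plot(pts, np.tanh(pts*np.pi)) ;"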
]
},
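{
"cell_type": "markdown",
"metadata": {},
"source": [
"The statement that layers with linear activations collapse into a single linear function can be checked numerically. The following sketch uses randomly generated weight matrices (hypothetical values, with biases omitted for brevity) and compares a two-layer linear network with a single layer that uses the product of the two weight matrices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"# Two \"layers\" with linear activations, i.e. plain matrix multiplications\n",
"W1 = rng.normal(size=(4, 3))  # first layer: 3 inputs -> 4 units\n",
"W2 = rng.normal(size=(2, 4))  # second layer: 4 units -> 2 outputs\n",
"x = rng.normal(size=3)\n",
"\n",
"two_layers = W2 @ (W1 @ x)\n",
"# A single layer with the combined weight matrix gives the same output\n",
"one_layer = (W2 @ W1) @ x\n",
"print(np.allclose(two_layers, one_layer))"
]
},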
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Suggestion Uwe:\n",
"\n",
"1. more layers might improve the power of a single perceptron.\n",
"\n",
"2. regrettably, math shows that just \"stacking\" perceptrons adds only little improvement\n",
"\n",
"3. way around: look at nature how a neuron works and introduce non-linear activation functions.\n",
"\n",
"4. theoretical background: universal approximation theorem.\n",
"\n",
"\n",
"\n",
"### Multi-layer perceptron neural network\n",
"Universal approximation theorem"
]
},
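{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the gradient-based learning described in the Learning section more concrete, here is a minimal sketch of gradient descent on a mean squared error loss for a single linear unit. The data, the learning rate and the number of steps are made-up illustrative values; the fitted `w` and `b` should approach the values used to generate the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Toy regression data generated from y = 2*x + 1 (made-up example)\n",
"x = np.linspace(-1, 1, 50)\n",
"y = 2 * x + 1\n",
"\n",
"w, b = 0.0, 0.0      # initial weight and bias\n",
"learning_rate = 0.1  # illustrative step size\n",
"\n",
"for step in range(500):\n",
"    y_pred = w * x + b\n",
"    error = y_pred - y\n",
"    loss = np.mean(error ** 2)  # mean squared error\n",
"    # Gradients of the loss with respect to w and b\n",
"    grad_w = 2 * np.mean(error * x)\n",
"    grad_b = 2 * np.mean(error)\n",
"    # Move against the gradient to decrease the loss\n",
"    w -= learning_rate * grad_w\n",
"    b -= learning_rate * grad_b\n",
"\n",
"print(\"w = %.3f, b = %.3f, loss = %.6f\" % (w, b, loss))"
]
},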
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Keras"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is **Keras**?\n",
"\n",
"* It is a high level API to create and work with neural networks\n",
"* Supports multiple backends such as TensorFlow from Google, Theano (although Theano is no longer actively developed) and CNTK (Microsoft Cognitive Toolkit)\n",
"* Very good for creating neural nets quickly; it hides away a lot of tedious work\n",
"* Has been incorporated into official TensorFlow as `tf.keras` (which only works with the TensorFlow backend) and, as of TensorFlow 2.0, is the main high-level API of TensorFlow (check reference)\n"
"name": "stdout",
"output_type": "stream",
"text": [
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"dense_9 (Dense) (None, 4) 36 \n",
"_________________________________________________________________\n",
"activation_7 (Activation) (None, 4) 0 \n",
"_________________________________________________________________\n",
"dense_10 (Dense) (None, 4) 20 \n",
"_________________________________________________________________\n",
"dense_11 (Dense) (None, 1) 5 \n",
"_________________________________________________________________\n",
"activation_8 (Activation) (None, 1) 0 \n",
"=================================================================\n",
"Total params: 61\n",
"Trainable params: 61\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
],
"source": [
"# Say hello to keras\n",
"\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense, Activation\n",
"\n",
"# Creating a model\n",
"model = Sequential()\n",
"\n",
"# Adding layers to this model\n",
"# 1st Hidden layer\n",
"# A Dense/fully-connected layer which takes as input a \n",
"# feature array of shape (samples, num_features)\n",
"# Here input_shape = (8,) means that the layer expects an input with num_features = 8 \n",
"# and the sample size could be anything\n",
"# Then we specify an activation function\n",
"model.add(Dense(units=4, input_shape=(8,)))\n",
"# The summary output lists an Activation layer here; \"relu\" is assumed\n",
"model.add(Activation(\"relu\"))\n",
"# 2nd Hidden layer\n",
"# This is also a fully-connected layer and we do not need to specify the\n",
"# shape of the input anymore (We need to do that only for the first layer)\n",
"# NOTE: Now we didn't add the activation separately. Instead we just added it\n",
"# while calling Dense(). This and the way used for the first layer are equivalent!\n",
"model.add(Dense(units=4, activation=\"relu\"))\n",
"\n",
" \n",
"# The output layer\n",
"model.add(Dense(units=1))\n",
"model.add(Activation(\"sigmoid\"))\n",
"\n",
"model.summary()"
]
},
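{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `Param #` column of the summary can be reproduced by hand: a Dense layer has one weight per (input, unit) pair plus one bias per unit. The helper `dense_params` below is a hypothetical illustration written for this notebook, not a Keras function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def dense_params(num_inputs, units):\n",
"    # Weights (num_inputs * units) plus one bias per unit\n",
"    return num_inputs * units + units\n",
"\n",
"print(dense_params(8, 4))  # first hidden layer: 8 features -> 4 units\n",
"print(dense_params(4, 4))  # second hidden layer: 4 -> 4 units\n",
"print(dense_params(4, 1))  # output layer: 4 -> 1 unit"
]
},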
{
"cell_type": "code",
"# Fitting the model "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TO DO: Move the MNIST example after the previous dataset examples**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The MNIST dataset is a very common dataset used in machine learning. It is widely used to train and validate models.\n",
"\n",
">The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.\n",
">It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.\n",
">source: http://yann.lecun.com/exdb/mnist/\n",
"\n",
"The problem we want to solve using this dataset is multi-class classification.\n",
"This dataset consists of images of handwritten digits from 0 to 9 and their corresponding labels. We want to train a neural network that is able to predict the digit shown in each image. "
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Loading the dataset in keras\n",
"# Later you can explore and play with other datasets that come with Keras\n",
"from keras.datasets import mnist\n",
"\n",
"# Loading the train and test data\n",
"\n",
"(X_train, y_train), (X_test, y_test) = mnist.load_data()"
]
},
{
"cell_type": "code",
"execution_count": 185,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(60000, 28, 28)\n"
]
}
],
"source": [
"# Looking at the dataset\n",
"print(X_train.shape)"
]
},
{
"cell_type": "code",
"execution_count": 186,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This digit is: 8\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x7fe8e68579e8>"
]
},
"metadata": {
"image/png": {
"height": 250,
"width": 253
},
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# We can see that the training set consists of 60,000 images of size 28x28 pixels\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"i=np.random.randint(0,X_train.shape[0])\n",
"plt.imshow(X_train[i], cmap=\"gray_r\") ;\n",
"print(\"This digit is: \" , y_train[i])"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
]
}
],
"source": [
"# Look at the data values for a couple of images\n",
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data consists of integer values from 0 to 255 representing the **grayscale level**"
]
},
{
"cell_type": "code",
"execution_count": 188,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(60000,)\n"
]
}
],
"source": [
"# The labels are the digit on the image\n",
"print(y_train.shape)"
]
},
{
"cell_type": "code",
"execution_count": 190,
"metadata": {},
"outputs": [],
"source": [
"# Scaling the data\n",
"# It is important to scale the input data to the range [0, 1] before providing it to a neural net\n",
"# We could use the previously introduced function from scikit-learn. However, here it is sufficient to\n",
"# just divide the input data by 255\n",
"X_train_norm = X_train/255.\n",
"X_test_norm = X_test/255.\n",
"# Also we need to reshape the input data such that each sample is a vector and not a 2D matrix\n",
"X_train_prep = X_train_norm.reshape(X_train_norm.shape[0],28*28)\n",
"X_test_prep = X_test_norm.reshape(X_test_norm.shape[0],28*28)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**IMPORTANT: One-Hot encoding**\n",
"\n",
"**TODO: Better frame the explanation**\n",
"\n",
"In multi-class classification problems such as this one, the labels are provided to the network as **one-hot encodings**, which convert a categorical label into a vector.\n",
"\n",
"For the MNIST problem, where we have **10 categories**, one-hot encoding creates a vector of length 10 for each label. All entries of this vector are zero **except** the one at the index equal to the integer value of the label, which is one.\n",
"\n",
"For example:\n",
"if the label is 4, the one-hot vector is **[0 0 0 0 1 0 0 0 0 0]**\n",
"\n",
"Fortunately, we don't have to code this ourselves because Keras has a built-in function for this."
]
},
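{
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration only, the same encoding can be sketched in plain NumPy by picking rows of an identity matrix. The labels used here are made-up examples and the variable names are chosen just for this cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"labels = np.array([4, 0, 9])  # made-up example labels\n",
"num_classes = 10\n",
"# Row k of the identity matrix is the one-hot vector for label k\n",
"one_hot = np.eye(num_classes, dtype=int)[labels]\n",
"print(one_hot[0])  # one-hot vector for the label 4"
]
},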
{
"cell_type": "code",
"execution_count": 191,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(60000, 10)\n"
]
}
],
"source": [
"from keras.utils import to_categorical\n",
"\n",
"y_train_onehot = to_categorical(y_train, num_classes=10)\n",
"y_test_onehot = to_categorical(y_test, num_classes=10)\n",
"\n",
"print(y_train_onehot.shape)"
]
},
{
"cell_type": "code",
"execution_count": 194,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/20\n",
"60000/60000 [==============================] - 2s 34us/step - loss: 0.5888 - acc: 0.8434\n",
"Epoch 2/20\n",
"60000/60000 [==============================] - 1s 20us/step - loss: 0.2569 - acc: 0.9267\n",
"Epoch 3/20\n",
"60000/60000 [==============================] - 1s 16us/step - loss: 0.2024 - acc: 0.9416\n",
"Epoch 4/20\n",
"60000/60000 [==============================] - 1s 17us/step - loss: 0.1706 - acc: 0.9497\n",
"Epoch 5/20\n",
"60000/60000 [==============================] - 1s 23us/step - loss: 0.1475 - acc: 0.9563\n",
"Epoch 6/20\n",
"60000/60000 [==============================] - 1s 20us/step - loss: 0.1290 - acc: 0.9627\n",
"Epoch 7/20\n",
"60000/60000 [==============================] - 1s 23us/step - loss: 0.1162 - acc: 0.9651\n",
"Epoch 8/20\n",
"60000/60000 [==============================] - 1s 19us/step - loss: 0.1035 - acc: 0.9691\n",
"Epoch 9/20\n",
"60000/60000 [==============================] - 2s 28us/step - loss: 0.0939 - acc: 0.9716\n",
"Epoch 10/20\n",
"60000/60000 [==============================] - 1s 22us/step - loss: 0.0848 - acc: 0.9743\n",
"Epoch 11/20\n",
"60000/60000 [==============================] - 1s 25us/step - loss: 0.0777 - acc: 0.9763\n",
"Epoch 12/20\n",
"60000/60000 [==============================] - 1s 20us/step - loss: 0.0720 - acc: 0.9780\n",
"Epoch 13/20\n",
"60000/60000 [==============================] - 1s 22us/step - loss: 0.0655 - acc: 0.9808\n",
"Epoch 14/20\n",
"60000/60000 [==============================] - 2s 30us/step - loss: 0.0610 - acc: 0.9817\n",
"Epoch 15/20\n",
"60000/60000 [==============================] - 1s 16us/step - loss: 0.0563 - acc: 0.9832\n",
"Epoch 16/20\n",
"60000/60000 [==============================] - 1s 20us/step - loss: 0.0527 - acc: 0.9842\n",
"Epoch 17/20\n",
"60000/60000 [==============================] - 1s 21us/step - loss: 0.0478 - acc: 0.9854\n",
"Epoch 18/20\n",
"60000/60000 [==============================] - 1s 15us/step - loss: 0.0453 - acc: 0.9864\n",
"Epoch 19/20\n",
"60000/60000 [==============================] - 1s 18us/step - loss: 0.0419 - acc: 0.9874\n",
"Epoch 20/20\n",
"60000/60000 [==============================] - 1s 20us/step - loss: 0.0387 - acc: 0.9885\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.History at 0x7fe8e7465438>"
]
},
"execution_count": 194,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Building the keras model\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense\n",
"\n",
"model = Sequential()\n",
"\n",
"model.add(Dense(64,input_shape=(28*28,), activation=\"relu\"))\n",
"\n",
"model.add(Dense(64, activation = \"relu\"))\n",
"\n",
"model.add(Dense(10, activation = \"softmax\"))\n",
"\n",
"model.compile(loss=\"categorical_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
"\n",
"model_history = model.fit(X_train_prep, y_train_onehot, epochs=20, batch_size=512);"
]
},
{
"cell_type": "code",
"execution_count": 196,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10000/10000 [==============================] - 1s 85us/step\n",
"The [loss, accuracy] are: [0.08737125840586377, 0.974]\n"
]
}
],
"source": [
"# Evaluating the model on test dataset\n",
"print(\"The [loss, accuracy] on test dataset are: \" , model.evaluate(X_test_prep, y_test_onehot))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Work in Progress\n",
"\n",
"## Network results on dataset used in previous notebooks"
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.model_selection import train_test_split\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Creating a network to solve the XOR problem\n",
"# Loading and plotting the data\n",
"xor = pd.read_csv(\"xor.csv\")\n",
"xv = xor[\"x\"]\n",
"yv = xor[\"y\"]\n",
"\n",
"colors = [[\"steelblue\", \"chocolate\"][i] for i in xor[\"label\"]]\n",
"plt.figure(figsize=(5, 5))\n",
"plt.xlim([-2, 2])\n",
"plt.ylim([-2, 2])\n",
"plt.title(\"Blue points are False\")\n",
"\n",
"\n",
"plt.scatter(xv, yv, color=colors, marker=\"o\");"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# Using x and y coordinates as features\n",
"features = xor.iloc[:, :-1]\n",
"# Convert boolean to integer values (True->1 and False->0)\n",
"labels = xor.iloc[:, -1].astype(int)\n",
"\n",
"# Building a Keras model\n",
"\n",
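"def a_simple_NN():\n",
"    \n",
"    model = Sequential()\n",
"    model.add(Dense(4, input_shape = (2,), activation = \"relu\"))\n",
"    # Single sigmoid output unit (assumed here): binary_crossentropy expects one probability per sample\n",
"    model.add(Dense(1, activation = \"sigmoid\"))\n",
"    model.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"])\n",
"    \n",
"    return model\n",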
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 350 samples, validate on 150 samples\n",
"Epoch 1/100\n",
"350/350 [==============================] - 1s 2ms/step - loss: 0.8305 - acc: 0.3571 - val_loss: 0.8120 - val_acc: 0.3667\n",
"350/350 [==============================] - 0s 88us/step - loss: 0.8170 - acc: 0.3629 - val_loss: 0.8010 - val_acc: 0.3667\n",
"350/350 [==============================] - 0s 121us/step - loss: 0.8060 - acc: 0.3657 - val_loss: 0.7904 - val_acc: 0.3733\n",
"350/350 [==============================] - 0s 133us/step - loss: 0.7960 - acc: 0.3743 - val_loss: 0.7807 - val_acc: 0.3867\n",
"350/350 [==============================] - 0s 121us/step - loss: 0.7866 - acc: 0.3800 - val_loss: 0.7716 - val_acc: 0.3867\n",
"350/350 [==============================] - 0s 91us/step - loss: 0.7773 - acc: 0.3886 - val_loss: 0.7625 - val_acc: 0.3867\n",
"350/350 [==============================] - 0s 97us/step - loss: 0.7682 - acc: 0.3914 - val_loss: 0.7536 - val_acc: 0.3867\n",
"350/350 [==============================] - 0s 86us/step - loss: 0.7594 - acc: 0.4086 - val_loss: 0.7450 - val_acc: 0.4067\n",
"350/350 [==============================] - 0s 81us/step - loss: 0.7507 - acc: 0.4143 - val_loss: 0.7367 - val_acc: 0.4200\n",
"350/350 [==============================] - 0s 88us/step - loss: 0.7420 - acc: 0.4200 - val_loss: 0.7283 - val_acc: 0.4333\n",
"350/350 [==============================] - 0s 130us/step - loss: 0.7335 - acc: 0.4343 - val_loss: 0.7200 - val_acc: 0.4533\n",
"350/350 [==============================] - 0s 87us/step - loss: 0.7252 - acc: 0.4429 - val_loss: 0.7123 - val_acc: 0.4600\n",
"350/350 [==============================] - 0s 138us/step - loss: 0.7172 - acc: 0.4514 - val_loss: 0.7043 - val_acc: 0.4733\n",
"350/350 [==============================] - 0s 103us/step - loss: 0.7091 - acc: 0.4600 - val_loss: 0.6967 - val_acc: 0.4733\n",
"350/350 [==============================] - 0s 144us/step - loss: 0.7014 - acc: 0.4800 - val_loss: 0.6894 - val_acc: 0.4933\n",