{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to machine-learning with Python\n",
"\n",
"\n",
"\n",
"### Targeted audience\n",
"\n",
"- Researchers having no machine learning experience yet.\n",
"- Basic Python knowledge.\n",
"- Almost no math knowledge required.\n",
"\n",
"### Course structure\n",
"\n",
"- Two days workshop, 1.5 days workshop + .5 day working on own data / prepared data.\n",
"- Every part below includes a coding session using Jupyter notebooks.\n",
"- Coding sessions provide code frames which should be completed.\n",
"- We provide solutions.\n",
"\n",
"\n",
"## Day 1\n",
"\n",
"### Part 0: Preparation\n",
"\n",
"- Quick basics matplotlib, numpy, pandas?\n",
"\n",
"\n",
"#### Coding session\n",
"\n",
"- read dataframe from csv or excel sheet with beer features\n",
"- do some features vs features scatter plots\n",
"\n",
"\n",
"### Part 1: Introduction\n",
"\n",
"- What is machine learning ?\n",
"- What are features / samples / feature matrix ?\n",
"- Learning problems: supervised / unsupervised\n",
"\n",
"\n",
"#### Code walkthrough:\n",
"\n",
" - Classification: linear SVM classifier or logistic regression example\n",
" - Clustering: scikit-learn example to find clusters.\n",
"\n",
"\n",
"### Part 2: classification\n",
"\n",
" Intention: demonstrate one / two simple examples of classifiers, also\n",
" introduce the concept of decision boundary\n",
"\n",
" - Introduction: some simple two dimensional examples incl. decision function.\n",
"\n",
" - Idea of linear classifier:\n",
" - simple linear classifier (linear SVM e.g.)\n",
" - beer example with some weights\n",
"\n",
" - Discuss code example with logistic regression for beer data, show weights\n",
"\n",
"#### Coding session:\n",
"\n",
" - Change given code to use a linear SVM classifier\n",
" - Use different data set which can not be classified well with a linear classifier\n",
"\n",
"\n",
"### Part 3: accuracy, F1, ROC, ...\n",
"\n",
"Intention: accuracy is useful but has pitfalls\n",
"\n",
"- how to measure accuracy ?\n",
"\n",
" - confusion matrix\n",
" - accurarcy\n",
" - pitfalls for unbalanced data sets\n",
" e.g. diagnose HIV\n",
" - precision / recall\n",
"\n",
"#### Coding session\n",
"\n",
"- Evaluate accuracy of linear beer classifier from latest section\n",
"- Determine precision / recall\n",
"\n",
"\n",
"### Part 4: underfitting/overfitting\n",
"\n",
"classifiers / regressors have parameters / degrees of freedom.\n",
"\n",
"- underfitting: linear classifier on nonlinear problem\n",
"\n",
"- overfitting:\n",
"\n",
" - features have actual noise, or not enough information: orchid example in 2d. elevate to 3d using another feature.\n",
" - polynome of degree 5 to fit points on a line + noise\n",
" - points in a circle: draw very exact boundary line\n",
"\n",
"- how to check underfitting / overfitting ?\n",
"\n",
" - measure accuracy or other metric on test dataset\n",
" - cross validation\n",
"\n",
"\n",
"#### Coding session:\n",
"\n",
"- How to do cross validation with scikit-learn\n",
"- run cross validation on classifier for beer data\n",
"\n",
"\n",
"### Part 5: pipelines / parameter tuning with scikit-learn\n",
"\n",
"- Scikit learn API incl. summary of what we have seen up to now.\n",
"- pipelines, preprocessing (scaler, PCA)\n",
"- cross validation\n",
"- Hyper parameter tuning: grid search / random search.\n",
"\n",
"#### Coding session\n",
"\n",
"- examples\n",
"\n",
"\n",
"## DAY 2\n",
"\n",
"### Part 6: Overview classifiers\n",
"\n",
"- Nearest neighbours\n",
"- SVMs\n",
" - demo for RBF: different parameters influence on decision line\n",
"- Random forests\n",
"- Gradient Tree Boosting\n",
"\n",
"\n",
"#### Coding session\n",
"\n",
"- Prepare examples for 2d classification problems incl. visualization of different\n",
" decision surfaces.\n",
"\n",
"- Play with different classifiers on beer data\n",
"\n",
"### Part 7: Regression\n",
"\n",
"- What are differences compared to classification: output, how to measure accuracy, ...\n",
"\n",
"- Example: fit polynomial, examples for underfitting and overfitting\n",
"\n",
"\n",
"#### Coding session\n",
"\n",
"Introduce movie data set, learn SVR or other regressor on this data set.\n",
"\n",
"\n",
"### Part 8: Introduction neural networks\n",
"\n",
"\n",
"- Overview of the field\n",
"- Introduction to feed forward neural networks\n",
"- Demo Keras\n",
"\n",
"#### Coding Session\n",
"\n",
"- keras reuse network and play with it.\n",
"\n",
"\n",
"## Workshop\n",
"\n",
"- assist to setup the workshop material on own computer.\n",
"- provide example problems if attendees don't bring own data.\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}