# Target audience
- Researchers with no prior machine learning experience.
- Basic Python knowledge.
- Almost no math knowledge required.
# Course structure
- Two-day workshop: 1.5 days of instruction plus 0.5 day working on own data or prepared data.
- Every part below includes a coding session using Jupyter notebooks.
- Coding sessions provide code skeletons which the attendees complete.
- We provide solutions.
# Day 1
## Part 0: Preparation
### Coding session
- Read a dataframe with beer features from a CSV file or an Excel sheet.
- Create scatter plots of one feature against another.
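A minimal sketch of this session, assuming a beer table with the hypothetical columns `alcohol_content`, `bitterness`, `darkness`, `fruitiness` and a label column `is_yummy`. The values below are made up and read from an in-memory string so the snippet runs on its own. With the real workshop file you would call `pd.read_csv` on its path, or `pd.read_excel` for an Excel sheet, which needs `openpyxl` installed.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import pandas as pd

# Hypothetical beer data: column names and values are invented for illustration.
csv_text = """alcohol_content,bitterness,darkness,fruitiness,is_yummy
3.7,0.6,0.2,0.5,0
4.9,0.4,0.5,0.7,1
5.1,0.8,0.7,0.3,1
4.2,0.3,0.1,0.9,0
6.0,0.7,0.9,0.2,1
"""

# For a real file: pd.read_csv("beers.csv") or pd.read_excel("beers.xlsx") (hypothetical names).
beers = pd.read_csv(io.StringIO(csv_text))
print(beers.head())

# Every feature plotted against every other feature, points coloured by the label.
features = beers.drop(columns="is_yummy")
axes = pd.plotting.scatter_matrix(features, c=beers["is_yummy"], figsize=(8, 8))
```

`scatter_matrix` draws all feature pairs at once. For a single pair, `beers.plot.scatter(x="alcohol_content", y="bitterness")` is enough.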
## Part 1: Introduction
- What is machine learning?
- What are features, samples and the feature matrix?
- Learning problems: supervised vs. unsupervised.
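To make the terms concrete, here is a small sketch with made-up numbers. The feature matrix `X` has one row per sample (a beer) and one column per feature. Supervised learning additionally needs a label vector `y` with one entry per sample, while unsupervised learning works with `X` alone.

```python
import numpy as np

# Feature matrix X: one row per sample (beer), one column per feature.
# Values and labels are invented for illustration only.
X = np.array([
    [3.7, 0.6, 0.2],   # beer 1: alcohol content, bitterness, darkness
    [4.9, 0.4, 0.5],   # beer 2
    [5.1, 0.8, 0.7],   # beer 3
    [4.2, 0.3, 0.1],   # beer 4
])

# Labels y (e.g. 1 = "yummy"): one entry per sample, used only in supervised learning.
y = np.array([0, 1, 1, 0])

n_samples, n_features = X.shape
print(n_samples, n_features)  # 4 3
```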
### Code walkthrough
- Classification: linear SVM classifier or logistic regression example
- Clustering: scikit-learn example to find clusters.
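A possible shape for this walkthrough, assuming synthetic 2D data from `make_blobs` in place of a real data set. It uses logistic regression for the classification part and k-means for the clustering part. The key difference is that the classifier is fitted on `X` and `y`, while the clustering algorithm only sees `X`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic 2D data with three groups, standing in for real data.
X, y = make_blobs(n_samples=150, centers=3, random_state=0)

# Supervised: the classifier learns from the features X *and* the labels y.
classifier = LogisticRegression()
classifier.fit(X, y)
print("training accuracy:", classifier.score(X, y))

# Unsupervised: k-means only sees X and groups similar samples into clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(3)])
```

The cluster ids found by k-means are arbitrary numbers. They need not match the original labels in `y`.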
## Part 2: Classification
Intention: demonstrate one or two simple classifiers and introduce the concept
of a decision boundary.
- Introduction: some simple two-dimensional examples, incl. the decision function.
- Idea of a linear classifier:
    - simple linear classifier (e.g. a linear SVM)
    - beer example with some weights
- Discuss a code example with logistic regression on the beer data; show the learned weights.
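The idea of a linear classifier can be checked directly in code. A linear model computes a weighted sum of the features plus an offset, $w \cdot x + b$, and predicts class 1 when this value is positive. The sketch assumes synthetic two-feature data from `make_classification` in place of the beer data. It recomputes scikit-learn's predictions by hand from `coef_` and `intercept_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data standing in for the beer data (assumption).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

model = LogisticRegression().fit(X, y)
w = model.coef_[0]        # one weight per feature
b = model.intercept_[0]   # offset (bias)

# Decision function of a linear classifier: w . x + b; predict class 1 if it is > 0.
scores = X @ w + b
manual_predictions = (scores > 0).astype(int)

print("weights:", w, "bias:", b)
print("matches predict():", np.array_equal(manual_predictions, model.predict(X)))
# The decision boundary is the set of points with w . x + b == 0: a straight line in 2D.
```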
### Coding session
- Change the given code to use a linear SVM classifier.
- Use a different data set which cannot be classified well by a linear classifier.
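One candidate for the second task is two concentric circles generated with scikit-learn's `make_circles`. This data set is our choice, not necessarily the workshop's. No straight line separates the inner ring from the outer ring, so a linear SVM (`LinearSVC`) reaches only a modest accuracy, even on the training data.

```python
from sklearn.datasets import make_circles
from sklearn.svm import LinearSVC

# Two concentric rings: the classes are not linearly separable.
X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)

linear_svm = LinearSVC(C=1.0, max_iter=10_000)
linear_svm.fit(X, y)

# Accuracy on the training data itself; it stays far below 100%.
print("accuracy of linear SVM:", linear_svm.score(X, y))
```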
## Part 3: Accuracy, F1, ROC, ...
- How to measure accuracy?
    - confusion matrix
    - accuracy
    - pitfalls for unbalanced data sets, e.g. diagnosing HIV
    - precision / recall
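The unbalanced-data pitfall fits in a few lines. With TP, TN, FP and FN counted from the confusion matrix, accuracy is $(TP + TN)/N$, precision is $TP/(TP + FP)$ and recall is $TP/(TP + FN)$. The labels below are invented: 10 positive cases among 1000 samples. They are not real prevalence data. A "classifier" that always answers "negative" still reaches 99% accuracy while finding no positive case at all.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Invented, highly unbalanced labels: 1 = positive diagnosis, 0 = negative.
y_true = np.array([1] * 10 + [0] * 990)

# A useless "classifier" that always predicts negative.
y_pred = np.zeros_like(y_true)

# Rows: true class (0, 1); columns: predicted class (0, 1).
print(confusion_matrix(y_true, y_pred))
# [[990   0]
#  [ 10   0]]
print("accuracy:", accuracy_score(y_true, y_pred))  # 0.99, although no positive is found
print("recall:", recall_score(y_true, y_pred))      # 0.0, every positive case is missed
# Precision is undefined without positive predictions; zero_division=0 reports it as 0.
print("precision:", precision_score(y_true, y_pred, zero_division=0))
```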
### Coding session
- Evaluate the accuracy of the linear beer classifier from the previous section.
- Determine precision and recall.
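A sketch of the evaluation, assuming synthetic features from `make_classification` in place of the beer data. The important step is to hold back a test set with `train_test_split`. The classifier is then scored on samples it has not seen during training.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the beer data (assumption): features X, binary labels y.
X, y = make_classification(n_samples=500, n_features=4, random_state=1)

# Keep 25% of the samples aside for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("accuracy: ", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall:   ", recall_score(y_test, y_pred))
```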
## Part 4: Underfitting / overfitting
Classifiers and regressors have parameters, i.e. degrees of freedom.
- Underfitting: a linear classifier on a nonlinear problem.
- Overfitting:
    - Features are noisy or do not carry enough information: orchid example in 2D, lifted to 3D by adding another feature.
    - A polynomial of degree 5 fitted to points on a line plus noise.
    - Points in a circle: drawing an overly exact boundary line.
- How to check for underfitting / overfitting?
    - Measure accuracy or another metric on a test data set.
    - Cross-validation.
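The test-set check can be sketched by comparing training and test accuracy while the number of degrees of freedom varies. Here we swap in a decision tree whose capacity is set by `max_depth`; the outline itself does not mention decision trees. A depth-1 tree tends to score poorly on both sets (underfitting). A fully grown tree reaches 100% on the training data but does worse on the test data (overfitting).

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy two-class data; noise=0.35 makes the classes overlap.
X, y = make_moons(n_samples=400, noise=0.35, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for depth in [1, 4, None]:  # None: grow the tree until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={scores[depth][0]:.2f}, test={scores[depth][1]:.2f}")
```

A large gap between the training and test scores indicates overfitting. Low scores on both indicate underfitting.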
### Coding session
- How to do cross-validation with scikit-learn.
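A minimal cross-validation example, assuming synthetic `make_moons` data and an RBF SVM as the model. `cross_val_score` splits the data into `cv` folds. It trains on all but one fold, scores on the remaining fold, and repeats until every fold has served once as the test set.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

# 5-fold cross-validation: five train/test rounds, one score per round.
scores = cross_val_score(SVC(kernel="rbf", C=1.0), X, y, cv=5)
print("accuracy per fold:", scores.round(2))
print(f"mean accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

The spread across folds gives a rough idea of how much a single train/test split could have been off.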
## Part 5: Pipelines / parameter tuning with scikit-learn
- The scikit-learn API, incl. a summary of what we have seen so far.
- Hyperparameter tuning: grid search / random search.
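A sketch that combines both topics, assuming synthetic `make_moons` data. A `Pipeline` chains feature scaling and an RBF SVM into one estimator with the usual `fit`/`predict` API. `GridSearchCV` tries every combination in the parameter grid, scoring each combination with cross-validation. The grid values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

# Preprocessing and classifier behave like a single estimator.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svm", SVC(kernel="rbf")),
])

# Parameters of pipeline steps are addressed as "<step name>__<parameter>".
param_grid = {
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": [0.01, 0.1, 1, 10],
}

# 4 x 4 = 16 combinations, each evaluated with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)
print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.2f}")
```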
### Coding session
- examples
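An example for random search, again on synthetic `make_moons` data. Instead of a fixed grid, `RandomizedSearchCV` samples `n_iter` parameter combinations from distributions. Here we assume log-uniform distributions from `scipy.stats` (available in SciPy 1.4 and later) for `C` and `gamma`. The ranges are illustrative.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_moons
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

# Sample C and gamma on a log scale instead of listing fixed values.
param_distributions = {
    "C": loguniform(1e-2, 1e3),
    "gamma": loguniform(1e-3, 1e1),
}

# Only 20 random combinations are tried, each with 5-fold cross-validation.
search = RandomizedSearchCV(SVC(kernel="rbf"), param_distributions,
                            n_iter=20, cv=5, random_state=0)
search.fit(X, y)
print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.2f}")
```

Random search keeps the cost fixed at `n_iter` fits per fold, however many hyperparameters are searched. A full grid grows multiplicatively with each added parameter.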
# Day 2
## Part 6: Overview of classifiers
- SVMs
    - demo for the RBF kernel: influence of different parameters on the decision boundary
- Random forests
- Gradient Tree Boosting
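A sketch for comparing the classifiers of this part on one data set, assuming synthetic `make_moons` data and 5-fold cross-validation. The three `gamma` values of the RBF SVM illustrate the parameter demo. A small `gamma` gives a smooth, nearly linear boundary, while a large `gamma` lets the boundary wrap tightly around individual points. The scores depend on the data, so no ranking is implied here.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

classifiers = {
    "SVM (RBF, gamma=0.1)": SVC(kernel="rbf", gamma=0.1),
    "SVM (RBF, gamma=1)": SVC(kernel="rbf", gamma=1),
    "SVM (RBF, gamma=100)": SVC(kernel="rbf", gamma=100),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient tree boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, clf in classifiers.items():
    results[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:25s} mean CV accuracy: {results[name]:.2f}")
```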
### Coding session
- Prepare examples of 2D classification problems, incl. visualization of the
decision surfaces of different classifiers.
- Play with different classifiers on the beer data.
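One common way to visualize a decision surface is to classify every point of a fine grid that covers the data and colour the grid by the predicted class. The sketch assumes synthetic `make_moons` data and compares an RBF SVM with a random forest side by side. The Agg backend only makes it run without a display; in Jupyter the figure appears inline.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Regular 200 x 200 grid covering the data range plus a margin.
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200), np.linspace(y_min, y_max, 200))
grid = np.column_stack([xx.ravel(), yy.ravel()])

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, clf in zip(axes, [SVC(kernel="rbf", gamma=1), RandomForestClassifier(random_state=0)]):
    clf.fit(X, y)
    # Predicted class of every grid point, reshaped back to the grid layout.
    zz = clf.predict(grid).reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)              # coloured decision regions
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")  # training samples
    ax.set_title(type(clf).__name__)
```

The SVM tends to produce smooth curved boundaries, whereas the random forest produces axis-parallel, staircase-like regions.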
## Part 7: Regression
- Differences compared to classification: type of output, how to measure accuracy, ...
- Example: fitting a polynomial; examples of underfitting and overfitting.
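A sketch of the polynomial example on synthetic data: noisy samples of a cubic curve, split alternately into training and test points. The output is a number rather than a class, so the fit is measured with the mean squared error (MSE) instead of accuracy. Degree 1 is too simple (underfitting). Degree 15 has many degrees of freedom for 20 training points and tends to chase the noise (overfitting).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic data: a cubic curve plus Gaussian noise.
x = np.sort(rng.uniform(-3, 3, 40))
y = 0.5 * x**3 - x + rng.normal(scale=2.0, size=x.size)
x_train, y_train = x[::2], y[::2]   # every other point for training
x_test, y_test = x[1::2], y[1::2]   # the remaining points for testing

errors = {}
for degree in [1, 3, 15]:
    # Polynomial regression = polynomial features followed by linear regression.
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train.reshape(-1, 1), y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train.reshape(-1, 1)))
    test_mse = mean_squared_error(y_test, model.predict(x_test.reshape(-1, 1)))
    errors[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE={train_mse:.2f}, test MSE={test_mse:.2f}")
```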
### Coding session
- Introduce the movie data set and train an SVR or another regressor on it.
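The movie data set is not described here, so this sketch substitutes random numeric features and a synthetic continuous target, such as a rating. The features are scaled first because SVR is sensitive to feature scales. The result is reported as mean absolute error and R², two common regression metrics.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for the movie data: 3 made-up features, continuous target.
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scale the features, then fit a support vector regressor with an RBF kernel.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X_train, y_train)

# Regression error metrics instead of classification accuracy.
print(f"test MAE: {mean_absolute_error(y_test, model.predict(X_test)):.2f}")
print(f"test R^2: {model.score(X_test, y_test):.2f}")
```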
- Keras demo
### Coding session
- Reuse the Keras network and experiment with it.
- Help attendees set up the workshop material on their own computers.
- Provide example problems for attendees who don't bring their own data.