
Target audience

  • Researchers from DBIOL, BSSE and DGESS with no machine learning experience yet.
  • Basic Python knowledge assumed.
  • Almost no math knowledge required.

Concepts

  • two-day workshop: 1.5 days of teaching + 0.5 day working on own data / prepared data.
  • smooth learning curve
  • explain fundamental concepts first, discuss exceptions, corner cases and pitfalls later.
  • plotting / pandas? / numpy first. Otherwise participants might struggle with these basics during coding sessions and be distracted from the actual learning goal of an exercise.
  • Jupyter notebooks / conda, extra notebooks with solutions.
  • use prepared computers in the computer room, set up personal computers during the last day if required.
  • exercises: code skeletons with gaps to fill in.

TBD:

Course structure

Part 0: Preparation (UWE)

  • quick basics of matplotlib, numpy, pandas?

TBD: prepare installation instructions.

TBD: prepare coding session

Part 1: Introduction (UWE)

  • What is machine learning?

    • learning from examples
    • working with hard-to-understand data
    • automation
  • What are features / samples / a feature matrix?

    • always numerical / categorical vectors
    • examples: converting beers, movies, images, text into numerical features (see the sketch after this list)
  • Learning problems:

    • unsupervised:

      • find structure in a set of features
      • beers: find groups of beer types
    • supervised:

      • classification: do I like this beer? example: draw a decision tree
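
A minimal sketch of the feature-matrix idea for the beer example; all column names and values are invented for illustration:

```python
# A feature matrix: each row is a sample (one beer), each column a
# numerical feature. Names and values are made up for illustration.
import pandas as pd

beers = pd.DataFrame({
    "alcohol_content": [4.8, 5.2, 7.0, 4.5],  # percent
    "bitterness":      [25, 40, 60, 20],       # hypothetical bitterness scores
    "darkness":        [10, 30, 80, 8],        # hypothetical color scores
    "is_liked":        [1, 1, 0, 1],           # label: 1 = I like this beer
})

X = beers[["alcohol_content", "bitterness", "darkness"]]  # feature matrix
y = beers["is_liked"]                                      # target vector
print(X.shape, y.shape)
```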

Part 2a: supervised learning: classification

Intention: demonstrate one or two simple examples of classifiers, and introduce the concept of a decision boundary

  • idea of a simple linear classifier: take features, produce a real value ("Uwe's beer score"), use a threshold to decide -> simple linear classifier (e.g. a linear SVM) -> beer example with some weights

  • show a code example with logistic regression for the beer data, show the weights, plot the decision function (see the sketch below)
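
A minimal sketch of what this demo could look like; synthetic 2d data stands in here for the actual beer feature set:

```python
# Logistic regression as a linear classifier: fit, inspect the learned
# weights, and plot the resulting decision line.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),   # "beers I like"
               rng.normal(3, 1, (50, 2))])  # "beers I don't like"
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)
print("weights:", clf.coef_[0], "intercept:", clf.intercept_[0])

# the decision boundary is the line w0*x0 + w1*x1 + b = 0
w, b = clf.coef_[0], clf.intercept_[0]
xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.plot(xs, -(w[0] * xs + b) / w[1])
plt.show()
```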

Coding session:

  • change the given code to use a linear SVM classifier

  • use a different data set (TBD) which cannot be classified well with a linear classifier

  • tell them to transform the data and run again (TBD: how exactly?)

Part 2b: supervised learning: regression (TBD: skip this?)

Intention: demonstrate one or two simple examples of regression

  • regression: how would I rate this movie? example: use a weighted sum; also an example for a linear regressor: fit a quadratic function

  • learn a regressor for movie scores (see the sketch below).
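
One way the regression examples could look; a sketch with synthetic data in place of the movie scores:

```python
# Linear regression vs. a quadratic fit: a straight line underfits data
# that actually follows a quadratic function.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 + rng.normal(0, 0.3, 30)  # noisy quadratic

linear = LinearRegression().fit(x, y)
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)

print("linear fit R^2:   ", linear.score(x, y))
print("quadratic fit R^2:", quadratic.score(x, y))
```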

Part 3: underfitting/overfitting

needs: simple accuracy measure.

classifiers / regressors have parameters / degrees of freedom.

  • underfitting:

    • linear classifier for points on a quadratic function
  • overfitting:

    • features carry actual noise, or not enough information: orchid example in 2d, elevate to 3d using another feature.
    • polynomial of degree 5 fitted to points on a line + noise
    • points in a circle: draw a very exact boundary line
  • how to check for underfitting / overfitting? (see the sketch after this list)

    • measure accuracy or another metric on a test dataset
    • cross validation
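
A possible sketch for this check: cross validation reveals that a degree-5 polynomial fitted to noisy points on a line scores worse on held-out folds than a plain line (synthetic data):

```python
# Cross validation exposes overfitting: the degree-5 polynomial matches
# the training points more closely but generalizes worse.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
x = np.linspace(0, 1, 30).reshape(-1, 1)
y = 2 * x.ravel() + rng.normal(0, 0.1, 30)  # points on a line + noise

for degree in (1, 5):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, x, y, cv=5)  # R^2 on each held-out fold
    print(f"degree {degree}: mean cross-validation score {scores.mean():.3f}")
```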

Coding session:

  • how to do cross validation with scikit-learn
  • use a different beer feature set with a redundant feature (+)
  • run cross validation on the classifier
  • (?) run cross validation on the movie regression problem

Part 4: accuracy, F1, ROC, ...

Intention: accuracy is useful but has pitfalls

  • how to measure accuracy?

    • (TBD: skip?) regression accuracy
    • classifier accuracy:
      • confusion matrix
      • accuracy
      • pitfalls for unbalanced data sets, e.g. diagnosing HIV
      • precision / recall
      • ROC?
  • exercise: do cross validation with other metrics (see the sketch after this list)
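
A minimal sketch of the unbalanced-data pitfall: a classifier that always predicts "negative" reaches 95% accuracy but has zero recall (labels are made up for illustration):

```python
# Accuracy looks great on unbalanced data even when the classifier is useless.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

y_true = [0] * 95 + [1] * 5  # 5% positives, e.g. a rare diagnosis
y_pred = [0] * 100           # classifier that always says "negative"

print("accuracy: ", accuracy_score(y_true, y_pred))  # 0.95, looks great
print("precision:", precision_score(y_true, y_pred, zero_division=0))
print("recall:   ", recall_score(y_true, y_pred))    # 0.0, finds no positives
print(confusion_matrix(y_true, y_pred))
```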

Coding session

  • evaluate the accuracy of the linear beer classifier from the previous section

  • determine precision / recall

  • fool them: give them another dataset on which the classifier fails.

Day 2

Part 5: pipelines / parameter tuning with scikit-learn

  • scikit-learn API: recall what we have seen up to now.
  • pipelines, preprocessing (scaler, PCA)
  • cross validation
  • parameter tuning: grid search / random search (see the sketch below).
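
A minimal sketch of such a pipeline with grid search; iris stands in for the workshop data and the parameter grid is only an example:

```python
# A scaler + PCA + SVM pipeline whose parameters are tuned by grid search.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("svm", SVC()),
])

param_grid = {
    "pca__n_components": [2, 3, 4],
    "svm__C": [0.1, 1, 10],
    "svm__gamma": [0.01, 0.1, 1],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```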

Coding session

  • build SVM and Random Forest cross-validation pipelines for the previous examples
  • use PCA in the pipeline for (+) to improve performance
  • find optimal SVM parameters
  • find the optimal number of PCA components


Planning: stop here, make time estimates.

Part 6: classifiers overview

Intention: quick walk through reliable classifiers, give some background intuition where suitable, let them play with some, incl. modification of parameters.

to consider: the estimator-selection flowchart from the scikit-learn docs; come up with an easy-to-understand diagram.

  • Nearest neighbours
  • SVMs
    • demo for the RBF kernel: influence of different parameters on the decision line
  • Random forests
  • Gradient Tree Boosting

show decision surfaces of these classifiers on 2d examples (see the sketch below).
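
A sketch of how the decision surfaces could be plotted, using the two-moons toy data; classifier parameters are arbitrary examples:

```python
# Decision surfaces of four classifiers on a 2d toy dataset.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_moons(noise=0.3, random_state=0)
xx, yy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-2, 2, 200))

classifiers = {
    "nearest neighbours": KNeighborsClassifier(),
    "RBF SVM": SVC(gamma=2),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, (name, clf) in zip(axes, classifiers.items()):
    clf.fit(X, y)
    # predict on a dense grid and shade the predicted class regions
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
    ax.set_title(name)
plt.show()
```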

Coding session

  • apply SVM, Random Forests, Gradient Tree Boosting to the previous examples
  • apply clustering to previous examples
  • MNIST example

Part 7: Start with neural networks (0.5 day)

Part 8: Best practices

  • visualize features: pairwise scatter plots, t-SNE (see the sketch after this list)
  • PCA to understand the data
  • check the balance of the data set; what to do if it is unbalanced?
  • start with a baseline classifier / regressor
  • augment data to introduce variance
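
A minimal sketch of the first three practices, with iris standing in for the workshop data; sklearn.manifold.TSNE would plug in the same way as PCA for a t-SNE view:

```python
# Look at the data before modelling: pairwise scatter, a 2d PCA
# projection, and a class-balance check.
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris(as_frame=True)
df = iris.frame

scatter_matrix(df[iris.feature_names], figsize=(8, 8))  # pairwise scatter

proj = PCA(n_components=2).fit_transform(df[iris.feature_names])
plt.figure()
plt.scatter(proj[:, 0], proj[:, 1], c=df["target"])
plt.show()

print(df["target"].value_counts())  # check the balance of the classes
```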

Part 9: neural networks

  • overview, history
  • perceptron
  • multi-layer networks
  • multi-layer demo with the Google online tool
  • where neural networks work well
  • keras demo (see the sketch below)
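
One way the keras demo could look: a small dense network on MNIST (layer sizes and epoch count are arbitrary choices):

```python
# A small multi-layer network on MNIST with keras.
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0  # flatten images, scale to [0, 1]
x_test = x_test.reshape(-1, 784) / 255.0

model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```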

Coding Session

  • keras: reuse the given network and play with it.