From 3c164b9f34993c2f7f5d13191649097436ea5f2b Mon Sep 17 00:00:00 2001
From: Uwe Schmitt <uwe.schmitt@id.ethz.ch>
Date: Fri, 7 Sep 2018 13:17:40 +0200
Subject: [PATCH] first version of course layout

---
 layout.md | 175 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 175 insertions(+)
 create mode 100644 layout.md

diff --git a/layout.md b/layout.md
new file mode 100644
index 0000000..2821adb
--- /dev/null
+++ b/layout.md
@@ -0,0 +1,175 @@
+# Targeted audience
+
+- Researchers from DBIOL, BSSE and DGESS with no machine learning experience yet.
+- Basic Python knowledge.
+- Almost no math knowledge.
+
+# Concepts
+
+- smooth learning curve
+- explain fundamental concepts first, discuss exceptions, corner cases and pitfalls later.
+- plotting / pandas / numpy first. Otherwise participants might be distracted during
+  coding exercises and miss the actual learning goal of an exercise.
+
+
+# Course structure
+
+## Preparation
+
+- set up machines
+
+- quick basics: matplotlib, numpy, pandas
+
+
+
+## Part 1: Introduction
+
+- Why machine learning?
+
+- What are features?
+  - always numerical vectors
+  - examples: beer, movies, images, text
+
+- unsupervised:
+
+  - find structure in a set of features
+  - beers: find groups of beer types
+
+### Coding session:
+
+  - read a dataframe from a csv or excel sheet with beer features
+  - do some feature-vs-feature scatter plots
+  - use t-SNE to show clusters
+  - scikit-learn example to find clusters
+
+## Part 2: supervised learning
+
+- supervised:
+
+  - classification: do I like this beer?
+    example: draw a decision tree
+
+  - classification: points on both sides of a line, points in a circle, the xor problem
+  - idea of a decision function: take features, produce a real value, use a threshold to decide
+  - simple linear classifier
+  - show some feature engineering examples here to make a linear classifier applicable
+
+  - regression: how would I rate this movie?
+    example: use a weighted sum, also an example of a linear regressor
+    example: fit a quadratic function
+
+### Coding session:
+
+  - show: read circle data, plot data, augment features, learn a linear classifier with scikit-learn,
+    show weights and explain the classifier, plot the decision boundary,
+    load the evaluation data set and evaluate accuracy.
+
+  - adapt: read xor data, plot data, augment features, learn a linear classifier with scikit-learn,
+    show weights and explain the classifier, plot the decision boundary,
+    load the evaluation data set and evaluate accuracy.
+
+  - learn a regressor for movie scores.
+
+
+## Part 3: accuracy, F1, ROC, ...
+
+- how to measure accuracy?
+  - regression accuracy
+  - classifier accuracy:
+    - confusion matrix
+    - pitfalls for unbalanced data sets,
+      e.g. diagnosing HIV
+    - precision / recall
+    - ROC?
+
+### Coding session
+
+- evaluate accuracy of the linear beer classifier
+
+- determine precision / recall
+
+- ROC curve based on a threshold
+
+- provide predetermined weights, show the ROC curve
+  (a code sketch for this session follows below).
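+
+A minimal sketch of how these metrics could be computed with scikit-learn.
+Assumptions: `make_classification` stands in for the real beer features, and
+`LogisticRegression` stands in for whichever linear classifier Part 2 ends up
+using.
+
+```python
+# Sketch: accuracy, confusion matrix, precision / recall and ROC curve.
+# Synthetic, slightly unbalanced data stands in for the beer features.
+import matplotlib.pyplot as plt
+from sklearn.datasets import make_classification
+from sklearn.linear_model import LogisticRegression
+from sklearn.metrics import (accuracy_score, auc, confusion_matrix,
+                             precision_score, recall_score, roc_curve)
+from sklearn.model_selection import train_test_split
+
+X, y = make_classification(n_samples=500, n_features=5, weights=[0.8, 0.2],
+                           random_state=42)
+X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
+
+clf = LogisticRegression().fit(X_train, y_train)
+predicted = clf.predict(X_test)
+
+print("accuracy :", accuracy_score(y_test, predicted))
+print("confusion matrix:")
+print(confusion_matrix(y_test, predicted))
+print("precision:", precision_score(y_test, predicted))
+print("recall   :", recall_score(y_test, predicted))
+
+# ROC curve: sweep the decision threshold over the classifier scores.
+scores = clf.decision_function(X_test)
+fpr, tpr, thresholds = roc_curve(y_test, scores)
+plt.plot(fpr, tpr, label="AUC = %.2f" % auc(fpr, tpr))
+plt.plot([0, 1], [0, 1], "k--")
+plt.xlabel("false positive rate")
+plt.ylabel("true positive rate")
+plt.legend()
+plt.show()
+```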
+
+## Part 4: underfitting / overfitting
+
+Classifiers / regressors have parameters / degrees of freedom.
+
+- underfitting:
+
+  - a linear classifier for points on a quadratic function
+
+- overfitting:
+
+  - features carry actual noise, or not enough information;
+    not enough information: orchid example in 2D, elevate to 3D using another feature.
+  - a polynomial of degree 5 fitted to points on a line plus noise
+  - points in a circle: draw a very exact boundary line
+
+- how to check underfitting / overfitting?
+
+  - measure accuracy
+  - test data set
+  - cross validation
+
+
+### Coding session:
+
+- How to do cross validation with scikit-learn
+- use a different beer feature set with a redundant feature (+)
+- run cross validation on the classifier
+- run cross validation on the movie regression problem
+
+
+## Part 5: Overview scikit-learn / algorithms
+
+- Linear regressors
+- Nearest neighbours
+- SVMs
+  - demo for RBF: how different parameters influence the decision line
+- Random forests
+- Gradient tree boosting
+- Clustering
+
+### Coding session
+
+- apply SVM, random forests, gradient boosting to the previous examples
+- apply clustering to the previous examples
+- MNIST example
+
+## Part 6: pipelines / cross validation / parameter optimization with scikit-learn
+
+- scikit-learn API
+- pipelines, preprocessing (scaler, PCA)
+- cross validation
+- parameter optimization
+
+### Coding session
+
+- build SVM and random forest cross-validation pipelines for the previous examples
+- use PCA in the pipeline for (+) to improve performance
+- find optimal SVM parameters
+- find the optimal number of PCA components
+
+## Part 7: Best practices
+
+- visualize features: pairwise scatter plots, t-SNE
+- PCA to understand data
+- check the balance of the data set; what to do if it is unbalanced?
+- start with a baseline classifier / regressor
+- augment data to introduce variance
+
+## Part 8: neural networks
+
+- overview, history
+- perceptron
+- multi layer
+- multi-layer demo with the Google online tool
+- where neural networks work well
+- keras demo
+
+### Coding Session
+
+- reuse a Keras network and play with it (a minimal sketch follows below).
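+
+A minimal sketch of what this session's starting point could look like,
+assuming standalone Keras with a TensorFlow backend; the network that
+participants actually reuse in the course may differ.
+
+```python
+# Sketch: a small dense network on a toy 2D data set (stand-in for the course data).
+from keras.layers import Dense
+from keras.models import Sequential
+from sklearn.datasets import make_moons
+from sklearn.model_selection import train_test_split
+
+# Two interleaved half moons: a simple problem that is not linearly separable.
+X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
+X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
+
+model = Sequential([
+    Dense(16, activation="relu", input_shape=(2,)),
+    Dense(16, activation="relu"),
+    Dense(1, activation="sigmoid"),
+])
+model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
+model.fit(X_train, y_train, epochs=30, batch_size=32, verbose=0)
+
+loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
+print("test accuracy:", accuracy)
+```
--
GitLab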