first version of course layout

Commit 3c164b9f authored 6 years ago by schmittu
--- a/layout.md
+++ b/layout.md
+# Target audience
+- Researchers from DBIOL, BSSE and DGESS with no machine learning experience yet.
+- Basic Python knowledge.
+- Almost no math knowledge.
+# Concepts
+- smooth learning curve
+- explain fundamental concepts first, discuss exceptions, corner cases, pitfalls later.
+- plotting / pandas / numpy first. Otherwise participants might be distracted during
+  coding exercises and miss the actual learning goal of an exercise.
+# Course structure
+## Preparation
+- setup machines
+- quick basics matplotlib, numpy, pandas
+## Part 1: Introduction
+- Why machine learning ?
+- What are features ?
+  - always numerical vectors
+  - examples: beer, movies, images, text
+- unsupervised:
+  - find structure in set of features
+  - beers: find groups of beer types
+### Coding session:
+  - read dataframe from csv or excel sheet with beer features
+  - do some features vs features scatter plots
+  - use t-SNE to show clusters
+  - scikit-learn example to find clusters
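A minimal sketch of the clustering step in this coding session. The two feature names (alcohol content and bitterness) and the synthetic data are assumptions standing in for the course's beer CSV, and `KMeans` is only one possible choice of clustering algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical beer features: alcohol content (%) and bitterness (IBU).
# Synthetic stand-in for the CSV file: two well-separated groups of 50 beers.
light = rng.normal(loc=[4.5, 15.0], scale=[0.3, 3.0], size=(50, 2))
strong = rng.normal(loc=[8.0, 60.0], scale=[0.3, 3.0], size=(50, 2))
features = np.vstack([light, strong])

# Unsupervised: no labels are passed, only the feature vectors.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

print("cluster sizes:", np.bincount(labels))
print("cluster centers:\n", kmeans.cluster_centers_)
```

With real data the features would usually need scaling first, since k-means relies on Euclidean distances.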
+## Part 2: supervised learning
+- supervised:
+  - classification: do I like this beer ?
+    example: draw decision tree
+  - classification: points on both sides of a line, points in circle, xor problem
+    - idea of decision function: take features, produce real value, use threshold to decide
+    - simple linear classifier
+    - show some examples for feature engineering here to apply linear classifier
+  - regression: how would I rate this movie ?
+    example: use weighted sum, also example for linear regressor
+    example: fit a quadratic function
+### Coding session:
+  - show: read circle data, plot data, augment features, learn linear classifier with scikit-learn,
+    show weights and explain classifier, plot decision boundary
+    load eval data set and evaluate accuracy.
+  - adapt: read xor data, plot data, augment features, learn linear classifier with scikit-learn,
+    show weights and explain classifier, plot decision boundary
+    load eval data set and evaluate accuracy.
+  - learn regressor for movie scores.
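The "show" exercise above could look roughly like the following sketch. The circle data is generated here instead of being read from a file, `make_circle_data` and `augment` are hypothetical helper names, and `LogisticRegression` is an assumed choice of linear classifier. Plotting is left out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)


def make_circle_data(n):
    """Random 2d points; label 1 if the point lies inside the unit circle."""
    xy = rng.uniform(-2.0, 2.0, size=(n, 2))
    inside = (xy[:, 0] ** 2 + xy[:, 1] ** 2 < 1.0).astype(int)
    return xy, inside


def augment(xy):
    """Feature engineering: append x^2 and y^2 to the raw features x, y."""
    return np.column_stack([xy, xy ** 2])


x_train, y_train = make_circle_data(400)
x_eval, y_eval = make_circle_data(200)

# In the augmented space the circle x^2 + y^2 = 1 is a linear boundary.
clf = LogisticRegression(C=10.0, max_iter=1000)
clf.fit(augment(x_train), y_train)

print("weights for x, y, x^2, y^2:", clf.coef_[0])
accuracy = clf.score(augment(x_eval), y_eval)
print("accuracy on eval set:", accuracy)
```

The learned weights for `x^2` and `y^2` should be clearly negative and of similar size, which is a starting point for explaining the decision function to participants.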
+## Part 3: accuracy, F1, ROC, ...
+- how to measure accuracy ?
+  - regression accuracy
+  - classifier accuracy:
+    - confusion matrix
+    - pitfalls for unbalanced data sets
+        e.g. diagnose HIV
+    - precision / recall
+    - ROC ?
+### Coding session
+- evaluate accuracy of linear beer classifier
+- determine precision / recall
+- ROC curve based on threshold
+- provide predetermined weights, show ROC curve.
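To illustrate the pitfall for unbalanced data sets listed above, here is a small sketch in plain Python. The patient counts and the two classifiers are made up for illustration, and `confusion_counts` and `scores` are hypothetical helper names.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def scores(y_true, y_pred):
    """Return (accuracy, precision, recall); undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# 1000 made-up patients, only 10 of them positive.
y_true = [1] * 10 + [0] * 990

# Classifier A always predicts "negative".
always_negative = [0] * 1000
# Classifier B finds 8 of the 10 positives but raises 20 false alarms.
screening = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970

result_a = scores(y_true, always_negative)
result_b = scores(y_true, screening)
print("always negative:", result_a)
print("screening:      ", result_b)
```

Classifier A reaches 99% accuracy while detecting no positive case at all. Classifier B has lower accuracy (97.8%) but a recall of 0.8, which is why accuracy alone is misleading here.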
+## Part 4: underfitting/overfitting
+classifiers / regressors have parameters / degrees of freedom.
+- underfitting:
+  - linear classifier for points on a quadratic function
+- overfitting:
+  - features have actual noise, or not enough information
+    not enough information: orchid example in 2d. elevate to 3d using another feature.
+  - polynomial of degree 5 to fit points on a line + noise
+  - points in a circle: draw very exact boundary line
+- how to check underfitting / overfitting ?
+  - measure accuracy
+  - test data set
+  - cross validation
+### Coding session:
+- How to do cross validation with scikit-learn
+- use different beer feature set with redundant feature (+)
+- run cross validation on classifier
+- run cross validation on movie regression problem
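A sketch of cross validation with scikit-learn, using the "points on a line + noise" example from this part. The data is synthetic and the choice of 5 folds is an assumption. It compares a straight-line fit with a polynomial of degree 5.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic data: points on the line y = 2x plus Gaussian noise.
x = rng.uniform(-1.0, 1.0, size=(30, 1))
y = 2.0 * x[:, 0] + rng.normal(scale=0.3, size=30)

scores = {}
for degree in (1, 5):
    # Polynomial features followed by a linear fit = polynomial regression.
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # For regressors the default score is R^2, computed once per fold.
    scores[degree] = cross_val_score(model, x, y, cv=5)
    print(f"degree {degree}: mean R^2 = {scores[degree].mean():.3f}")
```

The degree 5 model always fits the training folds at least as well as the line, so a lower score on the held-out folds is the signal for overfitting that participants should look for.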
+## Part 5: Overview scikit-learn / algorithms
+- Linear regressors
+- Nearest neighbours
+- SVMs
+  - demo for RBF: different parameters influence on decision line
+- Random forests
+- Gradient Tree Boosting
+- Clustering
+### Coding session
+- apply SVM, Random Forests, Gradient boosting to previous examples
+- apply clustering to previous examples
+- MNIST example
+## Part 6: pipelines / cross val / parameter optimization with scikit-learn
+- scikit-learn API
+- pipelines, preprocessing (scaler, PCA)
+- cross validation
+- parameter optimization
+### Coding session
+- build SVM and Random forest crossval pipelines for previous examples
+- use PCA in pipeline for (+) to improve performance
+- find optimal SVM parameters
+- find optimal number of PCA components
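The pipeline and parameter search exercises above might be sketched as follows. The data comes from `make_classification` with some redundant features as a stand-in for the (+) beer feature set, and the parameter grid values are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 8 features, 3 informative and 3 redundant ones.
features, labels = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=3, random_state=0
)

# Scaling and PCA are fitted on the training folds only, inside the pipeline.
pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("pca", PCA()),
        ("svm", SVC(kernel="rbf")),
    ]
)

# Step name + double underscore + parameter name addresses a pipeline step.
param_grid = {
    "pca__n_components": [2, 4, 6],
    "svm__C": [0.1, 1.0, 10.0],
    "svm__gamma": [0.01, 0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(features, labels)

best_params = search.best_params_
best_score = search.best_score_
print("best parameters:", best_params)
print("best cross validated accuracy:", round(best_score, 3))
```

Putting the preprocessing steps into the pipeline keeps information from the validation folds out of the scaler and the PCA, which is one motivation for teaching pipelines together with cross validation.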
+## Part 7: Best practices
+- visualize features: pairwise scatter, tSNE
+- PCA to understand data
+- check balance of data set; what to do if it is unbalanced?
+- start with baseline classifier / regressor
+- augment data to introduce variance
+## Part 8: neural networks
+- overview, history
+- perceptron
+- multi layer
+- multi layer demo with Google online tool
+- where neural networks work well
+- Keras demo
+### Coding Session
+- Keras: reuse a network and play with it.