Merge branch 'master' into improvements_uwe

2cc7260d · Mikolaj Rybinski · 813c593e · 22895115 · 2cc7260d
Commit 2cc7260d authored 5 years ago by Mikolaj Rybinski
--- a/course_layout.md
+++ b/course_layout.md
@@ -2,49 +2,72 @@

 # Targeted audience

- Researchers from DBIOL, BSSE and DGESS having no machine learning experience yet.
+- Researchers from DBIOL and DGESS having no machine learning experience yet.
 - Basic Python knowledge.
 - Almost no math knowledge.

 # Concepts

- two days workshop, 1.5 days workshop + .5 day working on own data / prepared data.
+- 3-day workshop: 2 days of lectures with exercises + 0.5 day real-life example
+  walk-through + 0.5 day working on own data / prepared data.
 - smooth learning curve
- explain fundamental concepts first, discuss  exceptions, corner cases,
-  pitfalls late.
- plotting / pandas? / numpy first. Else participants might be fight with these
-  basics during coding sessions and will be disctracted from the actual
-  learning goal of an exercise.
+- explain fundamental concepts first; discuss exceptions, corner cases and pitfalls later.
+- plotting / pandas / numpy first. Otherwise participants might struggle with these basics
+  during coding sessions and be distracted from the actual learning goal of an
+  exercise.
 - jupyter notebooks / conda, extra notebooks with solutions.
- use prepared computers in computer room, setting up personal computer during last day if required.
+- use prepared computers in the computer room; set up personal computers on the last day
+  if required.
 - exercises: empty holes to fill

-TBD:
+# Course structure

+## Home prep

+Introductions to NumPy, Pandas and Matplotlib (plus Python, if needed).

-# Course structure
+Prep materials to send out:
+* Python, ca. 6h: https://siscourses.ethz.ch/python_one_day/script.html
+* NumPy, ca. 3h: https://siscourses.ethz.ch/python-scientific/01_numpy.html
+    * WARN: a bit too advanced
+    * alt, ext: http://scipy-lectures.org/intro/numpy/index.html
+* Pandas, ca. 1.5h: https://siscourses.ethz.ch/python-scientific/02_pandas.html
+    * alt, ext: http://www.scipy-lectures.org/packages/statistics/index.html#data-representation-and-interaction
+    * cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
+* Matplotlib + Seaborn
+    * ext:
+        * http://scipy-lectures.org/intro/matplotlib/index.html
+        * http://scipy-lectures.org/packages/statistics/index.html#more-visualization-seaborn-for-statistical-exploration
+    * cheat sheets:
+        * https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf
+        * https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Seaborn_Cheat_Sheet.pdf

-## Part 0: Preparation (UWE)
+## Day 1

- quick basics matplotlib, numpy, pandas?:
+Introduction and brief overview of classifiers, including quality assessment, pipelines
+and hyperparameter optimization.

-TBD: installation instructions preparation.
+Total time: 6h (8 uni hours (uh) of 45 min)

-TBD: prepare coding session
+### Part 0: Preparation

+Time: 15 min (1/3 uh)

+- organizational announcements
+- installation/machines preparation

+### Part 1: General introduction

-## Part 1: Introduction  (UWE)
+Time: 75 min (5/3 uh)

- What is machine learning ?
+- What is machine learning?

  - learning from examples
  - working with hard-to-understand data.
-  - automatation
+  - automation
+
+- What are features / samples / feature matrix?

- What are features / samples / feature matrix ?
  - always numerical / categorical vectors
  - examples: converting beer, movies, images, text to numerical vectors

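A minimal sketch of how samples with numerical and categorical features become a numerical feature matrix, assuming pandas; the beer columns and values are made up for illustration and are not course data:

```python
import pandas as pd

# Hypothetical beer samples: one row per sample, one column per feature.
beers = pd.DataFrame(
    {
        "alcohol_content": [4.8, 6.5, 5.2, 8.0],  # numerical feature (% vol)
        "bitterness": [20, 60, 25, 45],  # numerical feature (IBU)
        "style": ["lager", "ipa", "lager", "stout"],  # categorical feature
    }
)

# One-hot encoding: one 0/1 column per category of the categorical feature.
encoded = pd.get_dummies(beers, columns=["style"], dtype=float)

# Feature matrix: shape (n_samples, n_features), numbers only.
X = encoded.to_numpy()
print(encoded.columns.tolist())
print(X.shape)
```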
@@ -57,46 +80,37 @@ TBD: prepare coding session

    - supervised:

-      - classification: do I like this beer ?
-        example: draw decision tree
-
-
-
-## Part 2a: supervised learning: classification
-
-  Intention: demonstrate one / two simple examples of classifiers, also
-             introduce the concept of decision boundary
+      - classification: do I like this beer?
+        example: draw decision tree or surface

-  - idea of simple linear classifier: take features, produce real value ("uwes beer score"), use threshold to decide
-    -> simple linear classifier (linear SVM e.g.)
-    -> beer example with some weights
+### Part 2: Supervised learning: concepts of classification

-  - show code example with logistic regression for beer data, show weights, plot decision function
+Time: 60 min (4/3 uh)

-### Coding session:
+Intention: demonstrate one or two simple classifiers and introduce the concept of a
+decision boundary.

-  - change given code to use a linear SVM classifier
+- idea of a simple linear classifier: take features, produce a real value ("beer score"),
+  use a threshold to decide
+  - simple linear classifier (e.g. linear SVM)
+  - beer example with some weights

-  - use different data (TBD) set which can not be classified well with a linear classifier
-  - tell to transform data and run again (TBD: how exactly ?)
+- show code example with logistic regression for beer data, show weights, plot decision
+  surface

+#### Coding session:

-## Part 2b: supervised learning: regression (TBD: skip this ?)
+- change the given code to use a linear SVM classifier
+- use a different data set which cannot be classified well with a linear classifier
+- ask participants to transform the data and run again

-  Intention: demonstrate one / two simple examples of regression
+### Part 3: Overfitting and cross-validation

-  - regression: how would I rate this movie ?
-    example: use weighted sum, also example for linear regresor
-    example: fit a quadratic function
+Time: 60 min (4/3 uh)

-  - learn regressor for movie scores.
+Needs: simple accuracy measure.

-
-## Part 3: underfitting/overfitting
-
-needs: simple accuracy measure.
-
-classifiers / regressors have parameters / degrees of freedom.
+Classifiers (regressors) have parameters / degrees of freedom.

 - underfitting:

@@ -109,122 +123,132 @@ classifiers / regressors have parameters / degrees of freedom.
  - polynomial of degree 5 fitted to points on a line + noise
  - points in a circle: draw very exact boundary line

- how to check underfitting / overfitting ?
+- how to check underfitting / overfitting?

  - measure accuracy or other metric on test dataset
  - cross validation


-### Coding session:
+#### Coding session:

 - How to do cross-validation with scikit-learn
 - use a different beer feature set with a redundant feature (+)
 - run cross-validation on the classifier
- ? run crossvalidation on movie regression problem
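A small sketch of the degree-5 example, assuming scikit-learn: it compares the training error with the 5-fold cross-validated error for polynomials of degree 1 and 5 fitted to synthetic points on a line plus noise (slope, offset and noise level are made up). The training error of degree 5 can only be lower or equal; its cross-validated error is typically higher, which is the overfitting signal to look for:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)

# Synthetic points on a line plus noise.
x = rng.uniform(0, 1, 15).reshape(-1, 1)
y = 2 * x.ravel() + 1 + rng.normal(scale=0.2, size=15)

results = {}
for degree in (1, 5):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    train_mse = np.mean((model.fit(x, y).predict(x) - y) ** 2)
    # 5-fold cross-validation; scikit-learn maximises scores, hence the negated MSE.
    cv_mse = -cross_val_score(model, x, y, cv=5, scoring="neg_mean_squared_error").mean()
    results[degree] = (train_mse, cv_mse)
    print(f"degree {degree}: train MSE {train_mse:.3f}, cross-validated MSE {cv_mse:.3f}")
```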


-## Part 4: accuracy, F1, ROC, ...
+### Part 4: accuracy, F1, ROC, ...

-Intention: accuracy is usefull but has pitfalls
+Time: 60 min (4/3 uh)

- how to measure accuracy ?
+Intention: pitfalls of simple accuracy
+
+- how to measure accuracy?

-  - (TDB: skip ?) regression accuracy
-  -
  - classifier accuracy:
-    - confusion matrix
-    - accurarcy
-    - pitfalls for unbalanced data sets~
+    - confusion matrix metrics
+    - pitfalls for unbalanced data sets
        e.g. diagnose HIV
    - precision / recall
-    - ROC ?
+    - mention ROC?

- exercise: do cross val with other metrics
+- exercise (pen and paper): determine precision / recall

-### Coding session
+#### Coding session

- evaluate accuracy of linear beer classifier from latest section
+- do cross-validation with multiple metrics:
+  evaluate the linear beer classifier from the previous section
+- fool them: give them another dataset where the classifier fails.
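A sketch for the precision / recall exercise and the multi-metric cross-validation, assuming scikit-learn; the labels, predictions and the synthetic unbalanced data set (about 10 % positives) are made up:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import cross_validate

# Pen-and-paper example with made-up labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)  # share of predicted positives that are really positive
recall = tp / (tp + fn)  # share of real positives that were found
print(f"tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"by hand: precision={precision:.2f} recall={recall:.2f}")
print(f"sklearn: precision={precision_score(y_true, y_pred):.2f} "
      f"recall={recall_score(y_true, y_pred):.2f}")

# Cross-validation with several metrics at once on a synthetic, unbalanced
# data set (about 10 % positives).
X, y = make_classification(n_samples=500, weights=[0.9], random_state=0)
metrics = ["accuracy", "precision", "recall", "f1"]
scores = cross_validate(LogisticRegression(), X, y, cv=5, scoring=metrics)
for metric in metrics:
    print(f"{metric:9s} {scores['test_' + metric].mean():.3f}")
```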

- determine precision / recall
+### Part 5: Pipelines and hyperparameter tuning with extended exercise

- fool them: give them other dataset where classifier fails.
+Time: 1.5h (2 uh)

+- Scikit-learn API: recall what we have seen up to now.
+- preprocessing (scaler, PCA, function/column transformers)
+- cross validation
+- parameter tuning: grid search / random search.
+
+#### Coding session
+
+- build SVM and LinearRegression crossval pipelines for previous examples
+- use PCA in pipeline for (+) to improve performance
+- find optimal SVM parameters
+- find the optimal number of PCA components
+- **extended**: full process for best pipeline/model selection, incl. selection of
+  preprocessing steps and hyperparameter tuning with cross-validation
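A possible starting point for the Part 5 coding session, assuming scikit-learn and its bundled digits data set in place of the course examples; the step names and the parameter grid are illustrative choices:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 8x8 images of handwritten digits, 64 features

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA()),
    ("svc", SVC(kernel="rbf")),
])

# Parameters of the pipeline steps are addressed as "<step name>__<parameter>".
param_grid = {
    "pca__n_components": [10, 20, 30],
    "svc__C": [1, 10],
}

# Every combination is evaluated with 5-fold cross-validation,
# then the best pipeline is refitted on all data.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```

`RandomizedSearchCV` accepts the same pipeline and samples a fixed number of parameter combinations (`n_iter`) instead of trying all of them.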

-# Day 2
+## Day 2

-## Part 5: classifiers overview
+Total time: 6h (8 uni hours (uh) of 45 min)
+
+### Part 6 a+b: classifiers overview (nearest neighbours & regression-based + tree-based & ensembles)

 Intention: quick walk-through of reliable classifiers; give some background ideas where
 suitable; let them play with some, incl. modifying parameters.

-To consider: decision graph from sklearn, come up with easy to understand
-diagram.
+Summary: decision graph (mind-map) from scikit-learn, and come up with an easy-to-understand
+summary table.
+
+#### Part 6a
+
+Time: 1h (4/3 uh)

 - Nearest neighbours
- SVM classifier (SVC)
+- Logistic regression
+- Linear + kernel SVM classifier (SVC)
  - demo of the Radial Basis Function (RBF) kernel trick: influence of different parameters
    on the decision line
- ?Decision trees or only in random forests?
- Random forests (ensemble method - averaging)
- Gradient Tree Boosting (ensemble method - boosting)
- Naive Bayes for text classification
- mentions - big data:
-  - Stochastic Gradient Descent classifier,
-  - kernel approximation transformation (explicitly approx. kernel trick)
-    - compare SVC incl. RBF vs. Random Kitchen Sinks (RBFSampler) + linear SVC (https://scikit-learn.org/stable/auto_examples/plot_kernel_approximation.html#sphx-glr-auto-examples-plot-kernel-approximation-py)
-
-Topics to include:
-
- interoperability of results (in terms features importance, e.g. SVN w/ hig deg poly
-  kernel)
- some rules of thumbs: don't use KNN classifiers for 10 or more dimensions (why? paper
-  link)
- show decision surfaces for diff classifiers (extend exercise in sec 3 using
-  hyperparams)
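A sketch for the RBF kernel demo, assuming scikit-learn and the synthetic `make_moons` data set; the `gamma` values are arbitrary. Comparing training and cross-validated accuracy shows the parameter's influence even without plotting the decision line:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

# gamma sets how far the influence of a single training point reaches:
# small gamma gives a smooth decision line, large gamma a line that wraps
# tightly around individual training points (overfitting).
results = {}
for gamma in (0.1, 1, 100):
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0)
    train_acc = clf.fit(X, y).score(X, y)
    cv_acc = cross_val_score(clf, X, y, cv=5).mean()
    results[gamma] = (train_acc, cv_acc)
    print(f"gamma={gamma:>5}: training accuracy {train_acc:.2f}, cross-validated {cv_acc:.2f}")
```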

-### Coding session
+#### Part 6b

- apply SVM, Random Forests, Gradient boosting to previous examples
- apply clustering to previous examples
- MNIST example
+Time: 1h (4/3 uh)

+- Decision trees
+- Averaging: Random forests
+- Boosting: AdaBoost, and mention Gradient Tree Boosting (histogram-based; xgboost)
+- mentions
+  - text classification: Naive Bayes
+  - big data:
+    - Stochastic Gradient Descent classifier
+    - kernel approximation transformation (explicitly approximating the kernel trick)
+      - optional: compare SVC incl. RBF vs. Random Kitchen Sinks (RBFSampler) + linear SVC
+        (https://scikit-learn.org/stable/auto_examples/plot_kernel_approximation.html#sphx-glr-auto-examples-plot-kernel-approximation-py)
+- summary/overview
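A sketch comparing the Part 6b classifiers on one synthetic data set, assuming scikit-learn; the hyperparameters (tree depth, `gamma`, number of random features) are illustrative, not tuned. The last entry is the Random Kitchen Sinks idea: `RBFSampler` builds an explicit approximate RBF feature map, followed by a linear SVM:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.kernel_approximation import RBFSampler
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)

classifiers = {
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random forest (averaging)": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost (boosting)": AdaBoostClassifier(random_state=0),
    "gradient tree boosting": GradientBoostingClassifier(random_state=0),
    "SVC with RBF kernel": SVC(kernel="rbf", gamma=1.0),
    # Random Kitchen Sinks: approximate RBF feature map + linear SVM.
    "RBFSampler + LinearSVC": make_pipeline(
        RBFSampler(gamma=1.0, n_components=300, random_state=0),
        LinearSVC(max_iter=10_000),
    ),
}

cv_accuracy = {}
for name, clf in classifiers.items():
    cv_accuracy[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:28s} {cv_accuracy[name]:.3f}")
```

For the histogram-based variant, scikit-learn 1.0 and later provide `HistGradientBoostingClassifier` in `sklearn.ensemble`.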

-## Part 6: pipelines / parameter tuning with scikit-learn
-
- Scikit-learn API:  recall what we have seen up to now.
- pipelines, preprocessing (scaler, PCA)
- cross validation
- parameter tuning: grid search / random search.
-
-### Coding session
-
- build SVM and LinearRegression crossval pipelines for previous examples
- use PCA in pipeline for (+) to improve performance
- find optimal SVM parameters
- find optimal pca components number
+#### Topics to include

+- interpretability of results (in terms of feature importance, e.g. SVM with high-degree
+  polynomial kernel)
+- some rules of thumb: don't use kNN classifiers for 10 or more dimensions (why? paper
+  link)
+- show decision surfaces for different classifiers (extend the exercise in section 3 using
+  hyperparameters)
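One common explanation for the kNN rule of thumb is distance concentration: in many dimensions, the nearest and the farthest neighbour of a point end up at almost the same distance. A minimal NumPy sketch with uniformly random points (an assumption; real data often has a lower intrinsic dimension) illustrates the effect; it does not replace the missing paper link or justify the exact threshold of 10:

```python
import numpy as np

rng = np.random.default_rng(0)


def nearest_to_farthest_ratio(n_dims, n_points=1000):
    """Distance to the nearest point divided by distance to the farthest point,
    for one random query point and n_points uniform random points in [0, 1]^n_dims."""
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return distances.min() / distances.max()


# A ratio close to 1 means the "nearest" neighbour is hardly nearer than any other point.
ratios = {d: nearest_to_farthest_ratio(d) for d in (2, 10, 100, 1000)}
for d, ratio in ratios.items():
    print(f"{d:4d} dimensions: nearest / farthest distance = {ratio:.2f}")
```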

-## Part 7: Start with neural networks. .5 day
+#### Coding session

-## Planning
+- apply SVM, Random Forests, boosting to specific examples
+- MNIST example

-Stop here, make time estimates.
+### Part 7: Supervised learning: regression

+Time: 1h (4/3 uh)

+Intention: demonstrate one / two simple examples of regression

+- regression: how would I rate this movie?
+  example: use a weighted sum, which is also an example of a linear regressor
+  example: fit a quadratic function

+- learn regressor for movie scores / salmon weight.
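A sketch of the Part 7 examples, assuming scikit-learn and made-up data: the movie features, the "true" weights and the quadratic are hypothetical. It shows that a weighted sum is what a linear regressor learns, and that a quadratic function can be fitted with the same linear regressor on the features `x` and `x**2`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical movie features in [0, 1]: share of action, romance and comedy scenes.
movies = rng.uniform(0, 1, size=(50, 3))
# Made-up taste of one viewer: the rating is a weighted sum of the features plus noise.
true_weights = np.array([3.0, -1.0, 0.5])
ratings = movies @ true_weights + 2.0 + rng.normal(scale=0.1, size=50)

# The linear regressor learns the weights (coef_) and the offset (intercept_).
movie_model = LinearRegression().fit(movies, ratings)
print("weights:", movie_model.coef_.round(2), "offset:", round(movie_model.intercept_, 2))

# A quadratic a*x**2 + b*x + c is linear in the features x and x**2,
# so the same linear regressor can fit it.
x = np.linspace(-2, 2, 40)
y = 0.5 * x**2 - x + 1 + rng.normal(scale=0.1, size=40)
quad_model = LinearRegression().fit(np.column_stack([x, x**2]), y)
b, a = quad_model.coef_
c = quad_model.intercept_
print(f"quadratic fit: a={a:.2f} b={b:.2f} c={c:.2f}")
```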


+### Part 8: Supervised learning: neural networks

-## Part 8: Best practices
+Time: 3h (4 uh)

- visualize features: pairwise scatter, tSNE
- PCA to undertand data
- check balance of data set, what if not ?
- start with baseline classifier / regressor
- augment data to introduce variance
+Intention: Introduction to neural networks and deep learning with `keras`

-## Part 9: neural networks
+- include real-life tumor example (maybe in day 3 walk-through)

 - overview, history
 - perceptron
@@ -233,11 +257,28 @@ Stop here, make time estimates.
 - where neural networks work well
 - keras demo
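The course demo uses keras; as a complement, here is a minimal NumPy sketch of the classic perceptron learning rule on the logical AND function (a toy data set chosen for illustration). Since AND is linearly separable, the rule converges after a few epochs:

```python
import numpy as np


def train_perceptron(X, y, n_epochs=20, learning_rate=1.0):
    """Classic perceptron learning rule; y must contain 0/1 labels."""
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(n_epochs):
        for xi, target in zip(X, y):
            prediction = int(xi @ weights + bias > 0)  # step activation
            update = learning_rate * (target - prediction)  # 0 if prediction is correct
            weights += update * xi
            bias += update
    return weights, bias


# Logical AND: linearly separable, so the perceptron rule converges.
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])
w, b = train_perceptron(X_and, y_and)
predictions = (X_and @ w + b > 0).astype(int)
print("weights:", w, "bias:", b, "predictions:", predictions)
```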

-### Coding Session
+#### Coding Session

 - keras: reuse a network and play with it.

+## Day 3
+
+Total time: 6h (8 uh)

+1. Hands-on walk-through of a real-life example.
+2. Assisted programming session where participants can start to work on their own
+   machine learning application. Help them set up their own machines. Offer some example
+   data sets from https://www.kaggle.com/datasets


+## Misc

+### Best practices
+
+Preferably include/repeat these in the relevant workshop parts/examples.
+
+- visualize features: pairwise scatter, UMAP/tSNE
+- PCA to simplify/understand data
+- check balance of data set, what if not?
+- start with baseline classifier/regressor
+- augment data to introduce variance
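A sketch of two of the best practices above, checking the class balance and starting from a baseline, assuming scikit-learn and a synthetic unbalanced data set. The baseline reaches a high accuracy without learning anything, while balanced accuracy exposes it:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, unbalanced data set: roughly 95 % of the samples are in class 0.
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)

# 1. Check the balance of the data set.
counts = np.bincount(y)
print("samples per class:", counts, "fractions:", (counts / counts.sum()).round(3))

# 2. Start with a baseline that always predicts the most frequent class,
#    then check whether a real model beats it.
results = {}
for name, clf in [
    ("baseline", DummyClassifier(strategy="most_frequent")),
    ("logistic regression", LogisticRegression()),
]:
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    bal_acc = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy").mean()
    results[name] = (acc, bal_acc)
    print(f"{name:20s} accuracy {acc:.3f}  balanced accuracy {bal_acc:.3f}")
```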