Commit 00cf6fef authored by schmittu

Merge branch 'improvements_uwe' of sissource.ethz.ch:sis/courses/machinelearning-introduction-workshop into improvements_uwe
parents 5082bcd3 2ad2e499
# Targeted audience

- Researchers from DBIOL and DGESS having no machine learning experience yet.
- Basic Python knowledge.
- Almost no math knowledge.

# Concepts

- 3 days workshop: 2 days lectures with exercises + 0.5 day real life example walk
  through + 0.5 day working on own data / prepared data.
- smooth learning curve
- explain fundamental concepts first, discuss exceptions, corner cases, pitfalls later.
- plotting / pandas / numpy first. Else participants might fight with these basics
  during coding sessions and will be distracted from the actual learning goal of an
  exercise.
- jupyter notebooks / conda, extra notebooks with solutions.
- use prepared computers in computer room, setting up personal computer during last day
  if required.
- exercises: empty holes to fill
# Course structure

## Home prep

Introductions to NumPy, Pandas and Matplotlib (plus Python, if needed).

Prep materials to send out:

* Python, ca. 6h: https://siscourses.ethz.ch/python_one_day/script.html
* NumPy, ca. 3h: https://siscourses.ethz.ch/python-scientific/01_numpy.html
  * WARN: a bit too advanced
  * alt, ext: http://scipy-lectures.org/intro/numpy/index.html
* Pandas, ca. 1.5h: https://siscourses.ethz.ch/python-scientific/02_pandas.html
  * alt, ext: http://www.scipy-lectures.org/packages/statistics/index.html#data-representation-and-interaction
  * cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
* Matplotlib + Seaborn
  * ext:
    * http://scipy-lectures.org/intro/matplotlib/index.html
    * http://scipy-lectures.org/packages/statistics/index.html#more-visualization-seaborn-for-statistical-exploration
  * cheat sheets:
    * https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf
    * https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Seaborn_Cheat_Sheet.pdf
## Day 1

Intro and superficial overview of classifiers including quality assessment, pipelines
and hyperparameter optimization.

Total time: 6h (8 x uni hour (uh))

### Part 0: Preparation
Time: 15 min (1/3 uh)
- organizational announcements
- installation/machines preparation
### Part 1: General introduction
Time: 75 min (5/3 uh)

- What is machine learning?
  - learning from examples
  - working with hard to understand data
  - automation
- What are features / samples / feature matrix? (see the sketch after this list)
  - always numerical / categorical vectors
  - examples: beer, movies, images, text to numerical examples
- supervised:
  - classification: do I like this beer?
    example: draw decision tree or surface
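
A minimal sketch of what a feature matrix could look like in code, using a made-up
beer table (the feature names and values are placeholders, not the course data set):

```python
import pandas as pd

# Hypothetical beer data: each row is a sample, each column a numerical feature.
beers = pd.DataFrame({
    "alcohol_content": [4.8, 5.2, 7.0, 4.5],   # percent
    "bitterness":      [25, 40, 60, 18],        # IBU
    "darkness":        [10, 15, 40, 8],         # EBC
    "is_liked":        [1, 1, 0, 1],            # label: 1 = like, 0 = dislike
})

X = beers.drop(columns="is_liked").values  # feature matrix, shape (n_samples, n_features)
y = beers["is_liked"].values               # label vector
print(X.shape, y.shape)
```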

### Part 2: Supervised learning: concepts of classification

Time: 60 min (4/3 uh)

Intention: demonstrate one / two simple examples of classifiers, also introduce the
concept of decision boundary

- idea of simple linear classifier: take features, produce real value ("beer score"),
  use threshold to decide
  - simple linear classifier (e.g. linear SVM)
  - beer example with some weights
- show code example with logistic regression for beer data, show weights, plot decision
  surface (see the sketch below)

#### Coding session:

- change given code to use a linear SVM classifier
- use different data set which can not be classified well with a linear classifier
- tell to transform data and run again
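
A possible shape for the logistic regression demo, sketched on invented two-feature
beer data so the decision surface can be plotted; swapping in `LinearSVC` is then the
first step of the coding session. Data, cluster centres and feature names are made up
for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC  # exercise: swap this in for LogisticRegression

# Made-up beer data: two features so the decision surface is easy to draw.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal([25, 4.5], 5, size=(30, 2)),   # beers I like
               rng.normal([55, 7.0], 5, size=(30, 2))])  # beers I don't like
y = np.array([1] * 30 + [0] * 30)

clf = LogisticRegression().fit(X, y)
print("weights:", clf.coef_, "intercept:", clf.intercept_)

# Evaluate the classifier on a grid and draw the decision surface.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 5, X[:, 0].max() + 5, 200),
                     np.linspace(X[:, 1].min() - 2, X[:, 1].max() + 2, 200))
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.xlabel("bitterness")
plt.ylabel("alcohol content")
plt.show()
```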

### Part 3: Overfitting and cross-validation

Time: 60 min (4/3 uh)

Needs: simple accuracy measure.
Classifiers (regressors) have parameters / degrees of freedom.

- underfitting:
- polynomial of degree 5 to fit points on a line + noise
- points in a circle: draw very exact boundary line
- how to check underfitting / overfitting?
  - measure accuracy or other metric on test dataset
  - cross validation

#### Coding session:

- How to do cross validation with scikit-learn (see the sketch below)
- use different beer feature set with redundant feature (+)
- run cross-validation on classifier
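
A minimal sketch of the scikit-learn cross-validation step; `make_classification`
stands in here for the beer feature set with a redundant feature:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Stand-in data; in the exercise this is the beer feature set with a redundant feature.
X, y = make_classification(n_samples=200, n_features=5, n_redundant=1, random_state=0)

clf = LinearSVC(max_iter=10000)
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross validation, accuracy by default
print("fold accuracies:", scores)
print("mean +/- std: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```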

### Part 4: accuracy, F1, ROC, ...

Time: 60 min (4/3 uh)

Intention: pitfalls of simple accuracy

- how to measure accuracy?
- classifier accuracy:
  - confusion matrix metrics
  - pitfalls for unbalanced data sets, e.g. diagnose HIV
  - precision / recall
  - mention ROC?
- exercise (pen and paper): determine precision / recall

#### Coding session

- do cross val with multiple metrics:
  evaluate linear beer classifier from latest section (see the sketch below)
- fool them: give them other dataset where classifier fails.
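
One way the multi-metric evaluation could be sketched, using `cross_validate` on an
artificially unbalanced data set (a stand-in for the "fool them" data) so that plain
accuracy looks deceptively good:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Unbalanced stand-in data: 95% of samples belong to one class.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

scores = cross_validate(LogisticRegression(), X, y, cv=5,
                        scoring=["accuracy", "precision", "recall", "f1"])
for name in ["test_accuracy", "test_precision", "test_recall", "test_f1"]:
    print(name, round(scores[name].mean(), 3))
```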

### Part 5: Pipelines and hyperparameter tuning w/ extended exercise

Time: 1.5h (2 uh)
- Scikit-learn API: recall what we have seen up to now.
- preprocessing (scaler, PCA, function/column transformers)
- cross validation
- parameter tuning: grid search / random search.
#### Coding session
- build SVM and LinearRegression crossval pipelines for previous examples
- use PCA in pipeline for (+) to improve performance
- find optimal SVM parameters
- find optimal PCA components number
- **extended**: full process for best pipeline/model selection incl. preprocessing steps
  selection, hyperparameter tuning w/ cross-validation (see the pipeline sketch below)
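
A compact sketch of the pipeline / grid-search workflow this session aims at; synthetic
data as a stand-in, parameter ranges illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, n_redundant=2, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("svc", SVC()),
])

# Grid over PCA components and SVC hyperparameters; cross-validation happens inside GridSearchCV.
grid = GridSearchCV(pipe,
                    param_grid={"pca__n_components": [2, 4, 6],
                                "svc__C": [0.1, 1, 10],
                                "svc__gamma": ["scale", 0.1, 1]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```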

## Day 2

Total time: 6h (8 x uni hour (uh))
### Part 6 a+b: classifiers overview (NNs & regression-based + tree-based & ensembles)
Intention: quick walk through reliable classifiers, give some background idea if
suitable, let them play with some, incl. modification of parameters.

Summary: decision graph (mind-map) from scikit-learn, and come up with easy to
understand summary table.
#### Part 6a
Time: 1h (4/3 uh)
- Nearest neighbours
- Logistic regression
- Linear + kernel SVM classifier (SVC)
  - demo for Radial Basis Function (RBF) kernel trick: influence of different parameters
    on the decision line (see the sketch below)
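
The RBF demo could look roughly like this, with `make_moons` as stand-in data and a few
`gamma` values to show how the decision line changes:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)
xx, yy = np.meshgrid(np.linspace(-2, 3, 300), np.linspace(-1.5, 2, 300))

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, gamma in zip(axes, [0.1, 1, 10]):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, s=15)
    ax.set_title("gamma = %s" % gamma)  # larger gamma -> wigglier decision line
plt.show()
```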

#### Part 6b

Time: 1h (4/3 uh)
- Decision trees
- Averaging: Random forests
- Boosting: AdaBoost and mention Gradient Tree Boosting (hist; xgboost)
- mentions
  - text classification: Naive Bayes for text classification
  - big data:
    - Stochastic Gradient Descent classifier
    - kernel approximation transformation (explicitly approx. kernel trick)
    - opt, compare SVC incl. RBF vs. Random Kitchen Sinks (RBFSampler) + linear SVC
      (https://scikit-learn.org/stable/auto_examples/plot_kernel_approximation.html#sphx-glr-auto-examples-plot-kernel-approximation-py);
      see the sketch below
- summary/overview
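
A sketch of the optional SVC vs. Random Kitchen Sinks comparison, following the idea of
the linked scikit-learn example but on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

exact = SVC(kernel="rbf", gamma=0.1).fit(X_train, y_train)

# Random Kitchen Sinks: explicit random feature map approximating the RBF kernel,
# followed by a linear SVM, which scales much better to large sample counts.
approx = make_pipeline(RBFSampler(gamma=0.1, n_components=300, random_state=0),
                       LinearSVC(max_iter=10000)).fit(X_train, y_train)

print("exact RBF SVC          :", exact.score(X_test, y_test))
print("RBFSampler + linear SVC:", approx.score(X_test, y_test))
```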

#### Topics to include
- interpretability of results (in terms of feature importance, e.g. SVM w/ high deg.
  poly. kernel)
- some rules of thumb: don't use kNN classifiers for 10 or more dimensions (why? paper
  link)
- show decision surfaces for diff classifiers (extend exercise in sec 3 using
  hyperparams)

#### Coding session

- apply SVM, Random Forests, boosting to specific examples
- MNIST example (see the sketch below, using the small digits set as a stand-in)
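
A possible starting point for this session, using scikit-learn's small `digits` data
set as a quick stand-in for MNIST:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# MNIST itself is large; the bundled digits data keeps the demo fast.
X, y = load_digits(return_X_y=True)

for clf in [SVC(), RandomForestClassifier(n_estimators=100), AdaBoostClassifier()]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(type(clf).__name__, "mean accuracy: %.3f" % scores.mean())
```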

### Part 7: Supervised learning: regression
Time: 1h (4/3 uh)
Intention: demonstrate one / two simple examples of regression
- regression: how would I rate this movie?
  example: use weighted sum, also example for linear regressor
  example: fit a quadratic function
- learn regressor for movie scores / salmon weight (see the sketch below).
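
A minimal regression sketch along these lines, on made-up quadratic data instead of
movie scores or salmon weights:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up data following a quadratic trend plus noise.
rng = np.random.RandomState(0)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - x.ravel() + rng.normal(0, 0.5, size=50)

linear = LinearRegression().fit(x, y)                    # plain weighted sum of features
quadratic = make_pipeline(PolynomialFeatures(degree=2),  # fit a quadratic function
                          LinearRegression()).fit(x, y)
print("linear R^2:   ", linear.score(x, y))
print("quadratic R^2:", quadratic.score(x, y))
```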

### Part 8: Supervised learning: neural networks
Time: 3h (4 uh)

Intention: Introduction to neural networks and deep learning with `keras`

- include real-life tumor example (maybe in day 3 walk-through)
- overview, history
- perceptron
- where neural networks work well
- keras demo

#### Coding Session

- keras: reuse network and play with it (see the sketch below).
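
A tiny `keras` network that could serve as the "reuse and play" starting point,
assuming TensorFlow is installed and again using the small digits data as a stand-in:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tensorflow import keras

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X / 16.0, y, random_state=0)

# Small fully connected network: 64 pixel features -> 10 digit classes.
model = keras.Sequential([
    keras.Input(shape=(64,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
print("test accuracy:", model.evaluate(X_test, y_test, verbose=0)[1])
```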
## Day 3
Total time: 6h (8 uh)
1. Hands-on walk-through of a real-life example.
2. Assisted programming session where participants can start to work on their own
   machine learning application. Assist to set up own machines. Offer some example
   data sets from https://www.kaggle.com/datasets
## Misc
### Best practices
Rather include/repeat these in the relevant workshop parts/examples:
- visualize features: pairwise scatter, UMAP/tSNE
- PCA to simplify/understand data
- check balance of data set, what if not?
- start with baseline classifier/regressor (see the sketch below)
- augment data to introduce variance
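
A short sketch of the "check balance, start with a baseline" practice, on synthetic
stand-in data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Unbalanced stand-in data; replace with the data set at hand.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Check class balance first.
print(dict(zip(*np.unique(y, return_counts=True))))

# Baseline: always predict the most frequent class -- any real model must beat this.
baseline = DummyClassifier(strategy="most_frequent")
print("baseline accuracy:", cross_val_score(baseline, X, y, cv=5).mean())
```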