- Targeted audience
- Concepts
- Course structure
- Home prep
- Day 1
- Part 0: Preparation
- Part 1: General introduction
- Part 2: Supervised learning: concepts of classification
- Coding session:
- Part 3: Overfitting and cross-validation
- Coding session:
- Part 4: Accuracy, F1, ROC, ...
- Coding session
- Part 5: Pipelines and hyperparameter tuning w/ extended exercise
- Coding session
- Day 2
- Part 6 a+b: classifiers overview (NNs & regression-based + tree-based & ensembles)
- Part 6a
- Part 6b
- Topics to include
- Coding session
- Part 7: Supervised learning: regression
- Part 8: Supervised learning: neural networks
- Coding Session
- Day 3
- Misc
- Best practices
Targeted audience
- Researchers from DBIOL and DGESS with no prior machine learning experience.
- Basic Python knowledge.
- Almost no math knowledge.
Concepts
- 3-day workshop: 2 days of lectures with exercises + 0.5 day real-life example walk-through + 0.5 day working on own / prepared data.
- smooth learning curve:
  - explain fundamental concepts first; discuss exceptions, corner cases and pitfalls later.
  - plotting / pandas / numpy first, otherwise participants might fight with these basics during coding sessions and get distracted from the actual learning goal of an exercise.
- Jupyter notebooks / conda, extra notebooks with solutions.
- use prepared computers in the computer room; set up personal computers during the last day if required.
- exercises: code skeletons with blanks to fill in.
Course structure
Home prep
Introductions to NumPy, Pandas and Matplotlib (plus Python, if needed).
Prep materials to send out:
- Python, ca. 6h: https://siscourses.ethz.ch/python_one_day/script.html
- NumPy, ca. 3h: https://siscourses.ethz.ch/python-scientific/01_numpy.html
- WARN: a bit too advanced
- alt, ext: http://scipy-lectures.org/intro/numpy/index.html
- Pandas, ca. 1.5h: https://siscourses.ethz.ch/python-scientific/02_pandas.html
- Matplotlib + Seaborn
- ext:
- cheat sheets:
Day 1
Intro and high-level overview of classifiers, including quality assessment, pipelines and hyperparameter optimization.
Total time: 6h (8 x uni hour (uh))
Part 0: Preparation
Time: 15 min (1/3 uh)
- organizational announcements
- installation/machines preparation
Part 1: General introduction
Time: 75 min (5/3 uh)
- What is machine learning?
  - learning from examples
  - working with hard-to-understand data
  - automation
- What are features / samples / a feature matrix?
  - always numerical / categorical vectors
  - examples: turning beer, movies, images or text into numerical features
- Learning problems:
  - unsupervised:
    - find structure in a set of features
    - beers: find groups of beer types
  - supervised:
    - classification: do I like this beer? example: draw a decision tree or decision surface
Part 2: Supervised learning: concepts of classification
Time: 60 min (4/3 uh)
Intention: demonstrate one or two simple example classifiers; also introduce the concept of a decision boundary
- idea of a simple linear classifier: take features, produce a real value ("beer score"), use a threshold to decide
  - simple linear classifier (e.g. a linear SVM)
  - beer example with some weights
- show a code example with logistic regression for the beer data, show the weights, plot the decision surface (a sketch follows below)
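A possible minimal sketch of this demo, with a synthetic two-feature dataset standing in for the beer data (feature names and parameters are illustrative only):

```python
# Sketch: logistic regression on a toy two-feature dataset, inspect the
# learned weights and plot the decision surface (synthetic data stands in
# for the actual beer dataset used in the course).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# two features, think "bitterness" and "alcohol content"
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

clf = LogisticRegression().fit(X, y)
print("weights:", clf.coef_, "intercept:", clf.intercept_)

# evaluate the classifier on a grid to draw the decision surface
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.xlabel("feature 1")
plt.ylabel("feature 2")
plt.show()
```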
Coding session:
- change the given code to use a linear SVM classifier
- use a different data set which cannot be classified well with a linear classifier
- ask participants to transform the data and run again (see the sketch below)
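One way the solution could look, sketched with make_circles as the hard-to-separate dataset and a hand-picked squared-radius feature as the transform (both are assumptions, not the course material):

```python
# Sketch: a linear SVM fails on circular data, but a simple feature
# transform (adding the squared radius) makes the classes linearly separable.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import LinearSVC

X, y = make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=0)

# linear SVM on the raw 2D features: poor training accuracy
print("raw features:", LinearSVC().fit(X, y).score(X, y))

# add x^2 + y^2 as a third feature: now a plane separates the classes
X_ext = np.hstack([X, (X ** 2).sum(axis=1, keepdims=True)])
print("with squared radius:", LinearSVC().fit(X_ext, y).score(X_ext, y))
```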
Part 3: Overfitting and cross-validation
Time: 60 min (4/3 uh)
Needs: simple accuracy measure.
Classifiers (regressors) have parameters / degrees of freedom.
- underfitting:
  - a linear classifier for points on a quadratic function
- overfitting:
  - features have actual noise, or not enough information: orchid example in 2D; elevate to 3D using another feature
  - a polynomial of degree 5 fitted to points on a line + noise
  - points in a circle: draw a very exact boundary line
- how to check for underfitting / overfitting?
  - measure accuracy or another metric on a test dataset (see the polynomial-fit sketch below)
  - cross-validation
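A minimal sketch of the overfitting demo above, assuming noisy points on a straight line and comparing a degree-1 and a degree-5 polynomial fit via train/test error:

```python
# Sketch: fit a degree-5 polynomial to noisy points on a straight line and
# compare train vs. test error with a plain linear fit (degree 1).
import numpy as np

rng = np.random.RandomState(0)
x_train = np.linspace(0, 1, 10)
x_test = rng.uniform(0, 1, 100)
line = lambda x: 2 * x + 1
y_train = line(x_train) + rng.normal(scale=0.2, size=x_train.shape)
y_test = line(x_test) + rng.normal(scale=0.2, size=x_test.shape)

for degree in (1, 5):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # the degree-5 fit typically chases the noise: lower train error, higher test error
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```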
Coding session:
- how to do cross-validation with scikit-learn (see the sketch below)
- use a different beer feature set with a redundant feature (+)
- run cross-validation on the classifier
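A minimal cross-validation sketch with scikit-learn; the iris dataset stands in for the beer feature set:

```python
# Sketch: 5-fold cross-validation of a linear classifier with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

scores = cross_val_score(clf, X, y, cv=5)  # accuracy per fold
print("fold accuracies:", scores)
print("mean +/- std: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```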
Part 4: Accuracy, F1, ROC, ...
Time: 60 min (4/3 uh)
Intention: pitfalls of simple accuracy
- how to measure accuracy?
  - classifier accuracy:
    - confusion matrix metrics
    - pitfalls for unbalanced data sets, e.g. diagnosing HIV
    - precision / recall
    - mention ROC?
- exercise (pen and paper): determine precision / recall (a scikit-learn check follows below)
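A scikit-learn check for the pen-and-paper exercise, on a small made-up imbalanced label vector (illustrative only):

```python
# Sketch: confusion-matrix-based metrics on an imbalanced toy example,
# showing why plain accuracy can be misleading.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # 1 = "positive" (rare class)
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # classifier misses one positive

print(confusion_matrix(y_true, y_pred))
print("accuracy :", accuracy_score(y_true, y_pred))    # looks great: 0.9
print("precision:", precision_score(y_true, y_pred))   # 1.0
print("recall   :", recall_score(y_true, y_pred))      # only 0.5
print("F1       :", f1_score(y_true, y_pred))
```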
Coding session
- do cross-validation with multiple metrics: evaluate the linear beer classifier from the previous section (see the sketch below)
- fool them: give them another dataset where the classifier fails.
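A sketch of how the multi-metric cross-validation could look, with a synthetic imbalanced dataset standing in for the course data:

```python
# Sketch: cross-validation with several metrics at once via cross_validate.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# imbalanced binary problem: ~90% negatives, ~10% positives
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

results = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=5,
                         scoring=["accuracy", "precision", "recall", "f1"])
for metric in ("accuracy", "precision", "recall", "f1"):
    print(metric, results["test_" + metric].mean())
```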
Part 5: Pipelines and hyperparameter tuning w/ extended exercise
Time: 1.5h (2 uh)
- Scikit-Learn API: recall what we have seen up to now.
- preprocessing (scaler, PCA, function/column transformers)
- cross validation
- parameter tuning: grid search / random search (a pipeline + grid-search sketch follows below).
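A sketch combining these pieces; the preprocessing steps and parameter grids are illustrative assumptions, not the course solution:

```python
# Sketch: a scikit-learn pipeline (scaler -> PCA -> SVC) tuned with grid
# search over its hyperparameters, on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

pipeline = make_pipeline(StandardScaler(), PCA(), SVC())
param_grid = {
    "pca__n_components": [2, 5, 8],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": [0.01, 0.1, 1],
}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```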
Coding session
- build SVM and LinearRegression cross-validation pipelines for the previous examples
- use PCA in the pipeline for (+) to improve performance
- find optimal SVM parameters
- find the optimal number of PCA components
- extended: full process for best pipeline/model selection, incl. selection of preprocessing steps and hyperparameter tuning w/ cross-validation
Day 2
Total time: 6h (8 x uni hour (uh))
Part 6 a+b: classifiers overview (NNs & regression-based + tree-based & ensembles)
Intention: quick walk-through of reliable classifiers, give some background intuition where suitable, let participants play with some of them, incl. modifying parameters.
Summary: the decision graph (mind map) from scikit-learn, and come up with an easy-to-understand summary table.
Part 6a
Time: 1h (4/3 uh)
- Nearest neighbours
- Logistic regression
- Linear + kernel SVM classifier (SVC)
- demo for the Radial Basis Function (RBF) kernel trick: influence of different parameters on the decision line (see the sketch below)
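A sketch of the RBF demo, using make_moons as a stand-in dataset and a few illustrative gamma values:

```python
# Sketch: how gamma changes the decision line of an RBF-kernel SVC on data
# that is not linearly separable.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
xx, yy = np.meshgrid(np.linspace(-2, 3, 300), np.linspace(-1.5, 2, 300))

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, gamma in zip(axes, (0.1, 1, 100)):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k", s=15)
    ax.set_title(f"gamma={gamma}")  # small gamma: smooth; large gamma: overfit
plt.show()
```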
Part 6b
Time: 1h (4/3 uh)
- Decision trees
- Averaging: Random forests
- Boosting: AdaBoost; mention Gradient Tree Boosting (hist; XGBoost)
- mentions:
  - text classification: Naive Bayes
  - big data:
    - Stochastic Gradient Descent classifier
    - kernel approximation transformation (explicitly approximates the kernel trick)
    - opt: compare SVC incl. RBF vs. Random Kitchen Sinks (RBFSampler) + linear SVC (a sketch follows below; https://scikit-learn.org/stable/auto_examples/plot_kernel_approximation.html#sphx-glr-auto-examples-plot-kernel-approximation-py)
- summary/overview
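A sketch of the optional comparison, following the linked scikit-learn example but trimmed down (gamma and n_components are illustrative choices):

```python
# Sketch: exact RBF SVC vs. RBFSampler ("random kitchen sinks") followed by
# a linear SVC, on the digits dataset.
from sklearn.datasets import load_digits
from sklearn.kernel_approximation import RBFSampler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

exact = SVC(kernel="rbf", gamma=0.2).fit(X_train, y_train)
approx = make_pipeline(RBFSampler(gamma=0.2, n_components=500, random_state=0),
                       LinearSVC()).fit(X_train, y_train)

print("exact RBF SVC          :", exact.score(X_test, y_test))
print("RBFSampler + linear SVC:", approx.score(X_test, y_test))
```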
Topics to include
- interpretability of results (in terms of feature importance, e.g. SVM w/ a high-degree polynomial kernel)
- some rules of thumb: don't use kNN classifiers for 10 or more dimensions (why? paper link)
- show decision surfaces for different classifiers (extend the exercise in sec. 3 using hyperparameters)
Coding session
- apply SVM, random forests and boosting to specific examples
- MNIST example (a digits-based sketch follows below)
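A sketch of the comparison, using scikit-learn's small digits dataset as a stand-in for MNIST:

```python
# Sketch: compare SVM, random forest and boosting on the same example via
# cross-validation (AdaBoost with its default decision stumps is weak here,
# which itself makes a useful talking point).
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

for name, clf in [("SVC (RBF)", SVC()),
                  ("random forest", RandomForestClassifier(n_estimators=100, random_state=0)),
                  ("AdaBoost", AdaBoostClassifier(random_state=0))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean CV accuracy {scores.mean():.3f}")
```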
Part 7: Supervised learning: regression
Time: 1h (4/3 uh)
Intention: demonstrate one or two simple examples of regression
- regression: how would I rate this movie? example: use a weighted sum; also a linear regressor example: fit a quadratic function (a sketch follows below)
- learn a regressor for movie scores / salmon weight.
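A minimal regression sketch on synthetic data: a plain weighted sum (linear regression) vs. a quadratic fit via polynomial features:

```python
# Sketch: linear regression as a weighted sum of features, and fitting a
# quadratic relationship by adding polynomial features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - x.ravel() + 1 + rng.normal(scale=0.3, size=50)

# plain linear regression underfits the quadratic relationship
linear = LinearRegression().fit(x, y)
print("linear R^2   :", linear.score(x, y))

# linear regression on polynomial features fits it well
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)
print("quadratic R^2:", quadratic.score(x, y))
```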
Part 8: Supervised learning: neural networks
Time: 3h (4 uh)
Intention: Introduction to neural networks and deep learning with keras
- include a real-life tumor example (maybe in the day 3 walk-through)
- overview, history
- perceptron
- multi-layer networks
- multi-layer demo with the Google online tool
- where neural networks work well
- Keras demo (a minimal sketch follows below)
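A minimal Keras sketch for the demo, assuming TensorFlow/Keras is installed and using make_moons as a toy problem:

```python
# Sketch: a small fully-connected network for a binary classification toy
# problem with Keras.
from sklearn.datasets import make_moons
from tensorflow import keras

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2, verbose=0)

loss, acc = model.evaluate(X, y, verbose=0)
print("training accuracy:", acc)
```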
Coding Session
- reuse a Keras network and play with it.
Day 3
Total time: 6h (8 uh)
- Hands-on walk-through real life example.
- Assisted programming session where participants can start to work on their own machine learning application. Assist with setting up their own machines. Offer some example data sets from https://www.kaggle.com/datasets
Misc
Best practices
Rather, include/repeat these in the relevant workshop parts/examples:
- visualize features: pairwise scatter plots, UMAP / t-SNE
- PCA to simplify/understand data
- check the balance of the data set; what to do if it is unbalanced?
- start with a baseline classifier/regressor (see the sketch below)
- augment data to introduce variance
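A sketch of two of these practices (class-balance check and a baseline classifier), with iris as a stand-in dataset:

```python
# Sketch: check class balance, then compare a trivial baseline against a
# real model; any real model should clearly beat the baseline.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# class balance: how many samples per class?
print(dict(zip(*np.unique(y, return_counts=True))))

baseline = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=5)
model = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("baseline accuracy:", baseline.mean())
print("model accuracy   :", model.mean())
```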