Commit 00cf6fef authored by schmittu

Merge branch 'improvements_uwe' of sissource.ethz.ch:sis/courses/machinelearning-introduction-workshop into improvements_uwe
parents 5082bcd3 2ad2e499
# Targeted audience

- Researchers from DBIOL and DGESS having no machine learning experience yet.
- Basic Python knowledge.
- Almost no math knowledge.

# Concepts

- 3 days workshop: 2 days lectures with exercises + 0.5 day real life example walk
  through + 0.5 day working on own data / prepared data.
- smooth learning curve
- explain fundamental concepts first, discuss exceptions, corner cases, pitfalls later.
- plotting / pandas / numpy first. Else participants might fight with these basics
  during coding sessions and will be distracted from the actual learning goal of an
  exercise.
- jupyter notebooks / conda, extra notebooks with solutions.
- use prepared computers in computer room, setting up personal computer during last day
  if required.
- exercises: empty holes to fill
# Course structure

## Home prep

Introductions to NumPy, Pandas and Matplotlib (plus Python, if needed).

Prep materials to send out:

* Python, ca. 6h: https://siscourses.ethz.ch/python_one_day/script.html
* NumPy, ca. 3h: https://siscourses.ethz.ch/python-scientific/01_numpy.html
  * WARN: a bit too advanced
  * alt, ext: http://scipy-lectures.org/intro/numpy/index.html
* Pandas, ca. 1.5h: https://siscourses.ethz.ch/python-scientific/02_pandas.html
  * alt, ext: http://www.scipy-lectures.org/packages/statistics/index.html#data-representation-and-interaction
  * cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
* Matplotlib + Seaborn
  * ext:
    * http://scipy-lectures.org/intro/matplotlib/index.html
    * http://scipy-lectures.org/packages/statistics/index.html#more-visualization-seaborn-for-statistical-exploration
  * cheat sheets:
    * https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf
    * https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Seaborn_Cheat_Sheet.pdf
## Day 1

Intro and superficial overview of classifiers including quality assessment, pipelines
and hyperparameter optimization.

Total time: 6h (8 x uni hour (uh))

### Part 0: Preparation
Time: 15 min (1/3 uh)
- organizational announcements
- installation/machines preparation
### Part 1: General introduction
Time: 75 min (5/3 uh)

- What is machine learning?
  - learning from examples
  - working with hard to understand data
  - automation
- What are features / samples / feature matrix? (see the sketch after this list)
  - always numerical / categorical vectors
  - examples: beer, movies, images, text to numerical examples
- supervised:
  - classification: do I like this beer?
    example: draw decision tree or surface
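
A minimal sketch of what a feature matrix could look like in code, using a made-up
beer table (the feature names and values are placeholders, not the course data set):

```python
import pandas as pd

# Hypothetical beer data: each row is a sample, each column a numerical feature.
beers = pd.DataFrame({
    "alcohol_content": [4.8, 5.2, 7.0, 4.5],   # percent
    "bitterness":      [25, 40, 60, 18],        # IBU
    "darkness":        [10, 15, 40, 8],         # EBC
    "is_liked":        [1, 1, 0, 1],            # label: 1 = like, 0 = dislike
})

X = beers.drop(columns="is_liked").values  # feature matrix, shape (n_samples, n_features)
y = beers["is_liked"].values               # label vector
print(X.shape, y.shape)
```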

### Part 2: Supervised learning: concepts of classification

Time: 60 min (4/3 uh)

Intention: demonstrate one / two simple examples of classifiers, also introduce the
concept of decision boundary

- idea of simple linear classifier: take features, produce real value ("beer score"),
  use threshold to decide
  - simple linear classifier (e.g. linear SVM)
  - beer example with some weights
- show code example with logistic regression for beer data, show weights, plot decision
  surface (see the sketch below)

#### Coding session:

- change given code to use a linear SVM classifier
- use different data set which can not be classified well with a linear classifier
- tell to transform data and run again
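
A possible shape for the logistic regression demo, sketched on invented two-feature
beer data so the decision surface can be plotted; swapping in `LinearSVC` is then the
first step of the coding session. Data, cluster centres and feature names are made up
for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC  # exercise: swap this in for LogisticRegression

# Made-up beer data: two features so the decision surface is easy to draw.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal([25, 4.5], 5, size=(30, 2)),   # beers I like
               rng.normal([55, 7.0], 5, size=(30, 2))])  # beers I don't like
y = np.array([1] * 30 + [0] * 30)

clf = LogisticRegression().fit(X, y)
print("weights:", clf.coef_, "intercept:", clf.intercept_)

# Evaluate the classifier on a grid and draw the decision surface.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 5, X[:, 0].max() + 5, 200),
                     np.linspace(X[:, 1].min() - 2, X[:, 1].max() + 2, 200))
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.xlabel("bitterness")
plt.ylabel("alcohol content")
plt.show()
```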

### Part 3: Overfitting and cross-validation

Time: 60 min (4/3 uh)

Needs: simple accuracy measure.
Classifiers (regressors) have parameters / degrees of freedom.

- underfitting:
- polynomial of degree 5 to fit points on a line + noise
- points in a circle: draw very exact boundary line
- how to check underfitting / overfitting?
  - measure accuracy or other metric on test dataset
  - cross validation

#### Coding session:

- How to do cross validation with scikit-learn (see the sketch below)
- use different beer feature set with redundant feature (+)
- run cross-validation on classifier
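
A minimal sketch of the scikit-learn cross-validation step; `make_classification`
stands in here for the beer feature set with a redundant feature:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Stand-in data; in the exercise this is the beer feature set with a redundant feature.
X, y = make_classification(n_samples=200, n_features=5, n_redundant=1, random_state=0)

clf = LinearSVC(max_iter=10000)
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross validation, accuracy by default
print("fold accuracies:", scores)
print("mean +/- std: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```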

### Part 4: accuracy, F1, ROC, ...

Time: 60 min (4/3 uh)

Intention: pitfalls of simple accuracy

- how to measure accuracy?
- classifier accuracy:
  - confusion matrix metrics
  - pitfalls for unbalanced data sets, e.g. diagnose HIV
  - precision / recall
  - mention ROC?
- exercise (pen and paper): determine precision / recall

#### Coding session

- do cross val with multiple metrics:
  evaluate linear beer classifier from latest section (see the sketch below)
- fool them: give them other dataset where classifier fails.
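
One way the multi-metric evaluation could be sketched, using `cross_validate` on an
artificially unbalanced data set (a stand-in for the "fool them" data) so that plain
accuracy looks deceptively good:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Unbalanced stand-in data: 95% of samples belong to one class.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

scores = cross_validate(LogisticRegression(), X, y, cv=5,
                        scoring=["accuracy", "precision", "recall", "f1"])
for name in ["test_accuracy", "test_precision", "test_recall", "test_f1"]:
    print(name, round(scores[name].mean(), 3))
```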

### Part 5: Pipelines and hyperparameter tuning w/ extended exercise

Time: 1.5h (2 uh)
- Scikit-learn API: recall what we have seen up to now.
- preprocessing (scaler, PCA, function/column transformers)
- cross validation
- parameter tuning: grid search / random search.
#### Coding session
- build SVM and LinearRegression crossval pipelines for previous examples
- use PCA in pipeline for (+) to improve performance
- find optimal SVM parameters
- find optimal PCA components number
- **extended**: full process for best pipeline/model selection incl. preprocessing steps
  selection, hyperparameter tuning w/ cross-validation (see the pipeline sketch below)
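
A compact sketch of the pipeline / grid-search workflow this session aims at; synthetic
data as a stand-in, parameter ranges illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, n_redundant=2, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("svc", SVC()),
])

# Grid over PCA components and SVC hyperparameters; cross-validation happens inside GridSearchCV.
grid = GridSearchCV(pipe,
                    param_grid={"pca__n_components": [2, 4, 6],
                                "svc__C": [0.1, 1, 10],
                                "svc__gamma": ["scale", 0.1, 1]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```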

## Day 2

Total time: 6h (8 x uni hour (uh))
### Part 6 a+b: classifiers overview (NNs & regression-based + tree-based & ensembles)
Intention: quick walk through reliable classifiers, give some background idea if
suitable, let them play with some, incl. modification of parameters.

Summary: decision graph (mind-map) from scikit-learn, and come up with easy to
understand summary table.
#### Part 6a
Time: 1h (4/3 uh)
- Nearest neighbours
- Logistic regression
- Linear + kernel SVM classifier (SVC)
  - demo for Radial Basis Function (RBF) kernel trick: influence of different parameters
    on the decision line (see the sketch below)
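
The RBF demo could look roughly like this, with `make_moons` as stand-in data and a few
`gamma` values to show how the decision line changes:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)
xx, yy = np.meshgrid(np.linspace(-2, 3, 300), np.linspace(-1.5, 2, 300))

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, gamma in zip(axes, [0.1, 1, 10]):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, s=15)
    ax.set_title("gamma = %s" % gamma)  # larger gamma -> wigglier decision line
plt.show()
```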

#### Part 6b

Time: 1h (4/3 uh)
- Decision trees
- Averaging: Random forests
- Boosting: AdaBoost and mention Gradient Tree Boosting (hist; xgboost)
- mentions
  - text classification: Naive Bayes for text classification
  - big data:
    - Stochastic Gradient Descent classifier
    - kernel approximation transformation (explicitly approx. kernel trick)
    - opt, compare SVC incl. RBF vs. Random Kitchen Sinks (RBFSampler) + linear SVC
      (https://scikit-learn.org/stable/auto_examples/plot_kernel_approximation.html#sphx-glr-auto-examples-plot-kernel-approximation-py);
      see the sketch below
- summary/overview
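
A sketch of the optional SVC vs. Random Kitchen Sinks comparison, following the idea of
the linked scikit-learn example but on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

exact = SVC(kernel="rbf", gamma=0.1).fit(X_train, y_train)

# Random Kitchen Sinks: explicit random feature map approximating the RBF kernel,
# followed by a linear SVM, which scales much better to large sample counts.
approx = make_pipeline(RBFSampler(gamma=0.1, n_components=300, random_state=0),
                       LinearSVC(max_iter=10000)).fit(X_train, y_train)

print("exact RBF SVC          :", exact.score(X_test, y_test))
print("RBFSampler + linear SVC:", approx.score(X_test, y_test))
```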

#### Topics to include
- interpretability of results (in terms of feature importance, e.g. SVM w/ high deg.
  poly. kernel)
- some rules of thumb: don't use kNN classifiers for 10 or more dimensions (why? paper
  link)
- show decision surfaces for diff classifiers (extend exercise in sec 3 using
  hyperparams)

#### Coding session

- apply SVM, Random Forests, boosting to specific examples
- MNIST example (see the sketch below, using the small digits set as a stand-in)
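
A possible starting point for this session, using scikit-learn's small `digits` data
set as a quick stand-in for MNIST:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# MNIST itself is large; the bundled digits data keeps the demo fast.
X, y = load_digits(return_X_y=True)

for clf in [SVC(), RandomForestClassifier(n_estimators=100), AdaBoostClassifier()]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(type(clf).__name__, "mean accuracy: %.3f" % scores.mean())
```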

### Part 7: Supervised learning: regression
Time: 1h (4/3 uh)
Intention: demonstrate one / two simple examples of regression
- regression: how would I rate this movie?
  example: use weighted sum, also example for linear regressor
  example: fit a quadratic function
- learn regressor for movie scores / salmon weight (see the sketch below).
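
A minimal regression sketch along these lines, on made-up quadratic data instead of
movie scores or salmon weights:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up data following a quadratic trend plus noise.
rng = np.random.RandomState(0)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - x.ravel() + rng.normal(0, 0.5, size=50)

linear = LinearRegression().fit(x, y)                    # plain weighted sum of features
quadratic = make_pipeline(PolynomialFeatures(degree=2),  # fit a quadratic function
                          LinearRegression()).fit(x, y)
print("linear R^2:   ", linear.score(x, y))
print("quadratic R^2:", quadratic.score(x, y))
```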

### Part 8: Supervised learning: neural networks
Time: 3h (4 uh)

Intention: Introduction to neural networks and deep learning with `keras`

- include real-life tumor example (maybe in day 3 walk-through)
- overview, history
- perceptron
- where neural networks work well
- keras demo

#### Coding Session

- keras: reuse network and play with it (see the sketch below).
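
A tiny `keras` network that could serve as the "reuse and play" starting point,
assuming TensorFlow is installed and again using the small digits data as a stand-in:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tensorflow import keras

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X / 16.0, y, random_state=0)

# Small fully connected network: 64 pixel features -> 10 digit classes.
model = keras.Sequential([
    keras.Input(shape=(64,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
print("test accuracy:", model.evaluate(X_test, y_test, verbose=0)[1])
```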
## Day 3
Total time: 6h (8 uh)
1. Hands-on walk-through of a real-life example.
2. Assisted programming session where participants can start to work on their own
   machine learning application. Assist to set up own machines. Offer some example
   data sets from https://www.kaggle.com/datasets
## Misc
### Best practices
Rather include/repeat these in the relevant workshop parts/examples:
- visualize features: pairwise scatter, UMAP/tSNE
- PCA to simplify/understand data
- check balance of data set, what if not?
- start with baseline classifier/regressor (see the sketch below)
- augment data to introduce variance
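
A short sketch of the "check balance, start with a baseline" practice, on synthetic
stand-in data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Unbalanced stand-in data; replace with the data set at hand.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Check class balance first.
print(dict(zip(*np.unique(y, return_counts=True))))

# Baseline: always predict the most frequent class -- any real model must beat this.
baseline = DummyClassifier(strategy="most_frequent")
print("baseline accuracy:", cross_val_score(baseline, X, y, cv=5).mean())
```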