- Targeted audience
- Concepts
- Course structure
- Part 0: Preparation (UWE)
- Part 1: Introduction (UWE)
- Part 2a: supervised learning: classification
- Coding session:
- Part 2b: supervised learning: regression (TBD: skip this ?)
- Part 3: underfitting/overfitting
- Coding session:
- Part 4: accuracy, F1, ROC, ...
- Coding session
- Day 2
- Part 5: pipelines / parameter tuning with scikit-learn
- Coding session
- Part 6: classifiers overview
- Coding session
- Part 7: Start with neural networks (0.5 day)
- Part 8: Best practices
- Part 9: neural networks
- Coding Session
Targeted audience
- Researchers from DBIOL, BSSE and DGESS with no machine learning experience yet.
- Basic Python knowledge assumed.
- Almost no math knowledge assumed.
Concepts
- two-day workshop: 1.5 days of teaching + 0.5 day working on own data / prepared data.
- smooth learning curve
- explain fundamental concepts first; discuss exceptions, corner cases, and pitfalls later.
- plotting / pandas? / numpy first. Else participants might fight with these basics during coding sessions and be distracted from the actual learning goal of an exercise.
- jupyter notebooks / conda, extra notebooks with solutions.
- use prepared computers in the computer room, set up personal computers during the last day if required.
- exercises: code skeletons with holes to fill in
TBD:
Course structure
Part 0: Preparation (UWE)
- quick basics: matplotlib, numpy, pandas?
TBD: installation instructions preparation.
TBD: prepare coding session
Part 1: Introduction (UWE)
- What is machine learning?
  - learning from examples
  - working with hard-to-understand data
  - automation
- What are features / samples / a feature matrix?
  - always numerical / categorical vectors
  - examples: translating beer, movies, images, text into numerical features
- Learning problems:
  - unsupervised:
    - find structure in a set of features
    - beers: find groups of beer types
  - supervised:
    - classification: do I like this beer? example: draw a decision tree
Part 2a: supervised learning: classification
Intention: demonstrate one or two simple examples of classifiers, and introduce the concept of a decision boundary.
- idea of a simple linear classifier: take the features, produce a real value ("Uwe's beer score"), use a threshold to decide -> simple linear classifier (e.g. a linear SVM) -> beer example with some weights
- show a code example with logistic regression on the beer data, show the weights, plot the decision function
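A minimal sketch of this demo, assuming a synthetic stand-in for the beer data (the features and labels below are made up, not the real course dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# two made-up features, e.g. "bitterness" and "darkness"
n = 100
X = rng.normal(size=(n, 2))
# pretend "I like this beer" when a weighted sum of the features is positive
y = (1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.3, size=n) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print("learned weights:", clf.coef_)
print("intercept:", clf.intercept_)
print("training accuracy:", clf.score(X, y))
```

For the plot, one would evaluate `clf.decision_function` on a grid and draw its zero contour as the decision boundary.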
Coding session:
- change the given code to use a linear SVM classifier
- use a different data set (TBD) which cannot be classified well with a linear classifier
- ask participants to transform the data and run again (TBD: how exactly?)
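One possible shape for the "transform and rerun" step, as a sketch; the actual dataset is still TBD, so circular toy data stands in:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
# label: inside / outside a circle -- not linearly separable in 2d
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)

linear = LinearSVC(max_iter=10000).fit(X, y)
print("accuracy, raw features:", linear.score(X, y))

# adding squared features makes the circular boundary a hyperplane
X_ext = np.hstack([X, X ** 2])
extended = LinearSVC(max_iter=10000).fit(X_ext, y)
print("accuracy, squared features added:", extended.score(X_ext, y))
```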
Part 2b: supervised learning: regression (TBD: skip this ?)
Intention: demonstrate one / two simple examples of regression
- regression: how would I rate this movie? example: use a weighted sum; also an example for a linear regressor: fit a quadratic function
- learn a regressor for movie scores
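A sketch of the "fit a quadratic function with a linear regressor" example, on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50).reshape(-1, 1)
y = 1.0 + 2.0 * x[:, 0] ** 2 + rng.normal(scale=0.1, size=50)

# linear regression on [1, x, x^2] fits the quadratic perfectly
X_poly = PolynomialFeatures(degree=2).fit_transform(x)
reg = LinearRegression().fit(X_poly, y)
print("coefficients:", reg.coef_)   # the x^2 term dominates
print("R^2:", reg.score(X_poly, y))
```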
Part 3: underfitting/overfitting
needs: simple accuracy measure.
classifiers / regressors have parameters / degrees of freedom.
- underfitting:
  - linear classifier for points on a quadratic function
- overfitting:
  - features have actual noise, or not enough information: orchid example in 2d, elevate to 3d using another feature
  - polynomial of degree 5 fit to points on a line + noise
  - points in a circle: draw a very exact boundary line
- how to check for underfitting / overfitting?
  - measure accuracy or another metric on a test dataset
  - cross validation
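The degree-5 overfitting demo could look like this sketch (synthetic points on a line plus noise; the flexible model matches the training data at least as well, while typically generalizing worse):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10).reshape(-1, 1)
y_train = 2 * x_train[:, 0] + rng.normal(scale=0.2, size=10)
x_test = np.linspace(0, 1, 50).reshape(-1, 1)
y_test = 2 * x_test[:, 0] + rng.normal(scale=0.2, size=50)

scores = {}
for degree in (1, 5):
    poly = PolynomialFeatures(degree)
    reg = LinearRegression().fit(poly.fit_transform(x_train), y_train)
    scores[degree] = (reg.score(poly.transform(x_train), y_train),
                      reg.score(poly.transform(x_test), y_test))
    print(f"degree {degree}: train R^2 {scores[degree][0]:.2f}, "
          f"test R^2 {scores[degree][1]:.2f}")
```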
Coding session:
- how to do cross validation with scikit-learn
- use a different beer feature set with a redundant feature (+)
- run cross validation on the classifier
- ? run cross validation on the movie regression problem
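A sketch of the scikit-learn cross-validation step, on a synthetic stand-in for the beer features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))            # stand-in for the beer feature matrix
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# 5-fold cross validation: train on 4/5 of the data, score on the held-out fold
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```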
Part 4: accuracy, F1, ROC, ...
Intention: accuracy is useful but has pitfalls.
- how to measure accuracy?
  - (TBD: skip?) regression accuracy
  - classifier accuracy:
    - confusion matrix
    - accuracy
    - pitfalls for unbalanced data sets, e.g. diagnosing HIV
    - precision / recall
    - ROC?
- exercise: do cross validation with other metrics
Coding session
- evaluate the accuracy of the linear beer classifier from the previous section
- determine precision / recall
- fool them: give them another dataset where the classifier fails
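The unbalanced-data pitfall can be shown in a few lines; the labels here are made up, and a majority-class predictor stands in for the failing classifier:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# imagined ground truth: 95 negatives, 5 positive cases
y_true = np.array([0] * 95 + [1] * 5)
# a useless classifier that always predicts the majority class
y_pred = np.zeros(100, dtype=int)

print("accuracy:", accuracy_score(y_true, y_pred))                  # looks high
print("precision:", precision_score(y_true, y_pred, zero_division=0))
print("recall:", recall_score(y_true, y_pred))                      # misses every positive
print(confusion_matrix(y_true, y_pred))
```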
Day 2
Part 5: pipelines / parameter tuning with scikit-learn
- scikit-learn API: recall what we have seen so far
- pipelines, preprocessing (scaler, PCA)
- cross validation
- parameter tuning: grid search / random search
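A sketch of the pipeline + grid-search demo; the data is synthetic and the step names and parameter grid are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# scale -> reduce dimensionality -> classify, as one estimator
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("svm", SVC()),
])
# grid search tunes parameters of any pipeline step via "step__param"
grid = GridSearchCV(
    pipe,
    {"pca__n_components": [2, 3, 4], "svm__C": [0.1, 1, 10]},
    cv=5,
)
grid.fit(X, y)
print("best parameters:", grid.best_params_)
print("best CV accuracy:", grid.best_score_)
```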
Coding session
- build SVM and random forest cross-validation pipelines for the previous examples
- use PCA in the pipeline for (+) to improve performance
- find optimal SVM parameters
- find the optimal number of PCA components
Planning: stop here, make time estimates.
Part 6: classifiers overview
Intention: quick walk through reliable classifiers, give some background on when each is suitable, let them play with some, incl. modification of parameters.
to consider: decision graph from sklearn, come up with an easy-to-understand diagram.
- nearest neighbours
- SVMs
  - demo for the RBF kernel: influence of different parameters on the decision line
- Random forests
- Gradient Tree Boosting
show decision surfaces of these classifiers on 2d examples.
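A sketch of the RBF-parameter demo on scikit-learn's two-moons toy data (a stand-in for the previous examples); in the course one would draw the decision surface over a 2d grid, here we only report training accuracy:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# noisy two-moons data: not linearly separable
X, y = make_moons(noise=0.2, random_state=0)

scores = {}
for gamma in (0.1, 1.0, 10.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    scores[gamma] = clf.score(X, y)
    # larger gamma -> more flexible (and eventually overfitted) boundary
    print(f"gamma={gamma}: training accuracy {scores[gamma]:.2f}")
```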
Coding session
- apply SVM, Random Forests, Gradient boosting to previous examples
- apply clustering to previous examples
- MNIST example
Part 7: Start with neural networks (0.5 day)
Part 8: Best practices
- visualize features: pairwise scatter plots, t-SNE
- PCA to understand the data
- check the balance of the data set; what to do if it is unbalanced?
- start with a baseline classifier / regressor
- augment data to introduce variance
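The "start with a baseline" practice maps to scikit-learn's `DummyClassifier`; a sketch on made-up imbalanced labels:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))        # the features are irrelevant to a dummy
y = np.array([0] * 80 + [1] * 20)    # made-up imbalanced labels

# the baseline any real model has to beat
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
print("baseline accuracy:", baseline.score(X, y))  # 0.8 from class imbalance alone
```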
Part 9: neural networks
- overview, history
- perceptron
- multi-layer networks
- multi-layer demo with Google's online tool
- where neural networks work well
- keras demo
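The perceptron part could be demoed from scratch in a few lines of numpy (toy AND-gate data, classic update rule):

```python
import numpy as np

# toy data: logical AND, which a single perceptron can learn
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)
b = 0.0
for _ in range(10):                     # a few passes over the data suffice
    for xi, yi in zip(X, y):
        pred = int(w @ xi + b > 0)
        # classic perceptron rule: move the weights only on mistakes
        w += (yi - pred) * xi
        b += yi - pred

print("weights:", w, "bias:", b)
print("predictions:", [int(w @ xi + b > 0) for xi in X])  # [0, 0, 0, 1]
```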
Coding Session
- Keras: reuse a given network and play with it