Commit 3c164b9f authored by schmittu

first version of course layout

parent 65cf35eb
layout.md 0 → 100644
# Target audience
- Researchers from DBIOL, BSSE and DGESS with no machine learning experience yet.
- Basic Python knowledge.
- Almost no math knowledge.
# Concepts
- smooth learning curve
- explain fundamental concepts first, discuss exceptions, corner cases, pitfalls later.
- plotting / pandas / numpy first. Otherwise participants might be distracted during
coding exercises and miss the actual learning goal of an exercise.
# Course structure
## Preparation
- setup machines
- quick basics matplotlib, numpy, pandas
## Part 1: Introduction
- Why machine learning ?
- What are features ?
- always numerical vectors
- examples: beer, movies, images, text
- unsupervised:
- find structure in set of features
- beers: find groups of beer types
### Coding session:
- read dataframe from csv or excel sheet with beer features
- do some features vs features scatter plots
- use tsne to show clusters
- scikit-learn example to find clusters
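
The clustering step of this coding session could look roughly like the sketch below. The two beer features, their values, and the two beer types are made up for illustration (the course's real beer file is not part of this outline), and `KMeans` is just one of several scikit-learn clustering algorithms that could be used here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical beer features: [alcohol content (%), bitterness (IBU)].
# Synthetic stand-in for the data frame read from csv / excel.
rng = np.random.default_rng(0)
lagers = rng.normal(loc=[4.8, 20.0], scale=[0.3, 3.0], size=(50, 2))
ipas = rng.normal(loc=[6.5, 60.0], scale=[0.3, 3.0], size=(50, 2))
features = np.vstack([lagers, ipas])

# Unsupervised: no labels are given, only the number of groups to look for.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(features)

print("samples per cluster:", np.bincount(cluster_ids))
print("cluster centers:\n", kmeans.cluster_centers_)
```

The cluster ids themselves are arbitrary (0 or 1); only the grouping carries meaning.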
## Part 2: supervised learning
- supervised:
- classification: do I like this beer ?
example: draw decision tree
- classification: points on both sides of a line, points in circle, xor problem
- idea of decision function: take features, produce real value, use threshold to decide
- simple linear classifier
- show some examples for feature engineering here to apply linear classifier
- regression: how would I rate this movie ?
example: use weighted sum, also example for linear regressor
example: fit a quadratic function
### Coding session:
- show: read circle data, plot data, augment features, learn linear classifier with scikit-learn,
show weights and explain classifier, plot decision boundary,
load eval data set and evaluate accuracy.
- adapt: read xor data, plot data, augment features, learn linear classifier with scikit-learn,
show weights and explain classifier, plot decision boundary,
load eval data set and evaluate accuracy.
- learn regressor for movie scores.
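
A minimal sketch of the "show" exercise, under some assumptions: the circle data is generated synthetically instead of being read from a file, the augmented features are the squared coordinates, and `LogisticRegression` stands in for "a linear classifier" (the outline does not say which one is meant). Plotting is left out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the circle data: label 1 inside the unit circle.
rng = np.random.default_rng(42)
points = rng.uniform(-2.0, 2.0, size=(400, 2))
labels = (points[:, 0] ** 2 + points[:, 1] ** 2 < 1.0).astype(int)

# First 300 points for training, the rest plays the role of the eval data set.
train_x, eval_x = points[:300], points[300:]
train_y, eval_y = labels[:300], labels[300:]


def augment(xy):
    """Append the squared coordinates x^2 and y^2 to the raw features."""
    return np.column_stack([xy, xy ** 2])


raw_model = LogisticRegression(C=100.0, max_iter=1000).fit(train_x, train_y)
aug_model = LogisticRegression(C=100.0, max_iter=1000).fit(augment(train_x), train_y)

acc_raw = raw_model.score(eval_x, eval_y)
acc_aug = aug_model.score(augment(eval_x), eval_y)

# Decision function: one real value per sample, thresholded at 0 to decide.
scores = aug_model.decision_function(augment(eval_x))
predictions = (scores > 0).astype(int)

print("weights:", aug_model.coef_[0], "bias:", aug_model.intercept_[0])
print("accuracy raw features:", acc_raw, "augmented features:", acc_aug)
```

The learned weights on the squared features are what make a circular boundary possible: in the augmented feature space the boundary is still linear.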
## Part 3: accuracy, F1, ROC, ...
- how to measure accuracy ?
- regression accuracy
- classifier accuracy:
- confusion matrix
- pitfalls for unbalanced data sets
e.g. diagnose HIV
- precision / recall
- ROC ?
### Coding session
- evaluate accuracy of linear beer classifier
- determine precision / recall
- ROC curve based on threshold
- provide predetermined weights, show ROC curve.
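
The following sketch illustrates the unbalanced-data pitfall and the metrics listed above. The labels and scores are invented for illustration; they are not the beer classifier's output.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)

# Unbalanced toy labels: 95 negatives, 5 positives (think of a rare diagnosis).
y_true = np.array([0] * 95 + [1] * 5)

# A useless classifier that always answers "negative".
y_always_negative = np.zeros_like(y_true)

cm = confusion_matrix(y_true, y_always_negative)
accuracy = accuracy_score(y_true, y_always_negative)
# zero_division=0: precision is undefined when nothing is predicted positive.
precision = precision_score(y_true, y_always_negative, zero_division=0)
recall = recall_score(y_true, y_always_negative)

print("confusion matrix:\n", cm)
print("accuracy:", accuracy, "precision:", precision, "recall:", recall)

# ROC curve: sweep the threshold over real-valued scores (made-up values here).
scores = np.concatenate([np.linspace(0.0, 0.7, 95), [0.4, 0.6, 0.8, 0.9, 0.95]])
fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)
print("area under ROC curve:", auc)
```

The "always negative" classifier reaches 95% accuracy while finding none of the positive cases, which is why precision and recall are needed in addition to accuracy.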
## Part 4: underfitting/overfitting
classifiers / regressors have parameters / degrees of freedom.
- underfitting:
- linear classifier for points on a quadratic function
- overfitting:
- features have actual noise, or not enough information
not enough information: orchid example in 2d. elevate to 3d using another feature.
- polynomial of degree 5 to fit points on a line + noise
- points in a circle: draw very exact boundary line
- how to check underfitting / overfitting ?
- measure accuracy
- test data set
- cross validation
### Coding session:
- How to do cross validation with scikit-learn
- use different beer feature set with redundant feature (+)
- run cross validation on classifier
- run cross validation on movie regression problem
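
As a sketch of how cross validation might be run with scikit-learn, the example below uses the "points on a line + noise" setting from above rather than the beer or movie data, which are not included in this outline. The data, the noise level and the number of folds are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Points on a line plus noise (synthetic).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0 + rng.normal(scale=0.3, size=20)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for degree in (1, 5):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # Score on the training data itself: can only improve with more parameters.
    train_r2 = model.fit(x, y).score(x, y)
    # Score on held-out folds: this is what reveals overfitting.
    cv_r2 = cross_val_score(model, x, y, cv=cv, scoring="r2").mean()
    results[degree] = (train_r2, cv_r2)
    print(f"degree {degree}: train R^2 = {train_r2:.3f}, cross-val R^2 = {cv_r2:.3f}")
```

Comparing the training score with the cross-validated score for each degree is the check for overfitting: a model that is much better on its own training data than on held-out folds has likely fitted the noise.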
## Part 5: Overview scikit-learn / algorithms
- Linear regressors
- Nearest neighbours
- SVMs
- demo for RBF: influence of different parameters on the decision line
- Random forests
- Gradient Tree Boosting
- Clustering
### Coding session
- apply SVM, Random Forests, Gradient boosting to previous examples
- apply clustering to previous examples
- MNIST example
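
Because all scikit-learn classifiers share the same `fit` / `predict` interface, trying several algorithms on one problem takes only a loop. The sketch below assumes synthetic circle data from `make_circles` as a stand-in for the previous examples; the hyperparameters are defaults, not tuned values.

```python
from sklearn.datasets import make_circles
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic "points in a circle" data: inner circle vs. outer ring.
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)

models = {
    "SVM (RBF kernel)": SVC(kernel="rbf", gamma="scale"),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
}

mean_accuracy = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy for each algorithm.
    mean_accuracy[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {mean_accuracy[name]:.3f}")
```

Unlike the linear classifier from Part 2, none of these models needs hand-crafted squared features for this data set.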
## Part 6: pipelines / cross val / parameter optimization with scikit-learn
- scikit-learn API
- pipelines, preprocessing (scaler, PCA)
- cross validation
- parameter optimization
### Coding session
- build SVM and Random forest crossval pipelines for previous examples
- use PCA in pipeline for (+) to improve performance
- find optimal SVM parameters
- find optimal number of PCA components
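
A possible shape for this session, sketched under assumptions: the wine data set bundled with scikit-learn replaces the course data, and the parameter grid values are arbitrary starting points. `GridSearchCV` cross-validates every parameter combination of the whole pipeline, so the scaler and PCA are refitted on each training fold.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data set shipped with scikit-learn (13 numerical features, 3 classes).
X, y = load_wine(return_X_y=True)

pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("pca", PCA()),
        ("svc", SVC(kernel="rbf")),
    ]
)

# Parameter names follow the "<step name>__<parameter>" convention.
param_grid = {
    "pca__n_components": [2, 5, 8],
    "svc__C": [0.1, 1.0, 10.0],
    "svc__gamma": ["scale", 0.01],
}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```

After fitting, `search` behaves like the best pipeline itself, so `search.predict(new_samples)` applies scaling, PCA and the SVM in one call.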
## Part 7: Best practices
- visualize features: pairwise scatter, tSNE
- PCA to understand data
- check balance of data set; what to do if it is unbalanced ?
- start with baseline classifier / regressor
- augment data to introduce variance
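
Two of these practices, checking class balance and starting with a baseline, can be sketched in a few lines. The labels are invented, and `DummyClassifier` is one way to get a baseline in scikit-learn, not necessarily the one intended here.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Made-up labels: 90 samples of class 0, 10 samples of class 1.
y = np.array([0] * 90 + [1] * 10)
# The baseline ignores the features, so placeholder zeros are enough here.
X = np.zeros((len(y), 1))

class_counts = np.bincount(y)
print("samples per class:", class_counts)

# Baseline: always predict the most frequent class seen during fit.
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
baseline_accuracy = baseline.score(X, y)
print("baseline accuracy:", baseline_accuracy)
```

Any real classifier should be compared against this number: on this data, 90% accuracy means nothing has been learned yet.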
## Part 8: neural networks
- overview, history
- perceptron
- multi layer
- multi layer demo with google online tool
- where neural networks work well
- keras demo
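
To make the perceptron item concrete, here is a minimal NumPy sketch of the classic perceptron learning rule, trained on the logical AND function. The data set, learning rate of 1 and epoch limit are choices made for this illustration; it is not the keras demo.

```python
import numpy as np

# Logical AND: linearly separable, so a single perceptron can learn it.
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([0, 0, 0, 1])

weights = np.zeros(2)
bias = 0.0


def predict(x):
    """Weighted sum of the inputs plus bias, thresholded at 0."""
    return int(np.dot(weights, x) + bias > 0)


for epoch in range(100):
    errors = 0
    for x, target in zip(inputs, targets):
        error = target - predict(x)
        if error != 0:
            # Perceptron rule: move the weights towards the misclassified sample.
            weights += error * x
            bias += error
            errors += 1
    if errors == 0:
        # A full pass without mistakes: training has converged.
        break

predictions = [predict(x) for x in inputs]
print("weights:", weights, "bias:", bias, "predictions:", predictions)
```

The same loop never converges on the xor problem from Part 2, because no single line separates its classes; this is the motivation for multi layer networks.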
### Coding Session
- keras: reuse a network and play with it.