# Three-day Introduction to Machine Learning using Python
**Prepared and conducted by: Scientific IT Services, ETH Zurich**
## Before the workshop
We send around a short script introducing NumPy, Pandas and Matplotlib to prepare participants for the workshop.
## Day 1
### Part 0: Preparation
- Organizational announcements
- Installation/machines preparation
### Part 1: General introduction
- What is machine learning?
  - Learning from examples
  - Working with data that is hard to understand
- What are features / samples / a feature matrix?
  - Always numerical / categorical vectors
  - Examples: turning images and text into numerical features
- Taxonomy of machine learning:
  - Unsupervised: find structure in a set of features
  - Supervised: classification example
### Part 2: Supervised learning: concepts of classification
Intention: demonstrate one or two simple classifiers and introduce the concept of a decision boundary
- Idea of a simple linear classifier: take features, handcraft a score (a weighted sum of the features) and use a threshold to decide
- Simple linear classifier (e.g. linear SVM)
- Non-linear decision surface examples
- Briefly touch upon feature engineering
- Show a code example with logistic regression on a dataset, show the weights and plot the decision surface (a minimal sketch follows below)
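A minimal sketch of what such a demo could look like, using a toy scikit-learn dataset in place of the workshop's actual data:

```python
# Fit logistic regression on a toy 2D dataset, inspect the learned weights
# and plot the decision surface. Dataset and figure details are illustrative.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
clf = LogisticRegression().fit(X, y)
print("weights:", clf.coef_, "bias:", clf.intercept_)

# Evaluate the classifier on a grid of points to visualize the decision surface.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.show()
```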
#### Coding session
- Change the given code to use a linear SVM classifier
- Compare different classifiers and datasets
### Part 3: Overfitting, underfitting and cross-validation
Classifiers (and regressors) have parameters / degrees of freedom; choosing them poorly leads to underfitting or overfitting.
- Underfitting
  - Example: a linear classifier for points on a quadratic function
- Overfitting
- How to check for underfitting / overfitting?
  - Measure accuracy or another metric on a test dataset
  - Cross-validation
#### Coding session
- How to do cross-validation with scikit-learn (see the sketch after this list)
- Use a different feature set with redundant features
- Run cross-validation on a classifier
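A minimal sketch of cross-validation with scikit-learn; the iris dataset is only a stand-in for the workshop data:

```python
# Run 5-fold cross-validation on a classifier and report per-fold accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
scores = cross_val_score(SVC(), X, y, cv=5)  # one accuracy score per fold
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```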
### Part 4: Metrics for evaluating the performance of a classifier
- How to measure accuracy?
- Classifier accuracy:
  - Confusion matrix
  - Pitfalls for unbalanced data sets, e.g. diagnosing HIV
  - Precision / recall
  - Mention ROC
- Exercise (pen and paper): determine precision / recall (a worked example follows below)
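For example, with purely hypothetical counts: on a test set of 100 patients, a classifier producing 8 true positives, 2 false positives, 4 false negatives and 86 true negatives has precision = 8 / (8 + 2) = 0.8 and recall = 8 / (8 + 4) ≈ 0.67, while its plain accuracy of (8 + 86) / 100 = 0.94 hides the missed positives.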
#### Coding session
- Do cross-validation with multiple metrics (see the sketch below)
- A dataset where the simple accuracy measure fails
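A sketch of multi-metric cross-validation using scikit-learn's `cross_validate`; the dataset and metric choice are illustrative:

```python
# Cross-validate a classifier with several metrics at once.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_breast_cancer(return_X_y=True)
results = cross_validate(LogisticRegression(max_iter=5000), X, y, cv=5,
                         scoring=["accuracy", "precision", "recall"])
for metric in ("test_accuracy", "test_precision", "test_recall"):
    print(metric, results[metric].mean())
```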
### Part 5: Preprocessing, pipelines and hyperparameter optimization
- Scikit-Learn API: recall what we have seen up to now.
- Preprocessing (scaler, PCA, function/column transformers)
- Cross-validation
- Hyperparameter optimization: grid search / random search
#### Coding session
- Build SVM and Linear Regression cross-validation pipelines for the previous examples
- Use PCA in the pipeline to improve performance
- Find optimal SVM parameters
- Find the optimal number of PCA components
- **Extended**: full process for selecting the best pipeline/model, incl. choice of preprocessing steps and hyperparameter tuning with cross-validation (a sketch follows below)
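A sketch of how such a pipeline and search could be assembled; the dataset and parameter grid are illustrative, not the workshop's reference solution:

```python
# Scale -> PCA -> SVC pipeline, with a grid search over the number of PCA
# components and the SVC hyperparameters, evaluated by cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA()), ("svc", SVC())])
param_grid = {"pca__n_components": [2, 5, 10],
              "svc__C": [0.1, 1, 10],
              "svc__gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```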
## Day 2
### Part 6a+b: Classifiers overview (nearest neighbors & regression-based + tree-based & ensembles)
Intention: a quick walk through reliable classifiers, give some background on when each is suitable, and let participants play with a few, including modifying their parameters.
#### Part 6a
- Nearest neighbors
- Logistic regression
- Linear + kernel SVM classifier (SVC)
- Demo of the Radial Basis Function (RBF) kernel trick: influence of different parameters on the decision surface (see the sketch below)
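A sketch of the gamma demo, reporting train/test accuracy instead of plotting; the two-moons data is a stand-in:

```python
# Fit an RBF-kernel SVC with different gamma values: a small gamma gives a
# smooth decision surface (risk of underfitting), a large one overfits.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for gamma in (0.01, 1, 100):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    print(f"gamma={gamma}: train={clf.score(X_train, y_train):.2f} "
          f"test={clf.score(X_test, y_test):.2f}")
```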
#### Part 6b
- Decision trees
- Averaging: random forests
- Boosting: AdaBoost; mention Gradient Tree Boosting
- Mentions:
  - Text classification: Naive Bayes
  - Big data: Stochastic Gradient Descent classifier, kernel approximation transformation
- Summary/overview
#### Coding session
- Apply SVM, random forests and boosting to specific examples (a comparison sketch follows below)
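A sketch of such a comparison; the dataset and default hyperparameters are illustrative:

```python
# Fit an SVM, a random forest and AdaBoost on the same split and compare
# their test accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for clf in (SVC(), RandomForestClassifier(random_state=0),
            AdaBoostClassifier(random_state=0)):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```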
### Part 7: Supervised learning: regression
Intention: demonstrate one or two simple examples of regression
- Regression
  - Example: use a weighted sum; also an example of a linear regressor
- Error metrics
- Learn regressors, such as SVR and Kernel Ridge, for the salmon-weight data, as a full pipeline (a sketch follows below)
- Optional exercise: time-series prediction
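A sketch of a regression pipeline with an error metric; synthetic data stands in for the salmon-weight dataset:

```python
# Scale the features, fit an SVR and report the mean absolute error.
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = make_pipeline(StandardScaler(), SVR(C=10)).fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, reg.predict(X_test)))
```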
### Part 8a+b: Introduction to neural networks
Intention: Introduce the main concepts behind simple neural networks. Discuss different network architectures and cover convolutional neural networks in more detail. Introduce the Keras (TensorFlow 2.0) API.
### Part 8a: Basics of neural networks and introduction to Keras
Intention: Introduction to neural networks and deep learning with Keras (the next version of the workshop will use TensorFlow 2.0, which uses the Keras API)
- Overview, history
- Perceptron
- Multi-layer perceptrons
- Loss functions, gradient-based learning, activation functions
- Multi-layer demo with Google's online tool (TensorFlow Playground)
- Introduction to Keras
  - Simple examples to learn the Keras API
  - Using scikit-learn functions on Keras models
- Handwritten digit classification (MNIST) (a minimal sketch follows below)
- Regularization, dropout
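A minimal sketch of the MNIST classifier in the Keras API; layer sizes, dropout rate and epoch count are illustrative starting points:

```python
# A small multi-layer perceptron with dropout, trained on MNIST.
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.2),  # regularization
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_split=0.1)
print("test accuracy:", model.evaluate(x_test, y_test)[1])
```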
#### Coding session
- Modify parameters in the code and observe what happens
- Write similar code to solve problems from the previous sections
## Day 3
### Part 8b: Network architectures and convolutional neural networks
Intention: Briefly discuss different network architectures and their applications. Explain convolutional neural networks. Build CNNs using Keras.
- Mention some network architectures
- Convolutional neural networks in detail
  - Convolutions
  - Max pooling
- Fashion-MNIST example
#### Coding session
- Play with the Fashion-MNIST example
- Build and train a simple CNN to classify the CIFAR10 dataset (a sketch follows below)
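A sketch of a small CNN for CIFAR10 in Keras; the architecture and training length are illustrative starting points, not the workshop's reference model:

```python
# Two convolution/max-pooling blocks followed by a dense classifier.
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_split=0.1)
print("test accuracy:", model.evaluate(x_test, y_test)[1])
```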
### Part 9a+b: Real-life examples
Intention: introduce some realistic use cases and apply the methods we have learned
### Part 9a: Histopathologic cancer detection using images
- Walk-through of how to approach and solve this problem
### Part 9b: Prediction of arm movements using EEG data
- Students work on their own and are assisted by the tutors