# Three-day Introduction to Machine Learning using Python
**Prepared and conducted by: Scientific IT Services, ETH Zurich**
## Before the workshop
We send around a short script introducing NumPy, Pandas and Matplotlib to prepare participants for the workshop.
## Day 1
### Part 0: Preparation
- Organizational announcements
- Installation/machines preparation
### Part 1: General introduction
- What is machine learning?
  - Learning from examples
  - Working with data that is hard to understand
- What are features / samples / a feature matrix?
  - Always numerical / categorical vectors
  - Examples: turning images and text into numerical features
- Taxonomy of machine learning:
  - Unsupervised: find structure in a set of features
  - Supervised: classification example
### Part 2: Supervised learning: concepts of classification
Intention: demonstrate one or two simple classifiers and introduce the concept of a decision boundary
- Idea of a simple linear classifier: take features, handcraft a score (a weighted sum of the features) and use a threshold to decide
- Simple linear classifier (e.g. linear SVM)
- Non-linear decision surface examples
- Briefly touch upon feature engineering
- Show a code example with logistic regression on a dataset, show the weights and plot the decision surface (a minimal sketch follows below)
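A minimal sketch of what such a demo could look like, using a toy scikit-learn dataset in place of the workshop's actual data:

```python
# Fit logistic regression on a toy 2D dataset, inspect the learned weights
# and plot the decision surface. Dataset and figure details are illustrative.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
clf = LogisticRegression().fit(X, y)
print("weights:", clf.coef_, "bias:", clf.intercept_)

# Evaluate the classifier on a grid of points to visualize the decision surface.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.show()
```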
#### Coding session
- Change the given code to use a linear SVM classifier
- Compare different classifiers and datasets
### Part 3: Overfitting, underfitting and cross-validation
Classifiers (and regressors) have parameters / degrees of freedom; choosing them poorly leads to underfitting or overfitting.
- Underfitting
  - Example: a linear classifier for points on a quadratic function
- Overfitting
- How to check for underfitting / overfitting?
  - Measure accuracy or another metric on a test dataset
  - Cross-validation
#### Coding session
- How to do cross-validation with scikit-learn (see the sketch after this list)
- Use a different feature set with redundant features
- Run cross-validation on a classifier
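A minimal sketch of cross-validation with scikit-learn; the iris dataset is only a stand-in for the workshop data:

```python
# Run 5-fold cross-validation on a classifier and report per-fold accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
scores = cross_val_score(SVC(), X, y, cv=5)  # one accuracy score per fold
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```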
### Part 4: Metrics for evaluating the performance of a classifier
- How to measure accuracy?
- Classifier accuracy:
  - Confusion matrix
  - Pitfalls for unbalanced data sets, e.g. diagnosing HIV
  - Precision / recall
  - Mention ROC
- Exercise (pen and paper): determine precision / recall (a worked example follows below)
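For example, with purely hypothetical counts: on a test set of 100 patients, a classifier producing 8 true positives, 2 false positives, 4 false negatives and 86 true negatives has precision = 8 / (8 + 2) = 0.8 and recall = 8 / (8 + 4) ≈ 0.67, while its plain accuracy of (8 + 86) / 100 = 0.94 hides the missed positives.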
#### Coding session
- Do cross-validation with multiple metrics (see the sketch below)
- A dataset where the simple accuracy measure fails
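A sketch of multi-metric cross-validation using scikit-learn's `cross_validate`; the dataset and metric choice are illustrative:

```python
# Cross-validate a classifier with several metrics at once.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_breast_cancer(return_X_y=True)
results = cross_validate(LogisticRegression(max_iter=5000), X, y, cv=5,
                         scoring=["accuracy", "precision", "recall"])
for metric in ("test_accuracy", "test_precision", "test_recall"):
    print(metric, results[metric].mean())
```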
### Part 5: Preprocessing, pipelines and hyperparameter optimization
- Scikit-Learn API: recall what we have seen up to now.
- Preprocessing (scaler, PCA, function/column transformers)
- Cross-validation
- Hyperparameter optimization: grid search / random search
#### Coding session
- Build SVM and Linear Regression cross-validation pipelines for the previous examples
- Use PCA in the pipeline to improve performance
- Find optimal SVM parameters
- Find the optimal number of PCA components
- **Extended**: full process for selecting the best pipeline/model, incl. choice of preprocessing steps and hyperparameter tuning with cross-validation (a sketch follows below)
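A sketch of how such a pipeline and search could be assembled; the dataset and parameter grid are illustrative, not the workshop's reference solution:

```python
# Scale -> PCA -> SVC pipeline, with a grid search over the number of PCA
# components and the SVC hyperparameters, evaluated by cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA()), ("svc", SVC())])
param_grid = {"pca__n_components": [2, 5, 10],
              "svc__C": [0.1, 1, 10],
              "svc__gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```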
## Day 2
### Part 6a+b: Classifiers overview (nearest neighbors & regression-based + tree-based & ensembles)
Intention: a quick walk through reliable classifiers, give some background on when each is suitable, and let participants play with a few, including modifying their parameters.
#### Part 6a
- Nearest neighbors
- Logistic regression
- Linear + kernel SVM classifier (SVC)
- Demo of the Radial Basis Function (RBF) kernel trick: influence of different parameters on the decision surface (see the sketch below)
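A sketch of the gamma demo, reporting train/test accuracy instead of plotting; the two-moons data is a stand-in:

```python
# Fit an RBF-kernel SVC with different gamma values: a small gamma gives a
# smooth decision surface (risk of underfitting), a large one overfits.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for gamma in (0.01, 1, 100):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    print(f"gamma={gamma}: train={clf.score(X_train, y_train):.2f} "
          f"test={clf.score(X_test, y_test):.2f}")
```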
#### Part 6b
- Decision trees
- Averaging: random forests
- Boosting: AdaBoost; mention Gradient Tree Boosting
- Mentions:
  - Text classification: Naive Bayes
  - Big data: Stochastic Gradient Descent classifier, kernel approximation transformation
- Summary/overview
#### Coding session
- Apply SVM, random forests and boosting to specific examples (a comparison sketch follows below)
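A sketch of such a comparison; the dataset and default hyperparameters are illustrative:

```python
# Fit an SVM, a random forest and AdaBoost on the same split and compare
# their test accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for clf in (SVC(), RandomForestClassifier(random_state=0),
            AdaBoostClassifier(random_state=0)):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```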
### Part 7: Supervised learning: regression
Intention: demonstrate one or two simple examples of regression
- Regression
  - Example: use a weighted sum; also an example of a linear regressor
- Error metrics
- Learn regressors, such as SVR and Kernel Ridge, for the salmon-weight data, as a full pipeline (a sketch follows below)
- Optional exercise: time-series prediction
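A sketch of a regression pipeline with an error metric; synthetic data stands in for the salmon-weight dataset:

```python
# Scale the features, fit an SVR and report the mean absolute error.
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = make_pipeline(StandardScaler(), SVR(C=10)).fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, reg.predict(X_test)))
```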
### Part 8a+b: Introduction to neural networks
Intention: Introduce the main concepts behind simple neural networks. Discuss different network architectures and cover convolutional neural networks in more detail. Introduce the Keras (TensorFlow 2.0) API.
### Part 8a: Basics of neural networks and introduction to Keras
Intention: Introduction to neural networks and deep learning with Keras (the next version of the workshop will use TensorFlow 2.0, which uses the Keras API)
- Overview, history
- Perceptron
- Multi-layer perceptrons
- Loss functions, gradient-based learning, activation functions
- Multi-layer demo with Google's online tool (TensorFlow Playground)
- Introduction to Keras
  - Simple examples to learn the Keras API
  - Using scikit-learn functions on Keras models
- Handwritten digit classification (MNIST) (a minimal sketch follows below)
- Regularization, dropout
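A minimal sketch of the MNIST classifier in the Keras API; layer sizes, dropout rate and epoch count are illustrative starting points:

```python
# A small multi-layer perceptron with dropout, trained on MNIST.
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.2),  # regularization
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_split=0.1)
print("test accuracy:", model.evaluate(x_test, y_test)[1])
```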
#### Coding session
- Modify parameters in the code and observe what happens
- Write similar code to solve problems from the previous sections
## Day 3
### Part 8b: Network architectures and convolutional neural networks
Intention: Briefly discuss different network architectures and their applications. Explain convolutional neural networks. Build CNNs using Keras.
- Mention some network architectures
- Convolutional neural networks in detail
  - Convolutions
  - Max pooling
- Fashion-MNIST example
#### Coding session
- Play with the Fashion-MNIST example
- Build and train a simple CNN to classify the CIFAR10 dataset (a sketch follows below)
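A sketch of a small CNN for CIFAR10 in Keras; the architecture and training length are illustrative starting points, not the workshop's reference model:

```python
# Two convolution/max-pooling blocks followed by a dense classifier.
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_split=0.1)
print("test accuracy:", model.evaluate(x_test, y_test)[1])
```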
### Part 9a+b: Real-life examples
Intention: introduce some realistic use cases and apply the methods we have learned
### Part 9a: Histopathologic cancer detection using images
- Walk-through of how to approach and solve this problem
### Part 9b: Prediction of arm movements using EEG data
- Students work on their own and are assisted by the tutors