{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to machine learning with Python\n", "\n", "\n", "\n", "### Target audience\n", "\n", "- Researchers with no machine learning experience yet.\n", "- Basic Python knowledge required.\n", "- Almost no math knowledge required.\n", "\n", "### Course structure\n", "\n", "- Two-day workshop: 1.5 days of workshop parts + 0.5 day working on own data / prepared data.\n", "- Every part below includes a coding session using Jupyter notebooks.\n", "- Coding sessions provide code frames for attendees to complete.\n", "- We provide solutions.\n", "\n", "\n", "## Day 1\n", "\n", "### Part 0: Preparation\n", "\n", "- Quick basics of matplotlib, numpy, pandas?\n", "\n", "\n", "#### Coding session\n", "\n", "- Read a dataframe from a CSV file or Excel sheet with beer features\n", "- Create some feature vs. feature scatter plots\n", "\n", "\n", "### Part 1: Introduction\n", "\n", "- What is machine learning?\n", "- What are features / samples / feature matrix?\n", "- Learning problems: supervised / unsupervised\n", "\n", "\n", "#### Code walkthrough:\n", "\n", " - Classification: linear SVM classifier or logistic regression example\n", " - Clustering: scikit-learn example to find clusters.\n", "\n", "\n", "### Part 2: Classification\n", "\n", " Intention: demonstrate one or two simple examples of classifiers and\n", " introduce the concept of a decision boundary\n", "\n", " - Introduction: some simple two-dimensional examples incl. 
decision function.\n", "\n", " - Idea of linear classifier:\n", " - simple linear classifier (linear SVM e.g.)\n", " - beer example with some weights\n", "\n", " - Discuss code example with logistic regression for beer data, show weights\n", "\n", "#### Coding session:\n", "\n", " - Change given code to use a linear SVM classifier\n", " - Use a different data set which cannot be classified well with a linear classifier\n", "\n", "\n", "### Part 3: accuracy, F1, ROC, ...\n", "\n", "Intention: accuracy is useful but has pitfalls\n", "\n", "- How to measure accuracy?\n", "\n", " - confusion matrix\n", " - accuracy\n", " - pitfalls for unbalanced data sets\n", " e.g. diagnosing HIV\n", " - precision / recall\n", "\n", "#### Coding session\n", "\n", "- Evaluate accuracy of the linear beer classifier from the previous section\n", "- Determine precision / recall\n", "\n", "\n", "### Part 4: underfitting/overfitting\n", "\n", "Classifiers / regressors have parameters / degrees of freedom.\n", "\n", "- underfitting: linear classifier on nonlinear problem\n", "\n", "- overfitting:\n", "\n", " - features have actual noise, or not enough information: orchid example in 2d. elevate to 3d using another feature.\n", " - polynomial of degree 5 fitted to points on a line + noise\n", " - points in a circle: draw very exact boundary line\n", "\n", "- How to check for underfitting / overfitting?\n", "\n", " - measure accuracy or other metric on test dataset\n", " - cross validation\n", "\n", "\n", "#### Coding session:\n", "\n", "- How to do cross validation with scikit-learn\n", "- Run cross validation on classifier for beer data\n", "\n", "\n", "### Part 5: pipelines / parameter tuning with scikit-learn\n", "\n", "- scikit-learn API incl. 
summary of what we have seen up to now.\n", "- pipelines, preprocessing (scaler, PCA)\n", "- cross validation\n", "- Hyperparameter tuning: grid search / random search.\n", "\n", "#### Coding session\n", "\n", "- examples\n", "\n", "\n", "## Day 2\n", "\n", "### Part 6: Overview classifiers\n", "\n", "- Nearest neighbours\n", "- SVMs\n", " - demo for RBF: influence of different parameters on the decision boundary\n", "- Random forests\n", "- Gradient Tree Boosting\n", "\n", "\n", "#### Coding session\n", "\n", "- Prepare examples for 2d classification problems incl. visualization of different\n", " decision surfaces.\n", "\n", "- Play with different classifiers on beer data\n", "\n", "### Part 7: Regression\n", "\n", "- What are the differences compared to classification: output, how to measure accuracy, ...\n", "\n", "- Example: fit polynomial, examples for underfitting and overfitting\n", "\n", "\n", "#### Coding session\n", "\n", "Introduce the movie data set, train an SVR or other regressor on this data set.\n", "\n", "\n", "### Part 8: Introduction to neural networks\n", "\n", "\n", "- Overview of the field\n", "- Introduction to feed forward neural networks\n", "- Demo Keras\n", "\n", "#### Coding session\n", "\n", "- Reuse the Keras network and play with it.\n", "\n", "\n", "## Workshop\n", "\n", "- Assist attendees with setting up the workshop material on their own computers.\n", "- Provide example problems if attendees don't bring their own data.\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }