machine_learning_workshop_proposal.ipynb 5.67 KiB
    schmittu committed
    {
     "cells": [
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "# Introduction to machine-learning with Python\n",
        "\n",
        "\n",
        "\n",
        "### Targeted audience\n",
        "\n",
        "- Researchers having no machine learning experience yet.\n",
        "- Basic Python knowledge.\n",
        "- Almost no math knowledge required.\n",
        "\n",
        "### Course structure\n",
        "\n",
        "- Two days workshop, 1.5 days workshop + .5 day working on own data / prepared data.\n",
        "- Every part below includes a coding session using Jupyter notebooks.\n",
        "- Coding sessions provide code frames which should be completed.\n",
        "- We provide solutions.\n",
        "\n",
        "\n",
        "## Day 1\n",
        "\n",
        "### Part 0: Preparation\n",
        "\n",
        "- Quick basics matplotlib, numpy, pandas?\n",
        "\n",
        "\n",
        "#### Coding session\n",
        "\n",
        "- read dataframe from csv or excel sheet with beer features\n",
        "- do some features vs features scatter plots\n",
        "\n",
        "\n",
        "### Part 1: Introduction\n",
        "\n",
        "- What is machine learning ?\n",
        "- What are features / samples / feature matrix ?\n",
        "- Learning problems: supervised / unsupervised\n",
        "\n",
        "\n",
        "#### Code walkthrough:\n",
        "\n",
        "  - Classification: linear SVM classifier or logistic regression example\n",
        "  - Clustering: scikit-learn example to find clusters.\n",
        "\n",
        "\n",
        "### Part 2: classification\n",
        "\n",
        "  Intention: demonstrate one / two simple examples of classifiers, also\n",
        "             introduce the concept of decision boundary\n",
        "\n",
        "  - Introduction: some simple two dimensional examples incl. decision function.\n",
        "\n",
        "  - Idea of linear classifier:\n",
        "    - simple linear classifier (linear SVM e.g.)\n",
        "    - beer example with some weights\n",
        "\n",
        "  - Discuss code example with logistic regression for beer data, show weights\n",
        "\n",
        "#### Coding session:\n",
        "\n",
        "  - Change given code to use a linear SVM classifier\n",
        "  - Use different data set which can not be classified well with a linear classifier\n",
        "\n",
        "\n",
        "### Part 3: accuracy, F1, ROC, ...\n",
        "\n",
        "Intention: accuracy is useful but has pitfalls\n",
        "\n",
        "- how to measure accuracy ?\n",
        "\n",
        "    - confusion matrix\n",
        "    - accurarcy\n",
        "    - pitfalls for unbalanced data sets\n",
        "        e.g. diagnose HIV\n",
        "    - precision / recall\n",
        "\n",
        "#### Coding session\n",
        "\n",
        "- Evaluate accuracy of linear beer classifier from latest section\n",
        "- Determine precision / recall\n",
        "\n",
        "\n",
        "### Part 4: underfitting/overfitting\n",
        "\n",
        "classifiers / regressors have parameters / degrees of freedom.\n",
        "\n",
        "- underfitting: linear classifier on nonlinear problem\n",
        "\n",
        "- overfitting:\n",
        "\n",
        "  - features have actual noise, or not enough information: orchid example in 2d. elevate to 3d using another feature.\n",
        "  - polynome of degree 5 to fit points on a line + noise\n",
        "  - points in a circle: draw very exact boundary line\n",
        "\n",
        "- how to check underfitting / overfitting ?\n",
        "\n",
        "  - measure accuracy or other metric on test dataset\n",
        "  - cross validation\n",
        "\n",
        "\n",
        "#### Coding session:\n",
        "\n",
        "- How to do cross validation with scikit-learn\n",
        "- run cross validation on classifier for beer data\n",
        "\n",
        "\n",
        "### Part 5: pipelines / parameter tuning with scikit-learn\n",
        "\n",
        "- Scikit learn API incl. summary of what we have seen up to now.\n",
        "- pipelines, preprocessing (scaler, PCA)\n",
        "- cross validation\n",
        "- Hyper parameter tuning: grid search / random search.\n",
        "\n",
        "#### Coding session\n",
        "\n",
        "- examples\n",
        "\n",
        "\n",
        "## DAY 2\n",
        "\n",
        "### Part 6: Overview classifiers\n",
        "\n",
        "- Nearest neighbours\n",
        "- SVMs\n",
        "  - demo for RBF: different parameters influence on decision line\n",
        "- Random forests\n",
        "- Gradient Tree Boosting\n",
        "\n",
        "\n",
        "#### Coding session\n",
        "\n",
        "- Prepare examples for 2d classification problems incl. visualization of different\n",
        "  decision surfaces.\n",
        "\n",
        "- Play with different classifiers on beer data\n",
        "\n",
        "### Part 7: Regression\n",
        "\n",
        "- What are differences compared to classification: output, how to measure accuracy, ...\n",
        "\n",
        "- Example: fit polynomial, examples for underfitting and overfitting\n",
        "\n",
        "\n",
        "#### Coding session\n",
        "\n",
        "Introduce movie data set, learn SVR or other regressor on this data set.\n",
        "\n",
        "\n",
        "### Part 8: Introduction neural networks\n",
        "\n",
        "\n",
        "- Overview of the field\n",
        "- Introduction to feed forward neural networks\n",
        "- Demo Keras\n",
        "\n",
        "#### Coding Session\n",
        "\n",
        "- keras reuse network and play with it.\n",
        "\n",
        "\n",
        "## Workshop\n",
        "\n",
        "- assist to setup the workshop material on own computer.\n",
        "- provide example problems if attendees don't bring own data.\n",
        "\n"
       ]
      }
     ],
     "metadata": {
      "kernelspec": {
       "display_name": "Python 3",
       "language": "python",
       "name": "python3"
      },
      "language_info": {
       "codemirror_mode": {
        "name": "ipython",
        "version": 3
       },
       "file_extension": ".py",
       "mimetype": "text/x-python",
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython3",
       "version": "3.6.6"
      }
     },
     "nbformat": 4,
     "nbformat_minor": 2
    }