# Target audience

- Researchers with no prior machine learning experience.
- Basic Python knowledge is required.
- Almost no math background is required.
    
    # Course structure
    
- Two-day workshop: 1.5 days of teaching plus 0.5 day working on participants' own data or on prepared data sets.
    
    - Every part below includes a coding session using Jupyter notebooks.
    
- Coding sessions provide code skeletons which the participants complete.
- We provide solutions.
    
    
    # Day 1
    
    ## Part 0: Preparation
    
    
- Quick basics of matplotlib, numpy and pandas?
    
    
    
    ### Coding session
    
- Read a data frame with beer features from a CSV file or Excel sheet.
- Create some feature-vs-feature scatter plots.
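A minimal sketch of this session, assuming hypothetical column names (`alcohol_content`, `bitterness`, ...) for the beer data; the real file would be read with `pd.read_csv("beer.csv")` or `pd.read_excel("beer.xlsx")`:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the provided beer-features file;
# the column names are made up for illustration.
csv_text = """alcohol_content,bitterness,darkness,is_yummy
4.8,35,10,1
5.2,60,12,0
6.1,20,40,1
4.5,45,8,0
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())

# Feature-vs-feature scatter plot (uses matplotlib under the hood)
ax = df.plot.scatter(x="alcohol_content", y="bitterness")
ax.figure.savefig("scatter.png")
```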
    
    
    ## Part 1: Introduction
    
- What is machine learning?
- What are features / samples / the feature matrix?
- Learning problems: supervised / unsupervised.
    
    
### Code walkthrough
    
      - Classification: linear SVM classifier or logistic regression example
      - Clustering: scikit-learn example to find clusters.
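The walkthrough could be sketched along these lines, using synthetic blobs as a stand-in for a real data set:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Toy 2-d data: two well-separated groups
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# Supervised: logistic regression uses the labels y to learn a linear boundary
clf = LogisticRegression().fit(X, y)
print("training accuracy:", clf.score(X, y))

# Unsupervised: k-means looks for the two groups without seeing any labels
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
```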
    
    
## Part 2: Classification

Intention: demonstrate one or two simple examples of classifiers and introduce the concept of a decision boundary.
    
  - Introduction: some simple two-dimensional examples incl. decision functions.

  - Idea of a linear classifier:
    - a simple linear classifier (e.g. a linear SVM)
    - the beer example with some weights

  - Discuss a code example using logistic regression on the beer data; show the learned weights.
    
### Coding session
    
  - Change the given code to use a linear SVM classifier.
  - Use a different data set which cannot be classified well with a linear classifier.
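A possible sketch for both exercises, using synthetic data: blobs for the linearly separable case, concentric circles for a problem no straight line can solve:

```python
from sklearn.datasets import make_blobs, make_circles
from sklearn.svm import LinearSVC

# Linearly separable data: two blobs, a straight decision line does well
X_lin, y_lin = make_blobs(n_samples=200, centers=2, random_state=0)
svm_lin = LinearSVC(max_iter=10000).fit(X_lin, y_lin)
print("blobs accuracy:", svm_lin.score(X_lin, y_lin))

# Concentric circles: no straight line separates the classes,
# so the same linear classifier performs poorly
X_circ, y_circ = make_circles(n_samples=200, noise=0.05, factor=0.4,
                              random_state=0)
svm_circ = LinearSVC(max_iter=10000).fit(X_circ, y_circ)
print("circles accuracy:", svm_circ.score(X_circ, y_circ))
```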
    
    
    ## Part 3: accuracy, F1, ROC, ...
    
    
Intention: accuracy is useful but has pitfalls.
    
- How to measure accuracy?

    - confusion matrix
    - accuracy
    - pitfalls with imbalanced data sets, e.g. diagnosing HIV
    - precision / recall
    
    ### Coding session
    
- Evaluate the accuracy of the linear beer classifier from the previous section.
- Determine precision / recall.
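These steps might look as follows on a synthetic imbalanced data set (a stand-in for the HIV example):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

# Imbalanced toy data: roughly 90% negatives, as in rare-disease screening
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
pred = clf.predict(X_test)

print(confusion_matrix(y_test, pred))
print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall   :", recall_score(y_test, pred))

# The pitfall: always predicting "healthy" already scores roughly 90% accuracy
print("all-negative accuracy:", accuracy_score(y_test, [0] * len(y_test)))
```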
    
    
    ## Part 4: underfitting/overfitting
    
Classifiers / regressors have parameters / degrees of freedom.
    
    - underfitting: linear classifier on nonlinear problem
    
    - overfitting:
    
    
  - features contain actual noise or not enough information: orchid example in 2d, elevated to 3d using another feature
  - a polynomial of degree 5 fitted to points on a line plus noise
    
  - points in a circle: draw a very exact boundary line
    
- How to check for underfitting / overfitting?
    
      - measure accuracy or other metric on test dataset
      - cross validation
    
    
### Coding session
    
- How to do cross validation with scikit-learn.
- Run cross validation on the classifier for the beer data.
    
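A minimal sketch of cross validation with scikit-learn, on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for the beer data set
X, y = make_classification(n_samples=500, random_state=0)

# 5-fold cross validation: fit on 4/5 of the data, score on the held-out fifth
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print("fold scores:", scores)
print("mean accuracy:", scores.mean())
```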
    
    ## Part 5: pipelines / parameter tuning with scikit-learn
    
    
- The scikit-learn API, incl. a summary of what we have seen up to now.
    
    - pipelines, preprocessing (scaler, PCA)
    
    - cross validation
    
- Hyperparameter tuning: grid search / random search.
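The pieces above could be combined like this; the data set and the parameter grid are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Pipeline: scaling and PCA are re-fitted inside every cross-validation
# fold, so no information leaks from the validation data
pipe = make_pipeline(StandardScaler(), PCA(n_components=5), SVC())

# Grid search over a hyperparameter of the final step; the "svc__" prefix
# follows make_pipeline's lowercased-class-name convention
grid = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10]}, cv=5)
grid.fit(X, y)
print("best parameters:", grid.best_params_)
print("best CV score:", grid.best_score_)
```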
    
    ### Coding session
    
- Examples combining pipelines, preprocessing, cross validation and hyperparameter tuning.
    
    
# Day 2
    
## Part 6: Overview of classifiers
    
    
    - Nearest neighbours
    
    - SVMs
  - demo for the RBF kernel: influence of different parameters on the decision line
    - Random forests
    - Gradient Tree Boosting
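A sketch comparing the four classifier families on a synthetic 2d problem; the scores printed here are training accuracies, just to show that every model fits:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Nonlinear 2-d toy problem on which the decision surfaces differ visibly
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

classifiers = {
    "nearest neighbours": KNeighborsClassifier(n_neighbors=5),
    "SVM (RBF kernel)": SVC(kernel="rbf", gamma=2),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, clf in classifiers.items():
    clf.fit(X, y)
    print(f"{name}: training accuracy {clf.score(X, y):.2f}")
```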
    
    
    ### Coding session
    
    
- Prepare examples for 2d classification problems incl. visualization of the different decision surfaces.
    
- Play with different classifiers on the beer data.
    
    ## Part 7: Regression
    
- What are the differences compared to classification: output type, how to measure accuracy, ...
    
- Example: fit a polynomial; examples of underfitting and overfitting.
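The line-plus-noise example from Part 4 can be sketched with numpy alone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Points on a line plus noise
x = np.linspace(0, 1, 20)
y = 2 * x + 1 + rng.normal(scale=0.1, size=x.size)

# A degree-1 fit matches the generating model; a degree-5 fit has more
# freedom and follows the noise: lower training error, worse generalization
residuals = {}
for degree in (1, 5):
    coeffs = np.polyfit(x, y, degree)
    residuals[degree] = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    print(f"degree {degree}: training residual {residuals[degree]:.4f}")
```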
    
    
    ### Coding session
    
Introduce the movie data set; train an SVR or another regressor on this data set.
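A minimal SVR sketch on synthetic regression data standing in for the movie data set:

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the movie data set
X, y = make_regression(n_samples=300, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVR predicts a continuous target instead of a class label
reg = SVR(kernel="linear", C=100).fit(X_train, y_train)
pred = reg.predict(X_test)
print("mean absolute error:", mean_absolute_error(y_test, pred))
```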
    
    
    
## Part 8: Introduction to neural networks
    
    
    
    - Overview of the field
    
- Introduction to feed-forward neural networks
    
    - Demo Keras
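A minimal Keras feed-forward network, assuming a TensorFlow/Keras installation; the XOR-like toy task is a placeholder for the demo's data:

```python
import numpy as np
from tensorflow import keras

# Toy data: class 1 when the two inputs have the same sign (XOR-like),
# a problem a linear model cannot solve but a small network can
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2)).astype("float32")
y = (X[:, 0] * X[:, 1] > 0).astype("float32")

# Minimal feed-forward network: two hidden layers, sigmoid output
model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=100, verbose=0)
loss, acc = model.evaluate(X, y, verbose=0)
print("training accuracy:", acc)
```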
    
### Coding session
    
- Reuse the Keras network from the demo and play with it.
    
    
    
    ## Workshop
    
    
- Assist participants in setting up the workshop material on their own computers.
    
- Provide example problems if attendees don't bring their own data.