Commit 3c164b9f, authored 6 years ago by schmittu: "first version of course layout"
Parent: 65cf35eb
Changes: 1 changed file — layout.md (new file, 175 additions, 0 deletions)
# Targeted audience

- Researchers from DBIOL, BSSE and DGESS with no machine learning experience yet.
- Basic Python knowledge.
- Almost no math knowledge.
# Concepts

- smooth learning curve
- explain fundamental concepts first; discuss exceptions, corner cases, and pitfalls late.
- plotting / pandas / numpy first. Otherwise participants might be distracted during
  coding exercises and miss the actual learning goal of an exercise.
# Course structure

## Preparation

- set up machines
- quick basics: matplotlib, numpy, pandas
## Part 1: Introduction

- Why machine learning?
- What are features?
  - always numerical vectors
  - examples: beer, movies, images, text
- unsupervised:
  - find structure in a set of features
  - beers: find groups of beer types
### Coding session:

- read a dataframe from a csv or excel sheet with beer features
- do some feature vs. feature scatter plots
- use t-SNE to show clusters
- scikit-learn example to find clusters
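A minimal sketch of this session, using `make_blobs` as a stand-in for the beer feature table (the actual csv/excel data is not part of this outline):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

# Stand-in for the beer data: 3 groups of beers described by 4 numeric features
X, _ = make_blobs(n_samples=150, centers=3, n_features=4, random_state=0)

# Project to 2D with t-SNE so the groups become visible in a scatter plot
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)

# Find the groups with k-means clustering
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(X_2d.shape)
print(np.unique(labels))
```

In the session, `X_2d` would be scatter-plotted with matplotlib and colored by `labels`.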
## Part 2: supervised learning

- supervised:
  - classification: do I like this beer? Example: draw a decision tree.
  - classification: points on both sides of a line, points in a circle, the xor problem
  - idea of a decision function: take features, produce a real value, use a threshold to decide
  - simple linear classifier
  - show some examples of feature engineering here to apply a linear classifier
  - regression: how would I rate this movie? Example: use a weighted sum, also an
    example for a linear regressor. Example: fit a quadratic function.
### Coding session:

- show: read circle data, plot the data, augment the features, learn a linear classifier with scikit-learn,
  show the weights and explain the classifier, plot the decision boundary,
  load the evaluation data set and evaluate accuracy.
- adapt: read xor data, plot the data, augment the features, learn a linear classifier with scikit-learn,
  show the weights and explain the classifier, plot the decision boundary,
  load the evaluation data set and evaluate accuracy.
- learn a regressor for movie scores.
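The "show" step can be sketched as follows, with synthetic inside-a-circle data standing in for the course's circle data file, and logistic regression as the linear classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the circle data: label = point lies inside a circle
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)

# The raw coordinates are not linearly separable; augmenting with squared
# features turns the circular boundary into a linear one in the new space.
X_aug = np.hstack([X, X ** 2])

clf = LogisticRegression().fit(X_aug, y)
print(clf.coef_)            # weights on x, y, x^2, y^2
print(clf.score(X_aug, y))  # training accuracy
```

The weights on the squared features dominate, which is the talking point of the exercise: feature engineering makes a linear model fit a non-linear boundary.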
## Part 3: accuracy, F1, ROC, ...

- how to measure accuracy?
- regression accuracy
- classifier accuracy:
  - confusion matrix
  - pitfalls for unbalanced data sets, e.g. diagnosing HIV
  - precision / recall
  - ROC?
### Coding session

- evaluate the accuracy of the linear beer classifier
- determine precision / recall
- ROC curve based on the threshold
- provide predetermined weights, show the ROC curve.
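The metric mechanics can be sketched on a hand-made toy vector of labels and scores (standing in for the beer classifier's outputs):

```python
import numpy as np
from sklearn.metrics import auc, confusion_matrix, precision_score, recall_score, roc_curve

# Toy true labels and classifier scores, standing in for the beer classifier
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.6, 0.2, 0.8, 0.7, 0.4, 0.9])

# Threshold the scores at 0.5 to get hard predictions
y_pred = (scores >= 0.5).astype(int)
print(confusion_matrix(y_true, y_pred))

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
print(precision, recall)

# Sweep the threshold to obtain the full ROC curve and its area
fpr, tpr, thresholds = roc_curve(y_true, scores)
roc_auc = auc(fpr, tpr)
print(roc_auc)
```

Plotting `fpr` against `tpr` gives the ROC curve; varying the threshold trades precision against recall.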
## Part 4: underfitting/overfitting

Classifiers / regressors have parameters / degrees of freedom.

- underfitting:
  - linear classifier for points on a quadratic function
- overfitting:
  - features have actual noise, or not enough information.
    Not enough information: orchid example in 2D; elevate to 3D using another feature.
  - polynomial of degree 5 to fit points on a line + noise
  - points in a circle: draw a very exact boundary line
- how to check for underfitting / overfitting?
  - measure accuracy
  - test data set
  - cross-validation
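The degree-5 polynomial bullet can be demonstrated in a few lines of numpy (slope, intercept, and noise level chosen arbitrarily for illustration):

```python
import numpy as np

# Noisy points on a straight line
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = 2 * x + 1 + rng.normal(scale=0.1, size=x.size)

# Fit a line (degree 1) and a degree-5 polynomial; the latter chases the noise
errors = {}
for degree in (1, 5):
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    errors[degree] = float(np.sum(residuals ** 2))

print(errors)  # degree 5 has a smaller *training* error, yet generalizes worse
```

The low training error of the degree-5 fit is exactly why training error alone cannot detect overfitting, motivating the test set and cross-validation bullets above.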
### Coding session:

- how to do cross-validation with scikit-learn
- use a different beer feature set with a redundant feature (+)
- run cross-validation on the classifier
- run cross-validation on the movie regression problem
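The scikit-learn cross-validation call can be sketched with the built-in iris data as a stand-in for the beer feature set:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# load_iris is a stand-in for the beer feature table used in the course
X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: five train/test splits, five accuracy values
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())
```

The spread of the five scores, not just their mean, is worth showing: it indicates how stable the estimate is.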
## Part 5: Overview of scikit-learn / algorithms

- Linear regressors
- Nearest neighbours
- SVMs
  - demo for RBF: influence of different parameters on the decision line
- Random forests
- Gradient tree boosting
- Clustering
### Coding session

- apply SVMs, random forests, and gradient boosting to the previous examples
- apply clustering to the previous examples
- MNIST example
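Because every scikit-learn classifier shares the same fit/score API, swapping models is a one-line change. A sketch on the built-in digits data (a small stand-in for MNIST; gradient boosting is omitted here only because it is slow on ten classes):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# load_digits is a small built-in stand-in for the full MNIST data set
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Identical API for every model: fit on the train split, score on the test split
accuracies = {}
for clf in (SVC(), RandomForestClassifier(random_state=0)):
    accuracies[type(clf).__name__] = clf.fit(X_train, y_train).score(X_test, y_test)

print(accuracies)
```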
## Part 6: pipelines / cross-validation / parameter optimization with scikit-learn

- scikit-learn API
- pipelines, preprocessing (scaler, PCA)
- cross-validation
- parameter optimization
### Coding session

- build SVM and random forest cross-validation pipelines for the previous examples
- use PCA in the pipeline for (+) to improve performance
- find optimal SVM parameters
- find the optimal number of PCA components
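All four bullets combine into one scikit-learn idiom: a `Pipeline` searched by `GridSearchCV`. A sketch on the digits data (grid values chosen arbitrarily to keep it fast):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Scaler -> PCA -> SVM, chained so cross-validation refits every step per fold
pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA()), ("svm", SVC())])

# Search over the number of PCA components and the SVM regularization C
param_grid = {"pca__n_components": [10, 30], "svm__C": [1, 10]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

The `step__parameter` naming convention is how grid search reaches inside the pipeline, which is the key API point of this part.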
## Part 7: Best practices

- visualize features: pairwise scatter plots, t-SNE
- PCA to understand the data
- check the balance of the data set; what to do if it is unbalanced?
- start with a baseline classifier / regressor
- augment data to introduce variance
## Part 8: neural networks

- overview, history
- perceptron
- multi-layer networks
- multi-layer demo with the Google online tool
- where neural networks work well
- keras demo
### Coding Session

- reuse a keras network and play with it.
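The outline calls for Keras; as a library-agnostic sketch of the same idea, scikit-learn's `MLPClassifier` (a deliberately swapped-in stand-in, not Keras) trains a small network on the digits data:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
# Scale pixel values to [0, 1] before training the network
X_train, X_test, y_train, y_test = train_test_split(X / 16.0, y, random_state=0)

# One hidden layer with 50 units; participants can vary the architecture
net = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print(net.score(X_test, y_test))
```

"Playing with it" then means changing `hidden_layer_sizes` and watching the test accuracy respond.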