Commit c1f14503 authored by schmittu :beer:

added few things to 05_classifiers_overview.ipynb draft

parent 377739b7
%% Cell type:markdown id: tags:
# Chapter 5: An overview of classifiers
%% Cell type:markdown id: tags:
Which classifiers?

- Nearest neighbours
- Logistic Regression
- Linear SVM
- Kernel SVM
- Decision trees
- Random forests
- XGBoost (https://xgboost.readthedocs.io/en/latest/) (not part of scikit-learn, won many Kaggle competitions https://www.kaggle.com/dansbecker/xgboost, offers a scikit-learn API https://www.kaggle.com/stuarthallows/using-xgboost-with-scikit-learn)

For every classifier: show some example decision surfaces.

Historical information?
%% Cell type:markdown id: tags:
## Nearest neighbours

- For a new feature vector $x$, look for the $N$ closest examples in the training data (usually using the Euclidean distance).
- Classify $x$ according to the majority of labels among these closest examples.

Parameter: $N$. The larger $N$, the smoother the decision surface.

Benefit: simple.

Disadvantages: needs lots of data, does not work well for dimensions > 8(ish) (source !?).

TODO: Commentary about curse of dimensionality
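%% Cell type:markdown id: tags:
A minimal sketch of a nearest-neighbour classifier in scikit-learn (the iris dataset and `n_neighbors=5` are illustrative assumptions, not part of the draft above):
%% Cell type:code id: tags:
``` python
# Minimal sketch: k-nearest-neighbour classification (illustrative dataset and parameter choice).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_neighbors plays the role of N above: the larger it is, the smoother the decision surface.
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)
print("test accuracy:", classifier.score(X_test, y_test))
```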
%% Cell type:markdown id: tags:
## Logistic regression

$\sigma (t)={\frac {e^{t}}{e^{t}+1}}={\frac {1}{1+e^{-t}}}$

plot !

A linear classifier: $\sigma$ squashes the result of a linear combination of the features into the interval $(0, 1)$, which is interpreted as a class probability.

Works better in high dimensions.

Weights can be interpreted.

Parameter: C (https://stackoverflow.com/questions/22851316/what-is-the-inverse-of-regularization-strength-in-logistic-regression-how-shoul), the inverse of the regularization strength. The regularization penalty helps to avoid overfitting.

Plot logistic regression diagram as a very simple neural network ?
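%% Cell type:markdown id: tags:
A minimal sketch of logistic regression with an explicit `C` value (the iris dataset and the value `C=1.0` are illustrative assumptions):
%% Cell type:code id: tags:
``` python
# Minimal sketch: logistic regression (illustrative dataset and parameter choice).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# C is the inverse of the regularization strength: smaller C means a stronger penalty on the weights.
classifier = LogisticRegression(C=1.0, max_iter=1000)
classifier.fit(X_train, y_train)

print("test accuracy:", classifier.score(X_test, y_test))
# the learned weights can be inspected and interpreted (one weight vector per class here)
print("learned weights:", classifier.coef_)
```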
%% Cell type:markdown id: tags:
## Linear SVM

- a linear classifier such that the margin is maximised (show example)
- based on "empirical risk minimization" (Vapnik)

The final weight vector is a linear combination of a subset of the examples from the training set. These examples are called "support vectors".

Weights can be interpreted.

Parameter: C controls how much weight we put on examples within the "margin strip".
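%% Cell type:markdown id: tags:
A minimal sketch of a linear SVM, using `SVC(kernel="linear")` so that the support vectors can be inspected (the iris dataset and `C=1.0` are illustrative assumptions):
%% Cell type:code id: tags:
``` python
# Minimal sketch: linear SVM (illustrative dataset and parameter choice).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# C controls how strongly examples inside the margin strip are penalized.
classifier = SVC(kernel="linear", C=1.0)
classifier.fit(X_train, y_train)

print("test accuracy:", classifier.score(X_test, y_test))
# the decision function is built from a subset of the training examples, the support vectors
print("support vectors per class:", classifier.n_support_)
print("learned weights:", classifier.coef_)
```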
%% Cell type:markdown id: tags:
## Kernel based SVM

So-called kernels are used to build the classification surface. The default kernel is rbf.

The internals are hard to interpret.

For rbf: the gamma parameter is the "decline rate" of the rbf functions and controls the smoothness of the decision surface.

Feature scaling is crucial for good performance !
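%% Cell type:markdown id: tags:
A minimal sketch of an rbf-kernel SVM combined with feature scaling (the iris dataset and the `gamma` and `C` values are illustrative assumptions):
%% Cell type:code id: tags:
``` python
# Minimal sketch: rbf-kernel SVM with feature scaling (illustrative dataset and parameter choices).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scale the features first (crucial for rbf SVMs), then fit the rbf-kernel SVC.
# gamma is the "decline rate" of the rbf functions: larger gamma gives a wigglier decision surface.
classifier = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=1.0, C=1.0))
classifier.fit(X_train, y_train)
print("test accuracy:", classifier.score(X_test, y_test))
```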
%% Cell type:markdown id: tags:
## Decision trees

- simple example incl. plot
- basic idea: recursively split the data at "optimal" split points
- benefit: interpretability

Parameter: depth. The deeper the tree, the higher the risk of overfitting.
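%% Cell type:markdown id: tags:
A minimal sketch of a depth-limited decision tree, printing the learned splits (the iris dataset and `max_depth=3` are illustrative assumptions):
%% Cell type:code id: tags:
``` python
# Minimal sketch: decision tree with limited depth (illustrative dataset and parameter choice).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)

# max_depth limits the depth of the tree: deeper trees fit the training data
# more closely, but the risk of overfitting grows.
classifier = DecisionTreeClassifier(max_depth=3, random_state=0)
classifier.fit(X_train, y_train)

print("test accuracy:", classifier.score(X_test, y_test))
# the learned splits can be printed, which is what makes the model interpretable
print(export_text(classifier, feature_names=list(iris.feature_names)))
```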
%% Cell type:markdown id: tags:
## Random forests

- generate many weak classifiers by creating shallow trees with random splittings
- use so-called bagging to combine them into a good overall classifier
- benefit: also allows estimates of feature importance
- more robust to overfitting than single decision trees
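%% Cell type:markdown id: tags:
A minimal sketch of a random forest, including the feature-importance estimates mentioned above (the iris dataset and `n_estimators=100` are illustrative assumptions):
%% Cell type:code id: tags:
``` python
# Minimal sketch: random forest with feature importances (illustrative dataset and parameter choice).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An ensemble of randomized trees; the individual (weak) trees vote on the prediction.
classifier = RandomForestClassifier(n_estimators=100, random_state=0)
classifier.fit(X_train, y_train)

print("test accuracy:", classifier.score(X_test, y_test))
# relative importance of each feature, averaged over the trees in the forest
print("feature importances:", classifier.feature_importances_)
```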
%% Cell type:code id: tags:
``` python
```