"- XGboost (https://xgboost.readthedocs.io/en/latest/) (not part of scikit-learn, won many kaggle competitions https://www.kaggle.com/dansbecker/xgboost, offers scikit-learn API https://www.kaggle.com/stuarthallows/using-xgboost-with-scikit-learn)\n",
"\n",
"\n",
"For every classifier: some examples for decision surfaces.\n",
"\n",
"Historical information ?"
...
...
@@ -93,7 +96,9 @@
"\n",
"Hard to intepret the internals.\n",
"\n",
"for rbf: gamma parameter is \"decline rate\" of rbf functions, controls smoothness of decision surface.\n"
"for rbf: gamma parameter is \"decline rate\" of rbf functions, controls smoothness of decision surface.\n",
"\n",
"feature scaling is crucial for good performance !"
]
},
{
...
...
%% Cell type:markdown id: tags:
# Chapter 5: An overview of classifiers
%% Cell type:markdown id: tags:
Which classifiers?
- Nearest neighbours
- Logistic Regression
- Linear SVM
- Kernel SVM
- Decision trees
- Random forests
- XGBoost (https://xgboost.readthedocs.io/en/latest/): not part of scikit-learn, won many Kaggle competitions (https://www.kaggle.com/dansbecker/xgboost), offers a scikit-learn API (https://www.kaggle.com/stuarthallows/using-xgboost-with-scikit-learn), as sketched below
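
A minimal sketch of that scikit-learn API (assuming the `xgboost` package is installed; the toy data, split, and parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# XGBClassifier follows the usual scikit-learn fit/predict/score interface.
clf = XGBClassifier(n_estimators=100, max_depth=3)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```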
For every classifier: some examples of decision surfaces.
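
One common way to draw such a surface is to evaluate a fitted classifier on a dense grid of points; the helper below is our own illustrative sketch (its name and defaults are not from scikit-learn):

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_decision_surface(clf, X, y, resolution=200):
    """Colour the plane by the class a fitted 2-D classifier predicts."""
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, resolution),
                         np.linspace(y_min, y_max, resolution))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)                  # decision regions
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")   # training points
    plt.show()
```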
Historical information?
%% Cell type:markdown id: tags:
## Nearest neighbours
- For a new feature vector $x$, look for the $N$ closest examples in the training data (usually using the Euclidean distance).
- Classify $x$ by the majority label among these closest examples.
Parameter: $N$; the larger $N$, the smoother the decision surface.
Benefit: simple.
Disadvantages: needs lots of data, and does not work well for more than roughly 8 dimensions (curse of dimensionality; source?)
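
A minimal sketch using scikit-learn's `KNeighborsClassifier` (the iris data and the split are illustrative; `n_neighbors` plays the role of $N$ above):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_neighbors is the N above: larger values give smoother decision surfaces.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```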