{
     "cells": [
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "# Chapter 5: An overview of classifiers"
       ]
      },
    
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "What classifiers ?\n",
        "\n",
        "- Neighrest neighbours\n",
        "- Logistic Regression\n",
        "- Linear SVM\n",
        "\n",
        "- Kernel SVM\n",
        "- Decision trees\n",
        "- Random forests\n",
        "\n",
    
        "- XGboost (https://xgboost.readthedocs.io/en/latest/) (not part of scikit-learn, won many kaggle competitions https://www.kaggle.com/dansbecker/xgboost, offers scikit-learn API https://www.kaggle.com/stuarthallows/using-xgboost-with-scikit-learn)\n",
        "\n",
        "\n",
    
        "For every classifier: some examples for decision surfaces.\n",
        "\n",
        "Historical information ?"
       ]
      },
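  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is a minimal, illustrative sketch (not part of the original outline) of using XGBoost through its scikit-learn-style API; the iris data and the default parameters are assumptions for demonstration, and the `xgboost` package must be installed separately."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: XGBoost via its scikit-learn API (assumes xgboost is installed).\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.model_selection import train_test_split\n",
    "from xgboost import XGBClassifier\n",
    "\n",
    "X, y = load_iris(return_X_y=True)\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)\n",
    "\n",
    "clf = XGBClassifier()  # follows the usual fit / predict / score interface\n",
    "clf.fit(X_train, y_train)\n",
    "print(clf.score(X_test, y_test))"
   ]
  },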
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Neighrest neighbours\n",
        "\n",
        "- For a new feature $x$ look for $N$ closests examples from learning data (usually using the euclidean distance).\n",
        "- Classify $x$ as the majority of labels among these closest examples.\n",
        "\n",
        "Parameter: $N$. the larger $N$ the smoother the decision surface.\n",
        "\n",
        "Benefit: simple\n",
        "\n",
        "Disadvanages: needs lots of data, does not work well for dimesions > 8(ish) (source !?)\n",
        "\n",
        "TODO: Commentary about course of dimensionality"
       ]
      },
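  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch (not from the original outline) of a k-nearest-neighbour classifier with scikit-learn; the iris data and the choice of `n_neighbors=5` are illustrative assumptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: k-nearest neighbours on assumed example data (iris).\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "\n",
    "X, y = load_iris(return_X_y=True)\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)\n",
    "\n",
    "# n_neighbors plays the role of N above: larger values give smoother decision surfaces.\n",
    "clf = KNeighborsClassifier(n_neighbors=5)\n",
    "clf.fit(X_train, y_train)\n",
    "print(clf.score(X_test, y_test))"
   ]
  },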
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Logistic regression\n",
        "\n",
        "$\\sigma (t)={\\frac {e^{t}}{e^{t}+1}}={\\frac {1}{1+e^{-t}}}$\n",
        "\n",
        "plot !\n",
        "\n",
        "linear classifier, sigma shrinks result of linear combinations to interval 0, 1 which are interpreted as class probabilities.\n",
        "\n",
        "works better in high dimensions\n",
        "\n",
        "weights can be interpreted\n",
        "\n",
        "Parameters: C (https://stackoverflow.com/questions/22851316/what-is-the-inverse-of-regularization-strength-in-logistic-regression-how-shoul)\n",
        "\n",
        "Penelaty to avoid overfitting\n",
        "\n",
        "Plot logistig regression diagram as very simple neural network ?"
       ]
      },
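  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch (not from the original outline) that plots the sigmoid $\\sigma(t)$ and fits a scikit-learn `LogisticRegression`; the iris data and `C=1.0` are illustrative assumptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: the sigmoid function and a logistic regression fit (iris data assumed).\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "# Plot sigma(t) = 1 / (1 + exp(-t)).\n",
    "t = np.linspace(-6, 6, 200)\n",
    "plt.plot(t, 1 / (1 + np.exp(-t)))\n",
    "plt.xlabel('t')\n",
    "plt.ylabel('sigma(t)')\n",
    "plt.show()\n",
    "\n",
    "# C is the inverse regularisation strength: smaller C means a stronger penalty.\n",
    "clf = LogisticRegression(C=1.0)\n",
    "X, y = load_iris(return_X_y=True)\n",
    "clf.fit(X, y)\n",
    "print(clf.coef_)  # the weights, which can be inspected and interpreted"
   ]
  },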
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Linear SVM\n",
        "\n",
        "- linear classifier such that margin is maximised (show example)\n",
        "- based on \"empirical risk minization\" (vapnik)\n",
        "\n",
        "the final weight vector is a linear combination of a subset of the features from the learning set. These are called \"support vectors\".\n",
        "\n",
        "weights can be interpreted\n",
        "\n",
        "C: how much weight to we put on examples within the \"margin strip\"\n"
       ]
      },
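  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch (not from the original outline) of a linear SVM with scikit-learn's `SVC(kernel='linear')`; the iris data and `C=1.0` are illustrative assumptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: a linear SVM; C controls how much weight examples inside the margin get.\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.svm import SVC\n",
    "\n",
    "X, y = load_iris(return_X_y=True)\n",
    "\n",
    "# kernel='linear' exposes the support vectors and the weight vector coef_.\n",
    "clf = SVC(kernel='linear', C=1.0)\n",
    "clf.fit(X, y)\n",
    "print(clf.support_vectors_.shape)  # the subset of training examples defining the boundary\n",
    "print(clf.coef_)  # interpretable weights (one row per pairwise decision boundary)"
   ]
  },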
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Kernel based SVM\n",
        "\n",
        "So called kernels are used to build the classifiation surface. Default kernel is rbf.\n",
        "\n",
        "Hard to intepret the internals.\n",
        "\n",
    
        "for rbf: gamma parameter is \"decline rate\" of rbf functions, controls smoothness of decision surface.\n",
        "\n",
        "feature scaling is crucial for good performance !"
    
       ]
      },
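  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch (not from the original outline) of an RBF-kernel SVM combined with feature scaling in a pipeline; the iris data, `gamma=0.5` and `C=1.0` are illustrative assumptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: RBF-kernel SVM with feature scaling (scaling is crucial for SVMs).\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.svm import SVC\n",
    "\n",
    "X, y = load_iris(return_X_y=True)\n",
    "\n",
    "# gamma is the 'decline rate' of the RBF functions; larger gamma gives a less smooth surface.\n",
    "clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', gamma=0.5, C=1.0))\n",
    "clf.fit(X, y)\n",
    "print(clf.score(X, y))"
   ]
  },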
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Decision trees\n",
        "\n",
        "- simple example incl. plot\n",
        "- basic idea: \"optimal\" splits...\n",
        "\n",
        "- benefit: interpretability\n",
        "\n",
        "Parameter: depth, the deeper the higher the risk for overfitting."
       ]
      },
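  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch (not from the original outline) of a shallow decision tree; the iris data and `max_depth=3` are illustrative assumptions, and `plot_tree` requires scikit-learn >= 0.21."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: a shallow decision tree; max_depth limits the risk of overfitting.\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.tree import DecisionTreeClassifier, plot_tree\n",
    "\n",
    "X, y = load_iris(return_X_y=True)\n",
    "\n",
    "clf = DecisionTreeClassifier(max_depth=3, random_state=0)\n",
    "clf.fit(X, y)\n",
    "\n",
    "# Plotting the fitted tree is the main interpretability benefit.\n",
    "plot_tree(clf, filled=True)\n",
    "plt.show()"
   ]
  },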
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Random forests\n",
        "\n",
        "- generate many week classifiers by creating shallow trees with random splittings\n",
        "- use so call bagging to implement a good overall classifier\n",
        "\n",
        "- benefits: allows also estimates about feature importance\n",
        "\n",
        "- more robust to overfitting than decision trees\n"
       ]
      },
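  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch (not from the original outline) of a random forest and its feature importance estimates; the iris data, `n_estimators=100` and `max_depth=3` are illustrative assumptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: a random forest built from many shallow, randomised trees (bagging).\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "\n",
    "X, y = load_iris(return_X_y=True)\n",
    "\n",
    "clf = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0)\n",
    "clf.fit(X, y)\n",
    "print(clf.feature_importances_)  # per-feature importance estimates"
   ]
  },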
    
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": []
      }
     ],
     "metadata": {
      "kernelspec": {
       "display_name": "Python 3",
       "language": "python",
       "name": "python3"
      },
      "language_info": {
       "codemirror_mode": {
        "name": "ipython",
        "version": 3
       },
       "file_extension": ".py",
       "mimetype": "text/x-python",
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython3",
       "version": "3.7.2"
      }
     },
     "nbformat": 4,
     "nbformat_minor": 2
    }