02_classification.ipynb 582 KiB
       "execution_count": 27,
    
       "metadata": {},
       "outputs": [
        {
         "data": {
    
          "image/png": "\n",
    
          "text/plain": [
           "<Figure size 360x360 with 1 Axes>"
          ]
         },
         "metadata": {
          "needs_background": "light"
         },
         "output_type": "display_data"
        }
       ],
       "source": [
        "xv = xor[\"x\"]\n",
        "yv = xor[\"y\"]\n",
        "\n",
    
        "colors = [\"rb\"[i] for i in xor[\"label\"]]\n",
    
        "plt.figure(figsize=(5, 5))\n",
        "plt.xlim([-2, 2])\n",
        "plt.ylim([-2, 2])\n",
        "plt.title(\"blue points have label True\")\n",
        "plt.scatter(xv, yv, color=colors, marker=\".\");"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Again, this example data set cannot be separated by a line. But we see that points where x and y have the same sign appear to form one class, and points where x and y have different signs belong to the other class.\n",
        "\n",
        "How can we engineer a more descriptive feature which captures \"x and y have the same sign\"? Here we can use the fact that the product of two nonzero numbers is positive if and only if both numbers have the same sign.\n",
        "\n",
        "So let's plot a histogram over `x * y`:"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 28,
    
       "metadata": {},
       "outputs": [
        {
         "data": {
          "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD8CAYAAABn919SAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAADvBJREFUeJzt3X+s3XV9x/Hna6UOoy5gOCMdpavZCIYYLcldh2F/dCimc0ZwccvIRjCy1CWSYGK2+SOZlzgTF6fsjy0udTCajOmIPwJhOOywhJg4tGitheJ0TiOk0hpHgCxhKbz3x/2SXS/39px7zvfcc/rp85Gc3HM+53vOeVHaVz/9fj/f70lVIUk6/f3crANIkvphoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIacdZGfth5551X27dv38iPlKTT3kMPPfSTqhoM225DC3379u0cPHhwIz9Skk57SX44ynbucpGkRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEZs6Jmi0jxZvH9x9fFdq49L884ZuiQ1YmihJzk7ydeSfCvJw0lu6sZvS/JfSQ51tx3TjytJWssou1yeBa6oqmeSbAa+kuSL3XN/UlWfnV48SdKohhZ6VRXwTPdwc3eraYaSJK3fSPvQk2xKcgg4Duyvqge7pz6S5HCSm5P8/Bqv3ZPkYJKDJ06c6Cm2JGmlkQq9qp6rqh3AVmBnktcA7wdeDfwa8Ergz9Z47d6qWqiqhcFg6PXZJUljWtcql6p6EjgA7K6qY7XkWeAfgJ3TCChJGs0oq1wGSc7p7r8UuBJ4NMmWbizA1cCRaQaVJJ3aKKtctgD7kmxi6S+AO6rq7iRfTjIAAhwC/niKOSVJQ4yyyuUwcOkq41dMJZEkaSyeKSpJjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEYMLfQkZyf5WpJvJXk4yU3d+KuSPJjke0n+OclLph9XkrSWUWbozwJXVNXrgB3A7iSXAX8J3FxVvwr8N3D99GJKkoYZWui15Jnu4ebuVsAVwGe78X3A1VNJKEkayUj70JNsSnIIOA7sB/4TeLKqTnabPAZcsMZr9yQ5mOTgiRMn+sgsSVrFSIVeVc9V1Q5gK7ATePWoH1BVe6tqoaoWBoPBmDElScOsa5VLVT0JHABeD5yT5Kzuqa3A4z1nkyStwyirXAZJzunuvxS4EjjKUrG/vdvsOuDOaYWUJA131vBN2ALsS7KJpb8A7qiqu5M8AnwmyV8A3wRumWJOSdIQQwu9qg4Dl64y/n2W9qdLkuaAZ4pKUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRoyyDl2aX4uLo41JZwBn6JLUCAtdkhphoUtSIyx0SWqEB0XVHg+U6gzlDF2SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEYMLfQkFyY5kOSRJA8nubEbX0zyeJJD3e3N048rSVrLKCcWnQTeW1XfSPIK4KEk+7vnbq6qv5pePEnSqIYWelUdA451959OchS4YNrBJEnrs6596Em2A5cCD3ZDNyQ5nOTWJOf2nE2StA4jF3qSlwOfA95TVU8BnwR+BdjB0gz+42u8bk+Sg0kOnjhxoofIkqTVjFToSTazVOa3V9XnAarqiap6rqqeBz4F7FzttVW1t6oWqmphMBj0lVuStMIoq1wC3AIcrapPLBvfsmyztwFH+o8nSRrVKKtcLgeuBb6d5FA39gHgmiQ7gAJ+ALxrKgklSSMZZZXLV4Cs8tQ9/ceRJI3LM0UlqRF+Y5HOCIv3L846gjR1ztAlqREWuiQ1wkKXpEZY6JLUCAtdkhphoUtSIyx0SWqEhS5J
jbDQJakRFrokNcJT/6UxrXU5gcVdq49L0+YMXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRgwt9CQXJjmQ5JEkDye5sRt/ZZL9Sb7b/Tx3+nElSWsZZYZ+EnhvVV0CXAa8O8klwPuA+6rqIuC+7rEkaUaGFnpVHauqb3T3nwaOAhcAVwH7us32AVdPK6Qkabh17UNPsh24FHgQOL+qjnVP/Rg4v9dkkqR1GbnQk7wc+Bzwnqp6avlzVVVArfG6PUkOJjl44sSJicJKktY2UqEn2cxSmd9eVZ/vhp9IsqV7fgtwfLXXVtXeqlqoqoXBYNBHZknSKkZZ5RLgFuBoVX1i2VN3Add1968D7uw/niRpVKNcPvdy4Frg20kOdWMfAD4K3JHkeuCHwO9NJ6IkaRRDC72qvgJkjaff0G8cSdK4PFNUkhrhNxbpjLDrtvtfNHb/O3ZteA5pmpyhS1IjLHRJaoSFLkmNsNAlqREeFNXpY3Fx1gmkueYMXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoTXcpFWWLx/8cVju148Js0bZ+iS1IihhZ7k1iTHkxxZNraY5PEkh7rbm6cbU5I0zCgz9NuA3auM31xVO7rbPf3GkiSt19BCr6oHgJ9uQBZJ0gQm2Yd+Q5LD3S6Zc3tLJEkay7irXD4JfBio7ufHgXeutmGSPcAegG3bto35cdJsrbbyZT3bukpGG2GsGXpVPVFVz1XV88CngJ2n2HZvVS1U1cJgMBg3pyRpiLEKPcmWZQ/fBhxZa1tJ0sYYusslyaeBXcB5SR4DPgTsSrKDpV0uPwDeNcWMkqQRDC30qrpmleFbppBFkjQBT/2XNsBaB1U9WKo+eeq/JDXCQpekRljoktQIC12SGmGhS1IjXOUizdColxRwNYxG4QxdkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhNdy0f9bXBx/rO/PlbRuztAlqRFDCz3JrUmOJzmybOyVSfYn+W7389zpxpQkDTPKDP02YPeKsfcB91XVRcB93WNJ0gwNLfSqegD46Yrhq4B93f19wNU955IkrdO4B0XPr6pj3f0fA+evtWGSPcAegG3bto35cdKZbdQvwljz9X5Bxhlh4oOiVVVAneL5vVW1UFULg8Fg0o+TJK1h3EJ/IskWgO7n8f4iSZLGMW6h3wVc192/DriznziSpHGNsmzx08BXgYuTPJbkeuCjwJVJvgu8sXssSZqhoQdFq+qaNZ56Q89ZJEkT8NR/zd5GXF5AOgN46r8kNcJCl6RGWOiS1AgLXZIaYaFLUiNc5TIPRlnRMcmqj0lWkcxqtYmrXDbEateIWe26L2tdS8ZrxMwXZ+iS1AgLXZIaYaFLUiMsdElqhIUuSY1wlcuZylUkWsOk346k2XGGLkmNsNAlqREWuiQ1wkKXpEZ4UHSa/OIGzYmNPNA56uUE1D9n6JLUiIlm6El+ADwNPAecrKqFPkJJktavj10uv1lVP+nhfSRJE3CXiyQ1YtJCL+BLSR5KsqePQJKk8Uy6y+U3qurxJL8I7E/yaFU9sHyDruj3AGzbtm3Cj5syV6VIU+EXZGyMiWboVfV49/M48AVg5yrb7K2qhapaGAwGk3ycJOkUxi70JC9L8ooX7gNvAo70FUyStD6T7HI5H/hCkhfe55+q6l97SSVJWrexC72qvg+8rscskqQJuGxRkhpx+lzLZZ5WoMwiyzz990unmTNllY0zdElqhIUuSY2w0CWpERa6JDXCQpekRpw+q1zmnStOpHUb9ZuUprUapbVvV3KGLkmNsNAlqREWuiQ1wkKXpEacuQdFRz2I2ffBzj7fzwOxmrFRD2pupHnMtFGcoUtSIyx0SWqEhS5JjbDQJakRFrokNeL0XuUyq5UqkubCNFa0rOc9V7tMwCy/TMMZuiQ1YqJCT7I7yXeS
fC/J+/oKJUlav7ELPckm4G+B3wIuAa5JcklfwSRJ6zPJDH0n8L2q+n5V/S/wGeCqfmJJktZrkkK/APjRssePdWOSpBlIVY33wuTtwO6q+qPu8bXAr1fVDSu22wPs6R5eDHxn/LhTdR7wk1mHGGLeM857PjBjX8zYj1Ez/nJVDYZtNMmyxceBC5c93tqN/Yyq2gvsneBzNkSSg1W1MOscpzLvGec9H5ixL2bsR98ZJ9nl8nXgoiSvSvIS4PeBu/qJJUlar7Fn6FV1MskNwL3AJuDWqnq4t2SSpHWZ6EzRqroHuKenLLM297uFmP+M854PzNgXM/aj14xjHxSVJM0XT/2XpEZY6Msk+XCSw0kOJflSkl+adablknwsyaNdxi8kOWfWmVZK8rtJHk7yfJK5WmEw75eqSHJrkuNJjsw6y2qSXJjkQJJHuv/HN84600pJzk7ytSTf6jLeNOtMa0myKck3k9zd13ta6D/rY1X12qraAdwN/PmsA62wH3hNVb0W+A/g/TPOs5ojwO8AD8w6yHKnyaUqbgN2zzrEKZwE3ltVlwCXAe+ew1/DZ4Erqup1wA5gd5LLZpxpLTcCR/t8Qwt9map6atnDlwFzdYChqr5UVSe7h//O0tr/uVJVR6tqHk8em/tLVVTVA8BPZ51jLVV1rKq+0d1/mqUymquzw2vJM93Dzd1trv4cAyTZCvw28Pd9vq+FvkKSjyT5EfAHzN8Mfbl3Al+cdYjTiJeq6FGS7cClwIOzTfJi3a6MQ8BxYH9VzV1G4K+BPwWe7/NNz7hCT/JvSY6scrsKoKo+WFUXArcDN5z63TY+X7fNB1n65+/tG51v1IxqV5KXA58D3rPiX7Vzoaqe63abbgV2JnnNrDMtl+QtwPGqeqjv9z69v7FoDFX1xhE3vZ2lNfYfmmKcFxmWL8k7gLcAb6gZrTldx6/hPBnpUhU6tSSbWSrz26vq87POcypV9WSSAywdl5inA82XA29N8mbgbOAXkvxjVf3hpG98xs3QTyXJRcseXgU8Oqssq0mym6V/pr21qv5n1nlOM16qYkJJAtwCHK2qT8w6z2qSDF5Y/ZXkpcCVzNmf46p6f1VtrartLP0+/HIfZQ4W+kof7XYdHAbexNJR6HnyN8ArgP3d0sq/m3WglZK8LcljwOuBf0ly76wzwdKlKljahXYvSwfz7pi3S1Uk+TTwVeDiJI8luX7WmVa4HLgWuKL7/Xeom2XOky3Age7P8NdZ2ofe27LAeeeZopLUCGfoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEb8H34J5hL8mzEZAAAAAElFTkSuQmCC\n",
          "text/plain": [
           "<Figure size 432x288 with 1 Axes>"
          ]
         },
         "metadata": {
          "needs_background": "light"
         },
         "output_type": "display_data"
        }
       ],
       "source": [
        "products = xor[\"x\"] * xor[\"y\"]\n",
        "\n",
        "features_class_true = products[xor[\"label\"]]\n",
        "features_class_false = products[~xor[\"label\"]]\n",
        "\n",
        "plt.hist(features_class_true,  bins=30, color=\"g\", alpha=.5, histtype=\"stepfilled\")\n",
        "plt.hist(features_class_false,  bins=30, color=\"r\", alpha=.5, histtype=\"stepfilled\");"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "In this case a simple classifier would just introduce a threshold of 0 to distinguish both classes."
       ]
      },
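      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "A minimal sketch of such a threshold classifier. It assumes, as in the cells above, that `xor` is a data frame with columns `x`, `y` and a boolean column `label`, and that `True` corresponds to positive products (otherwise invert the comparison). The helper name `classify_by_product` is only chosen for illustration:"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "import numpy as np\n",
        "\n",
        "\n",
        "def classify_by_product(x, y, threshold=0.0):\n",
        "    \"\"\"Predict True where the engineered feature x * y exceeds the threshold.\"\"\"\n",
        "    return np.asarray(x) * np.asarray(y) > threshold\n",
        "\n",
        "\n",
        "# assumption: xor[\"label\"] is boolean and True means \"same sign\"\n",
        "predicted = classify_by_product(xor[\"x\"], xor[\"y\"])\n",
        "print(np.sum(predicted == np.asarray(xor[\"label\"])), \"out of\", len(xor), \"predicted correctly\")"
       ]
      },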
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "### Other examples of feature engineering\n",
        "\n",
    
        "\n",
        "Feature engineering requires understanding your data to extract meaningful and discriminative information.\n",
        "\n",
        "Proper feature engineering can boost the performance of a classifier significantly.\n",
        "\n",
        "Examples:\n",
        "\n",
    
        "- ~~nudity classifier~~: color histograms of full image and image patches\n",
    
        "\n",
        "\n",
        "- spam classifier: choice of dictionary, extra features which count words written entirely in capital letters or words with unusual characters (like \"pill$\")\n",
        "\n",
        "\n",
        "- to distinguish background noise from speech in audio samples, the frequency distribution might help. The standard deviation or a histogram of the loudness / energy of a sample might help as well.\n",
        "\n",
        "\n",
    
        "- to classify DNA sequences, n-gram histograms (n>=1) can be beneficial.\n",
        "\n",
        "\n",
    
        "- for geopolitical data a feature \"state\"  can be enhanced by \"political system\" and / or \"gross national product (GNP)\".\n",
    
        "\n",
        "\n",
        "- for sales data add a feature \"is weekday\".\n",
        "\n",
        "\n",
        "Most real data sets have more than two or three features, so visual inspection can be difficult and engineering features as we did in the 2D examples becomes tricky. But here are some general recommendations:\n",
        "\n",
        "- use statistics (mean, std deviations, higher order features) as well as histograms if applicable.\n",
        "\n",
    
        "- polynomial features (e.g. extend `x, y` to `x, y, x * y, x ** 2, y ** 2`) (see examples section).\n",
    
        "\n",
        "- image classification: dig into *computer vision* to learn about image descriptors.\n",
        "\n",
        "- audio classification: learn about FFT, wavelets, filter banks, power spectrum, ...\n",
        "\n",
        "- try to incorporate external data.\n",
        "\n",
        "*Comment*: \n",
        "\n",
    
        "We will see later that adding too many features can introduce other problems (-> *overfitting*) but there are also methods for feature selection in this case (see https://scikit-learn.org/stable/modules/feature_selection.html)"
    
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## The examples below only discuss the two-class case\n",
        "\n",
        "\n",
        "The following examples in this script only consider two-class problems. \n",
        "Before we dig deeper into classification, we want to say a few words on how we can handle more than two classes. \n",
        "\n",
        "\n",
        "The general idea for `n > 2` classes is to build multiple 2-class classifiers and determine a winner:\n",
        "\n",
        "- the **one-vs-all** approach builds `n` classifiers for \"label `i` vs. the rest\". \n",
        "\n",
        "\n",
        "- the **one-vs-one** approach builds classifiers for `label i vs label j` (in total `n x (n - 1) / 2` classifiers).\n",
        "\n",
        "For new incoming data, the `n` resp. `n x (n - 1) / 2` classifiers are then applied and the overall winner class is the final result.\n",
        "\n",
        "For the digit classification example:\n",
        "\n",
        "- we could build 10 classifiers `is it 0 or one of the others`, `is it 1 or one of the others`, etc.\n",
        "  \n",
        "  A new image would then hopefully yield `True` for exactly one of the classifiers; in all other situations the result is unclear.\n",
        "   \n",
        "   \n",
        "- we could build 45 classifiers `is it 0 or 1`, `is it 0 or 2`, etc.\n",
        "\n",
        "  For a new image we could choose the final outcome based on which class \"wins\" most often.\n",
        "\n",
        "\n",
        "#### Note:\n",
        "In `scikit-learn` many classifiers support such multi-class problems out of the box, and the library also offers functionality to implement `one-vs-all` or `one-vs-one` for specific cases. See https://scikit-learn.org/stable/modules/multiclass.html"
       ]
      },
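      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "A small sketch of both strategies using the meta-estimators from `sklearn.multiclass`. The data set (`load_digits`) and the base classifier (a shallow decision tree) are assumptions chosen only to keep the example short and fast. The printed counts should match the 10 and 45 classifiers mentioned above:"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "from sklearn.datasets import load_digits\n",
        "from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier\n",
        "from sklearn.tree import DecisionTreeClassifier\n",
        "\n",
        "# assumption: the scikit-learn digits data set with the 10 classes 0..9\n",
        "digits = load_digits()\n",
        "X, y = digits.data, digits.target\n",
        "\n",
        "# any two-class capable classifier works as base estimator\n",
        "base = DecisionTreeClassifier(max_depth=5, random_state=0)\n",
        "\n",
        "one_vs_all = OneVsRestClassifier(base).fit(X, y)\n",
        "one_vs_one = OneVsOneClassifier(base).fit(X, y)\n",
        "\n",
        "# each meta-estimator keeps its fitted two-class classifiers in `estimators_`\n",
        "print(len(one_vs_all.estimators_), \"one-vs-all classifiers\")\n",
        "print(len(one_vs_one.estimators_), \"one-vs-one classifiers\")"
       ]
      },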
    
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "## Exercise section 2"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "To prepare the next, bigger exercise, we briefly introduce how to add so-called polynomial features to our data:"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 54,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "          0         1         2         3         4\n",
          "0 -1.539782  0.950822  2.370928 -1.464059  0.904063\n",
          "1  0.436266 -1.768324  0.190328 -0.771460  3.126968\n",
          "2 -1.466436  1.391890  2.150435 -2.041118  1.937358\n",
          "3 -1.037642 -0.953587  1.076700  0.989482  0.909329\n",
          "4 -0.691444 -0.219826  0.478094  0.151997  0.048323\n",
          "5  1.436550 -0.046027  2.063676 -0.066121  0.002119\n",
          "6  0.664361 -1.234410  0.441375 -0.820094  1.523768\n",
          "7  0.164649 -1.848453  0.027109 -0.304346  3.416779\n",
          "8 -1.883945 -0.222088  3.549248  0.418402  0.049323\n",
          "9  0.934993 -1.081893  0.874212 -1.011563  1.170493\n"
         ]
        }
       ],
       "source": [
        "from sklearn.preprocessing import PolynomialFeatures\n",
        "\n",
        "df = pd.read_csv(\"xor.csv\")\n",
        "features = df.iloc[:10, :-1]\n",
        "preproc = PolynomialFeatures(2, include_bias=False)\n",
        "data = preproc.fit_transform(features)\n",
        "print(pd.DataFrame(data))"
    
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "In this case \n",
        "- columns 0 and 1 are $x$ and $y$ from the original data set.\n",
        "- column 2 is $x^2$\n",
        "- column 3 is $x y$\n",
        "- column 4 is $y^2$.\n",
    
        "\n",
    
        "A complete description can be found here: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html"
    
       ]
      },
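      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "One way to check this column layout yourself is the `powers_` attribute of a fitted `PolynomialFeatures` object: each row holds the exponents of $x$ and $y$ for one output column. This is only a sketch; the dummy input below is an assumption and merely tells the transformer that there are two input columns:"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "import numpy as np\n",
        "from sklearn.preprocessing import PolynomialFeatures\n",
        "\n",
        "preproc = PolynomialFeatures(2, include_bias=False)\n",
        "# dummy data: only the number of columns (2) matters for fitting\n",
        "preproc.fit(np.zeros((1, 2)))\n",
        "\n",
        "# one row per output column, one exponent per input column\n",
        "print(preproc.powers_)"
       ]
      },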
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "The following script now learns classifiers on different data sets and plots decision surfaces."
    
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": null,
    
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
    
          "113 out of 200 predicted correctly\n"
    
         ]
        },
        {
         "data": {
    
          "image/png": "\n",
    
          "text/plain": [
    
           "<Figure size 432x432 with 1 Axes>"
    
          ]
         },
         "metadata": {
          "needs_background": "light"
         },
         "output_type": "display_data"
        }
       ],
       "source": [
        "from sklearn.linear_model import LogisticRegression\n",
        "from sklearn.svm import LinearSVC, SVC\n",
        "from sklearn.preprocessing import PolynomialFeatures\n",
        "from sklearn.tree import DecisionTreeClassifier\n",
        "from sklearn.neighbors import KNeighborsClassifier\n",
        "\n",
        "\n",
        "def train_and_plot_decision_surface(clf, preproc, features, labels, marker=\".\", N=400):\n",
        "    \n",
        "    features = np.array(features)\n",
        "    xmin, ymin = features.min(axis=0)\n",
        "    xmax, ymax = features.max(axis=0)\n",
        "    \n",
        "    x = np.linspace(xmin, xmax, N)\n",
        "    y = np.linspace(ymin, ymax, N) \n",
        "    points = np.array(np.meshgrid(x, y)).T.reshape(-1, 2)\n",
        "  \n",
        "    if preproc is not None:\n",
        "        points_for_clf = preproc.fit_transform(points)\n",
        "        features = preproc.fit_transform(features)\n",
        "    else:\n",
        "        points_for_clf = points\n",
        "    \n",
        "    clf.fit(features, labels)\n",
        "    predicted = clf.predict(features)\n",
        "    print(sum(predicted == labels), \"out of\", len(labels), \"predicted correctly\")\n",
        "    classes = np.array(clf.predict(points_for_clf), dtype=bool) \n",
    
        "    plt.plot(points[classes][:, 0], points[classes][:, 1], \"b\" + marker, markersize=1, alpha=.05);\n",
    
        "    plt.plot(points[~classes][:, 0], points[~classes][:, 1], \"r\" + marker, markersize=1, alpha=.05);\n",
        "\n",
        "\n",
        "df = pd.read_csv(\"2d_points.csv\")\n",
    
        "# df = pd.read_csv(\"xor.csv\")\n",
    
        "\n",
        "features = df.iloc[:, :-1]\n",
        "labels = df.iloc[:, -1]\n",
        "\n",
        "plt.figure(figsize=(6, 6));\n",
        "\n",
        "clf = LinearSVC()\n",
        "# clf = LogisticRegression()\n",
        "# clf = SVC(gamma=.1)\n",
        "# clf = DecisionTreeClassifier(max_depth=6)\n",
        "# clf = KNeighborsClassifier(10)\n",
        "\n",
    
        "#preproc = PolynomialFeatures(2, include_bias=False)\n",
        "preproc = None\n",
    
        "\n",
        "train_and_plot_decision_surface(clf, preproc, features, labels)\n",
        "\n",
    
        "colors = [\"rb\"[i] for i in labels]\n",
    
        "plt.scatter(features.iloc[:, 0], features.iloc[:, 1], color=colors, marker='.');"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "- modify the script to use the `xor.csv` data set.\n",
        "\n",
    
        "- play with the other classifiers which are commented out in the script.\n",
        "- play with their parameters.\n",
        "- activate the feature engineering step and experiment with classifiers and their parameters.\n",
        ""
    
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": []
      }
     ],
     "metadata": {
      "kernelspec": {
       "display_name": "Python 3",
       "language": "python",
       "name": "python3"
      },
      "language_info": {
       "codemirror_mode": {
        "name": "ipython",
        "version": 3
       },
       "file_extension": ".py",
       "mimetype": "text/x-python",
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython3",
    
      }
     },
     "nbformat": 4,
     "nbformat_minor": 2
    }