01_introduction.ipynb 208 KiB
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "225 examples\n",
    
          "187 labeled correctly\n"
    
         ]
        }
       ],
       "source": [
        "print(len(labels), \"examples\")\n",
        "print(sum(predicted_labels == labels), \"labeled correctly\")"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "<div class=\"alert alert-block alert-info\">\n",
        "<i class=\"fa fa-info-circle\"></i>\n",
        "<code>predicted_labels == labels</code> evaluates to a vector of <code>True</code> or <code>False</code> Boolean values. When used as numbers, Python handles <code>True</code> as <code>1</code> and <code>False</code> as <code>0</code>. So, <code>sum(...)</code> simply counts the correctly predicted labels.\n",
        "</div>\n",
        "\n"
    
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## What happened?\n",
        "\n",
        "Why were not all labels predicted correctly?\n",
        "\n",
        "Neither `Python` nor `scikit-learn` is broken. What we observed above is very typical for machine-learning applications.\n",
        "\n",
        "The reason here is that we have incomplete information: other features of beer which also contribute to the rating (like \"maltiness\") were not measured or cannot be measured. So even the best algorithm cannot predict the target values reliably.\n",
        "\n",
        "Another reason might be mistakes in the input data, e.g. some labels are assigned incorrectly.\n",
        "\n",
        "* Finding good features is crucial for the performance of ML algorithms!\n",
        "\n",
        "\n",
        "* Another important issue is to make sure that you have clean data: input features might be corrupted by flawed entries, and feeding such data into an ML algorithm will usually lead to reduced performance."
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "# Exercise section 2"
    
       ]
      },
      {
    
       "cell_type": "markdown",
    
       "metadata": {},
    
       "source": [
        "<div class=\"alert alert-block alert-danger\">\n",
        "<strong>TODO:</strong> I propose to start exercise session 2 here, and ask to do re-classification with SVM, and only then play with the regularization parameter.\n",
        "</div>"
    
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Now, train a different `scikit-learn` classifier, the so-called **Support Vector Classifier** `SVC`, and evaluate its \"re-classification\" performance again.\n",
        "\n",
        "<div class=\"alert alert-block alert-info\">\n",
        "<i class=\"fa fa-info-circle\"></i>\n",
        "<code>SVC</code> belongs to a class of algorithms named \"Support Vector Machines\" (SVMs). We will discuss available ML algorithms in more detail in the following scripts.\n",
        "</div>"
    
       ]
    
      },
      {
       "cell_type": "code",
    
       "execution_count": 15,
    
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
    
          "225 examples\n",
          "205 labeled correctly\n"
    
         ]
        }
       ],
       "source": [
        "from sklearn.svm import SVC\n",
        "# ...\n",
        "# REMOVE the following lines in the target script\n",
        "# Train a support vector classifier with default parameters\n",
        "classifier = SVC()\n",
        "classifier.fit(input_features, labels)\n",
        "\n",
        "# Predict on the same data the classifier was trained on (\"re-classification\")\n",
        "predicted_labels = classifier.predict(input_features)\n",
        "\n",
        "assert predicted_labels.shape == labels.shape\n",
        "print(len(labels), \"examples\")\n",
        "print(sum(predicted_labels == labels), \"labeled correctly\")"
    
       ]
      },
    
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "\n",
        "<div class=\"alert alert-block alert-info\">\n",
        "<i class=\"fa fa-info-circle\"></i>\n",
        "A better re-classification score does not show that <code>SVC</code> is better than <code>LogisticRegression</code>. At most, it fits our training data more closely. We will learn later that this is not necessarily a good thing.\n",
        "</div>\n",
        "\n",
        "Note that both `LogisticRegression` and `SVC` classifiers have a parameter `C` which controls simplification (also known as regularization) of the resulting model. In `scikit-learn`, `C` is the inverse of the regularization strength, so smaller values enforce a simpler model. Test the beers data \"re-classification\" with different values of this parameter."
    
       ]
      },
    
      {
       "cell_type": "code",
    
       "execution_count": 16,
    
       "metadata": {},
    
       "outputs": [],
       "source": [
        "?LogisticRegression\n",
        "# ..."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 17,
       "metadata": {
        "collapsed": true
       },
    
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
    
          "/Users/mikolajr/Workspace/SSDM/machinelearning-introduction-workshop/.venv/lib/python3.7/site-packages/ipykernel_launcher.py:9: UserWarning: get_ipython_dir has moved to the IPython.paths module since IPython 4.0.\n",
    
          "  if __name__ == '__main__':\n"
         ]
        },
        {
         "data": {
          "text/html": [
           "<style>\n",
           "    \n",
           "    @import url('http://fonts.googleapis.com/css?family=Source+Code+Pro');\n",
           "    \n",
           "    @import url('http://fonts.googleapis.com/css?family=Kameron');\n",
           "    @import url('http://fonts.googleapis.com/css?family=Crimson+Text');\n",
           "    \n",
           "    @import url('http://fonts.googleapis.com/css?family=Lato');\n",
           "    @import url('http://fonts.googleapis.com/css?family=Source+Sans+Pro');\n",
           "    \n",
           "    @import url('http://fonts.googleapis.com/css?family=Lora'); \n",
           "\n",
           "    \n",
           "    body {\n",
           "        font-family: 'Lora', Consolas, sans-serif;\n",
           "       \n",
           "        -webkit-print-color-adjust: exact !important;\n",
           "        \n",
           "      \n",
    
           "       \n",
           "    }\n",
           "    .rendered_html code\n",
           "    {\n",
           "        color: black;\n",
           "        background: #eaf0ff;\n",
    
           "        background: #f5f5f5; \n",
    
           "        padding: 1pt;\n",
           "        font-family: 'Source Code Pro', Consolas, Monaco, monospace;\n",
           "    }\n",
           "    \n",
    
           "    p {\n",
           "      line-height: 140%;\n",
           "    }\n",
           "    \n",
           "    strong code {\n",
           "        background: red;\n",
           "    }\n",
           "    \n",
           "    .rendered_html strong code\n",
           "    {\n",
           "        background: #f5f5f5;\n",
           "    }\n",
           "    \n",
    
           "    .CodeMirror pre {\n",
           "    font-family: 'Source Code Pro', Monaco, Consolas, monospace;\n",
           "    }\n",
           "    \n",
           "    .cm-s-ipython span.cm-keyword {\n",
           "        font-weight: normal;\n",
           "     }\n",
           "     \n",
           "     strong {\n",
    
           "         background: #f5f5f5;\n",
           "         margin-top: 4pt;\n",
           "         margin-bottom: 4pt;\n",
           "         padding: 2pt;\n",
           "         border: 0.5px solid #a0a0a0;\n",
           "         font-weight: bold;\n",
           "         color: darkred;\n",
    
           "     }\n",
           "     \n",
           "    \n",
           "    div #notebook {\n",
           "        /* font-size: 10pt; */\n",
           "        line-height: 145%;\n",
           "        }\n",
           "        \n",
           "    li {\n",
    
           "        line-height: 145%;\n",
    
           "    }\n",
           "\n",
           "    div.output_area pre {\n",
           "        background: #fff9d8 !important;\n",
           "        padding: 5pt;\n",
           "       \n",
           "       -webkit-print-color-adjust: exact; \n",
           "        \n",
           "    }\n",
           " \n",
           "    \n",
           " \n",
           "    h1, h2, h3, h4 {\n",
           "        font-family: Kameron, arial;\n",
           "    }\n",
           "    \n",
           "    div#maintoolbar {display: none !important;}\n",
           "    </style>"
          ],
          "text/plain": [
           "<IPython.core.display.HTML object>"
          ]
         },
    
         "execution_count": 17,
    
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "#REMOVEBEGIN\n",
        "# THE LINES BELOW ARE JUST FOR STYLING THE CONTENT ABOVE!\n",
        "\n",
        "from IPython.display import HTML\n",
        "\n",
        "\n",
        "def css_styling():\n",
        "    \"\"\"Return the inline stylesheet for this notebook as an HTML object.\"\"\"\n",
        "    styles = \"\"\"<style>\n",
        "    \n",
        "    @import url('http://fonts.googleapis.com/css?family=Source+Code+Pro');\n",
        "    \n",
        "    @import url('http://fonts.googleapis.com/css?family=Kameron');\n",
        "    @import url('http://fonts.googleapis.com/css?family=Crimson+Text');\n",
        "    \n",
        "    @import url('http://fonts.googleapis.com/css?family=Lato');\n",
        "    @import url('http://fonts.googleapis.com/css?family=Source+Sans+Pro');\n",
        "    \n",
        "    @import url('http://fonts.googleapis.com/css?family=Lora'); \n",
        "\n",
        "    \n",
        "    body {\n",
        "        font-family: 'Lora', Consolas, sans-serif;\n",
        "       \n",
        "        -webkit-print-color-adjust: exact !important;\n",
        "        \n",
        "      \n",
    
        "       \n",
        "    }\n",
        "    .rendered_html code\n",
        "    {\n",
        "        color: black;\n",
        "        background: #eaf0ff;\n",
    
        "        background: #f5f5f5; \n",
    
        "        padding: 1pt;\n",
        "        font-family: 'Source Code Pro', Consolas, Monaco, monospace;\n",
        "    }\n",
        "    \n",
    
        "    p {\n",
        "      line-height: 140%;\n",
        "    }\n",
        "    \n",
        "    strong code {\n",
        "        background: red;\n",
        "    }\n",
        "    \n",
        "    .rendered_html strong code\n",
        "    {\n",
        "        background: #f5f5f5;\n",
        "    }\n",
        "    \n",
    
        "    .CodeMirror pre {\n",
        "    font-family: 'Source Code Pro', Monaco, Consolas, monospace;\n",
        "    }\n",
        "    \n",
        "    .cm-s-ipython span.cm-keyword {\n",
        "        font-weight: normal;\n",
        "     }\n",
        "     \n",
        "     strong {\n",
    
        "         background: #f5f5f5;\n",
        "         margin-top: 4pt;\n",
        "         margin-bottom: 4pt;\n",
        "         padding: 2pt;\n",
        "         border: 0.5px solid #a0a0a0;\n",
        "         font-weight: bold;\n",
        "         color: darkred;\n",
    
        "     }\n",
        "     \n",
        "    \n",
        "    div #notebook {\n",
        "        /* font-size: 10pt; */\n",
        "        line-height: 145%;\n",
        "        }\n",
        "        \n",
        "    li {\n",
    
        "        line-height: 145%;\n",
    
        "    }\n",
        "\n",
        "    div.output_area pre {\n",
        "        background: #fff9d8 !important;\n",
        "        padding: 5pt;\n",
        "       \n",
        "       -webkit-print-color-adjust: exact; \n",
        "        \n",
        "    }\n",
        " \n",
        "    \n",
        " \n",
        "    h1, h2, h3, h4 {\n",
        "        font-family: Kameron, arial;\n",
        "    }\n",
        "    \n",
        "    div#maintoolbar {display: none !important;}\n",
        "    </style>\"\"\"\n",
        "    return HTML(styles)\n",
        "css_styling()\n",
        "#REMOVEEND"
       ]
    
      }
     ],
     "metadata": {
      "kernelspec": {
       "display_name": "Python 3",
       "language": "python",
       "name": "python3"
      },
      "language_info": {
       "codemirror_mode": {
        "name": "ipython",
        "version": 3
       },
       "file_extension": ".py",
       "mimetype": "text/x-python",
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython3",
    
       "version": "3.7.1"
    
      }
     },
     "nbformat": 4,
     "nbformat_minor": 2
    }