01_introduction.ipynb 738 KiB
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       "    \n",
       "    @import url('http://fonts.googleapis.com/css?family=Source+Code+Pro');\n",
       "    \n",
       "    @import url('http://fonts.googleapis.com/css?family=Kameron');\n",
       "    @import url('http://fonts.googleapis.com/css?family=Crimson+Text');\n",
       "    \n",
       "    @import url('http://fonts.googleapis.com/css?family=Lato');\n",
       "    @import url('http://fonts.googleapis.com/css?family=Source+Sans+Pro');\n",
       "    \n",
       "    @import url('http://fonts.googleapis.com/css?family=Lora'); \n",
       "\n",
       "    \n",
       "    body {\n",
       "        font-family: 'Lora', Consolas, sans-serif;\n",
       "       \n",
       "        -webkit-print-color-adjust: exact important !;\n",
       "        \n",
       "      \n",
       "       \n",
       "    }\n",
       "    \n",
       "    .alert-block {\n",
       "        width: 95%;\n",
       "        margin: auto;\n",
       "    }\n",
       "    \n",
       "    .rendered_html code\n",
       "    {\n",
       "        color: black;\n",
       "        background: #eaf0ff;\n",
       "        background: #f5f5f5; \n",
       "        padding: 1pt;\n",
       "        font-family:  'Source Code Pro', Consolas, monocco, monospace;\n",
       "    }\n",
       "    \n",
       "    p {\n",
       "      line-height: 140%;\n",
       "    }\n",
       "    \n",
       "    strong code {\n",
       "        background: red;\n",
       "    }\n",
       "    \n",
       "    .rendered_html strong code\n",
       "    {\n",
       "        background: #f5f5f5;\n",
       "    }\n",
       "    \n",
       "    .CodeMirror pre {\n",
       "    font-family: 'Source Code Pro', monocco, Consolas, monocco, monospace;\n",
       "    }\n",
       "    \n",
       "    .cm-s-ipython span.cm-keyword {\n",
       "        font-weight: normal;\n",
       "     }\n",
       "     \n",
       "     strong {\n",
       "         background: #f5f5f5;\n",
       "         margin-top: 4pt;\n",
       "         margin-bottom: 4pt;\n",
       "         padding: 2pt;\n",
       "         border: 0.5px solid #a0a0a0;\n",
       "         font-weight: bold;\n",
       "         color: darkred;\n",
       "     }\n",
       "     \n",
       "    \n",
       "    div #notebook {\n",
       "        # font-size: 10pt; \n",
       "        line-height: 145%;\n",
       "        }\n",
       "        \n",
       "    li {\n",
       "        line-height: 145%;\n",
       "    }\n",
       "\n",
       "    div.output_area pre {\n",
       "        background: #fff9d8 !important;\n",
       "        padding: 5pt;\n",
       "       \n",
       "       -webkit-print-color-adjust: exact; \n",
       "        \n",
       "    }\n",
       " \n",
       "    \n",
       " \n",
       "    h1, h2, h3, h4 {\n",
       "        font-family: Kameron, arial;\n",
       "\n",
       "\n",
       "    }\n",
       "    \n",
       "    div#maintoolbar {display: none !important;}\n",
       "</style>\n",
       "    <script>\n",
       "IPython.OutputArea.prototype._should_scroll = function(lines) {\n",
       "        return false;\n",
       "}\n",
       "    </script>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore', category=FutureWarning)\n",
    "warnings.filterwarnings = lambda *a, **kw: None\n",
    "from IPython.core.display import HTML; HTML(open(\"custom.html\", \"r\").read())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chapter 1: General Introduction to machine learning (ML)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ML = \"learning models from data\"\n",
    "\n",
    "\n",
    "### About models\n",
    "\n",
    "A \"model\" allows us to explain observations and to answer questions. For example:\n",
    "\n",
    "   1. Where will my car, travelling at a given velocity, stop if I apply the brakes now?\n",
    "   2. Where in the night sky will I see the moon tonight?\n",
    "   3. Is the email I received spam?\n",
    "   4. Which product should I recommend to my customer `X`?\n",
    "   \n",
    "- The first two questions can be answered based on existing physical models (formulas). \n",
    "\n",
    "- For questions 3 and 4, it is difficult to develop explicitly formulated models.\n",
    "\n",
    "### What is needed to apply ML?\n",
    "\n",
    "\n",
    "- We have no explicit formula for such a task.\n",
    "\n",
    "- We have a vague understanding of the problem domain, e.g. we know that some words are specific to spam emails and others are specific to my personal and work-related emails.\n",
    "\n",
    "- We have enough example data, as my mailbox is full of both spam and non-spam emails.\n",
    "\n",
    "\n",
    "We could handcraft a personal spam classifier by hard-coding rules, like _\"mail contains 'no prescription' and comes from Russia or China\"_, plus some statistics. This would be very tedious.\n",
    "\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<i class=\"fa fa-info-circle\"></i>\n",
    "    Systems with such hard-coded rules are called <strong>expert systems</strong>.\n",
    "</div>\n",
    "\n",
    "In such cases machine learning is a better approach.\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<i class=\"fa fa-info-circle\"></i>\n",
    "<strong>Machine learning</strong> offers approaches to automatically build predictive models based on example data.\n",
    "</div>\n",
    "\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<i class=\"fa fa-info-circle\"></i>\n",
    "The closely related concept of <strong>data mining</strong> usually means using predictive machine learning models to explicitly discover previously unknown knowledge in a specific data set, for instance association rules between customer and article types in question 4 above.\n",
    "</div>\n",
    "\n",
    "\n",
    "\n",
    "## ML: what is \"learning\"?\n",
    "\n",
    "To create a predictive model, we must first **train** such a model on given data. \n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<i class=\"fa fa-info-circle\"></i>\n",
    "Alternative names for \"to train\" a model are \"to <strong>fit</strong>\" or \"to <strong>learn</strong>\" a model.\n",
    "</div>\n",
    "\n",
    "All ML algorithms have in common that they rely on internal data structures and/or parameters.\n",