01_introduction.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       "    \n",
       "    @import url('http://fonts.googleapis.com/css?family=Source+Code+Pro');\n",
       "    \n",
       "    @import url('http://fonts.googleapis.com/css?family=Kameron');\n",
       "    @import url('http://fonts.googleapis.com/css?family=Crimson+Text');\n",
       "    \n",
       "    @import url('http://fonts.googleapis.com/css?family=Lato');\n",
       "    @import url('http://fonts.googleapis.com/css?family=Source+Sans+Pro');\n",
       "    \n",
       "    @import url('http://fonts.googleapis.com/css?family=Lora'); \n",
       "\n",
       "    \n",
       "    body {\n",
       "        font-family: 'Lora', Consolas, sans-serif;\n",
       "       \n",
       "        -webkit-print-color-adjust: exact important !;\n",
       "        \n",
       "      \n",
       "       \n",
       "    }\n",
       "    \n",
       "    .alert-block {\n",
       "        width: 95%;\n",
       "        margin: auto;\n",
       "    }\n",
       "    \n",
       "    .rendered_html code\n",
       "    {\n",
       "        color: black;\n",
       "        background: #eaf0ff;\n",
       "        background: #f5f5f5; \n",
       "        padding: 1pt;\n",
       "        font-family:  'Source Code Pro', Consolas, monocco, monospace;\n",
       "    }\n",
       "    \n",
       "    p {\n",
       "      line-height: 140%;\n",
       "    }\n",
       "    \n",
       "    strong code {\n",
       "        background: red;\n",
       "    }\n",
       "    \n",
       "    .rendered_html strong code\n",
       "    {\n",
       "        background: #f5f5f5;\n",
       "    }\n",
       "    \n",
       "    .CodeMirror pre {\n",
       "    font-family: 'Source Code Pro', monocco, Consolas, monocco, monospace;\n",
       "    }\n",
       "    \n",
       "    .cm-s-ipython span.cm-keyword {\n",
       "        font-weight: normal;\n",
       "     }\n",
       "     \n",
       "     strong {\n",
       "         background: #f5f5f5;\n",
       "         margin-top: 4pt;\n",
       "         margin-bottom: 4pt;\n",
       "         padding: 2pt;\n",
       "         border: 0.5px solid #a0a0a0;\n",
       "         font-weight: bold;\n",
       "         color: darkred;\n",
       "     }\n",
       "     \n",
       "    \n",
       "    div #notebook {\n",
       "        # font-size: 10pt; \n",
       "        line-height: 145%;\n",
       "        }\n",
       "        \n",
       "    li {\n",
       "        line-height: 145%;\n",
       "    }\n",
       "\n",
       "    div.output_area pre {\n",
       "        background: #fff9d8 !important;\n",
       "        padding: 5pt;\n",
       "       \n",
       "       -webkit-print-color-adjust: exact; \n",
       "        \n",
       "    }\n",
       " \n",
       "    \n",
       " \n",
       "    h1, h2, h3, h4 {\n",
       "        font-family: Kameron, arial;\n",
       "\n",
       "\n",
       "    }\n",
       "    \n",
       "    div#maintoolbar {display: none !important;}\n",
       "</style>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore', category=FutureWarning)\n",
    "warnings.filterwarnings = lambda *a, **kw: None\n",
    "from IPython.core.display import HTML; HTML(open(\"custom.html\", \"r\").read())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chapter 1: General Introduction to machine learning (ML)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ML = \"learning models from data\"\n",
    "\n",
    "\n",
    "### About models\n",
    "\n",
    "A \"model\" allows us to explain observations and to answer questions. For example:\n",
    "\n",
    "   1. Where will my car at given velocity stop if I apply break now?\n",
    "   2. Where on the night sky will I see the moon tonight?\n",
    "   3. Is the email I received spam?\n",
    "   4. Which article \"X\" should I recommend to a customer \"Y\"?\n",
    "   \n",
    "- The first two questions can be answered based on existing physical models (formulas). \n",
    "\n",
    "- For the  questions 3 and 4 it is difficult to develop explicitly formulated models. \n",
    "\n",
    "### What is needed to apply ML ?\n",
    "\n",
    "Problems 3 and 4 have the following in common:\n",
    "\n",
    "- No exact model known or implementable because we have a vague understanding of the problem domain.\n",
    "- But enough data with sufficient and implicit information is available.\n",
    "\n",
    "\n",
    "\n",
    "E.g. for the spam email example:\n",
    "\n",
    "- We have no explicit formula for such a task (and devising one would boil down to lots of trial with different statistics or scores and possibly weighting of them).\n",
    "- We have a vague understanding of the problem domain because we know that some words are specific to spam emails and others are specific to my personal and work-related emails.\n",
    "- My mailbox is full with examples of both spam and non-spam emails.\n",
    "\n",
    "**In such cases machine learning offers approaches to build models based on example data.**\n",
    "\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<i class=\"fa fa-info-circle\"></i>\n",
    "The closely-related concept of <strong>data mining</strong> usually means use of predictive machine learning models to explicitly discover previously unknown knowledge from a specific data set, such as, for instance, association rules between customer and article types in the Problem 4 above.\n",
    "</div>\n",
    "\n",
    "\n",
    "\n",
    "## ML: what is \"learning\" ?\n",
    "\n",
    "To create a predictive model, we must first **train** such a model on given data. \n",
    "\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<i class=\"fa fa-info-circle\"></i>\n",
    "Alternative names for \"to train\" a model are \"to <strong>fit</strong>\" or \"to <strong>learn</strong>\" a model.\n",
    "</div>\n",
    "\n",
    "\n",
    "All ML algorithms have in common that they rely on internal data structures and/or parameters. Learning then builds up such data structures or adjusts parameters based on the given data. After that such models can be used to explain observations or to answer questions.\n",
    "\n",
    "The important difference between explicit models and models learned from data:\n",
    "\n",
    "- Explicit models usually offer exact answers to questions\n",
    "- Models we learn from data usually come with inherent uncertainty."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",