diff --git a/01_introduction.json b/01_introduction.json
deleted file mode 100644
index decba45ad37fe7ab857b4db618c9534038a10583..0000000000000000000000000000000000000000
--- a/01_introduction.json
+++ /dev/null
@@ -1,707 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Chapter 1: General Introduction to Machine Learning (ML)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## ML = \"learning models from data\"\n",
-    "\n",
-    "\n",
-    "### About models\n",
-    "\n",
-    "A \"model\" allows us to explain observations and to answer questions. For example:\n",
-    "\n",
-    "   1. Where will my car, driving at a given velocity, stop if I apply the brakes now?\n",
-    "   2. Where in the night sky will I see the moon tonight?\n",
-    "   3. Is the email I received spam?\n",
-    "   4. Which article \"X\" should I recommend to a customer \"Y\"?\n",
-    "   \n",
-    "- The first two questions can be answered based on existing physical models (formulas). \n",
-    "\n",
-    "- For questions 3 and 4 it is difficult to develop explicitly formulated models. \n",
-    "\n",
-    "### What is needed to apply ML?\n",
-    "\n",
-    "Problems 3 and 4 have the following in common:\n",
-    "\n",
-    "- No exact model is known or implementable, because we have only a vague understanding of the problem domain.\n",
-    "- But enough data with sufficient and implicit information is available.\n",
-    "\n",
-    "\n",
-    "\n",
-    "E.g. for the spam email example:\n",
-    "\n",
-    "- We have no explicit formula for such a task (and devising one would boil down to lots of trial and error with different statistics or scores, and possibly weightings of them).\n",
-    "- We have a vague understanding of the problem domain because we know that some words are specific to spam emails and others are specific to my personal and work-related emails.\n",
-    "- My mailbox is full of examples of both spam and non-spam emails.\n",
-    "\n",
-    "**In such cases machine learning offers approaches to build models based on example data.**\n",
-    "\n",
-    "<div class=\"alert alert-block alert-info\">\n",
-    "<i class=\"fa fa-info-circle\"></i>\n",
-    "The closely-related concept of <strong>data mining</strong> usually refers to using predictive machine learning models to explicitly discover previously unknown knowledge in a specific data set, such as, for instance, association rules between customer and article types in Problem 4 above.\n",
-    "</div>\n",
-    "\n",
-    "\n",
-    "\n",
-    "## ML: what is \"learning\"?\n",
-    "\n",
-    "To create a predictive model, we must first **train** such a model on given data. \n",
-    "\n",
-    "<div class=\"alert alert-block alert-info\">\n",
-    "<i class=\"fa fa-info-circle\"></i>\n",
-    "Alternative names for \"to train\" a model are \"to <strong>fit</strong>\" or \"to <strong>learn</strong>\" a model.\n",
-    "</div>\n",
-    "\n",
-    "\n",
-    "All ML algorithms have in common that they rely on internal data structures and/or parameters. Learning then builds up such data structures or adjusts parameters based on the given data. After that such models can be used to explain observations or to answer questions.\n",
-    "\n",
-    "The important difference between explicit models and models learned from data:\n",
-    "\n",
-    "- Explicit models usually offer exact answers to questions\n",
-    "- Models we learn from data usually come with inherent uncertainty."
-   ]
-  },
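-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "As a minimal sketch of what \"adjusting parameters based on data\" can mean (the linear relationship and the noise level below are made up for illustration):\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "\n",
-    "# noisy observations of an assumed linear relationship y = 2 * x + 1\n",
-    "rng = np.random.RandomState(42)\n",
-    "x = np.linspace(0, 10, 50)\n",
-    "y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)\n",
-    "\n",
-    "# \"learning\" adjusts the two model parameters (slope, intercept)\n",
-    "# such that the model fits the observations best (least squares):\n",
-    "slope, intercept = np.polyfit(x, y, deg=1)\n",
-    "print(slope, intercept)  # close to 2 and 1, but not exact: inherent uncertainty\n",
-    "```"
-   ]
-  },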
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "\n",
-    "## Some history\n",
-    "\n",
-    "Some parts of ML are older than you might think. This is a rough time line with a few selected achievements from this field:\n",
-    "\n",
-    "    1805: Least squares regression\n",
-    "    1812: Bayes' rule (in the general form given by Laplace)\n",
-    "    1913: Markov Chains\n",
-    "\n",
-    "    1951: First neural network\n",
-    "    1957-65: \"k-means\" clustering algorithm\n",
-    "    1959: Term \"machine learning\" is coined by Arthur Samuel, an AI pioneer\n",
-    "    1969: Book \"Perceptrons\": Limitations of Neural Networks\n",
-    "    1984: Book \"Classification And Regression Trees\"\n",
-    "    1974-86: Neural networks learning breakthrough: backpropagation method\n",
-    "    1995: Random decision forests and Support Vector Machines methods\n",
-    "    1998: Public appearance: first ML implementations of spam filtering methods; naive Bayes Classifier method\n",
-    "    2006-12: Neural networks learning breakthrough: deep learning\n",
-    "    \n",
-    "So the field is not as new as one might think, but due to \n",
-    "\n",
-    "- more available data\n",
-    "- more processing power \n",
-    "- development of better algorithms \n",
-    "\n",
-    "many more applications of machine learning have appeared during the last 15 years."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Machine learning with Python\n",
-    "\n",
-    "Currently (2018) `Python` is the dominant programming language for ML. The advent of deep learning in particular pushed this forward: frameworks such as `TensorFlow` and `PyTorch` offered `Python` interfaces from their first releases.\n",
-    "\n",
-    "The prevalent packages in the Python eco-system used for ML include:\n",
-    "\n",
-    "- `pandas` for handling tabular data\n",
-    "- `matplotlib` and `seaborn` for plotting\n",
-    "- `scikit-learn` for classical (non-deep-learning) ML\n",
-    "- `TensorFlow`, `PyTorch` and `Keras` for deep-learning.\n",
-    "\n",
-    "`scikit-learn` is very comprehensive, and its online documentation alone provides a good introduction to ML."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## ML lingo: What are \"features\"?\n",
-    "\n",
-    "A typical and very common situation is that our data is presented as a table, as in the following example:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>alcohol_content</th>\n",
-       "      <th>bitterness</th>\n",
-       "      <th>darkness</th>\n",
-       "      <th>fruitiness</th>\n",
-       "      <th>is_yummy</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>3.739295</td>\n",
-       "      <td>0.422503</td>\n",
-       "      <td>0.989463</td>\n",
-       "      <td>0.215791</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>4.207849</td>\n",
-       "      <td>0.841668</td>\n",
-       "      <td>0.928626</td>\n",
-       "      <td>0.380420</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>4.709494</td>\n",
-       "      <td>0.322037</td>\n",
-       "      <td>5.374682</td>\n",
-       "      <td>0.145231</td>\n",
-       "      <td>1</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>4.684743</td>\n",
-       "      <td>0.434315</td>\n",
-       "      <td>4.072805</td>\n",
-       "      <td>0.191321</td>\n",
-       "      <td>1</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>4.148710</td>\n",
-       "      <td>0.570586</td>\n",
-       "      <td>1.461568</td>\n",
-       "      <td>0.260218</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "   alcohol_content  bitterness  darkness  fruitiness  is_yummy\n",
-       "0         3.739295    0.422503  0.989463    0.215791         0\n",
-       "1         4.207849    0.841668  0.928626    0.380420         0\n",
-       "2         4.709494    0.322037  5.374682    0.145231         1\n",
-       "3         4.684743    0.434315  4.072805    0.191321         1\n",
-       "4         4.148710    0.570586  1.461568    0.260218         0"
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "import pandas as pd\n",
-    "\n",
-    "features = pd.read_csv(\"beers.csv\")\n",
-    "features.head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<div class=\"alert alert-block alert-warning\">\n",
-    "<i class=\"fa fa-warning\"></i>&nbsp;<strong>Definitions</strong>\n",
-    "<ul>\n",
-    "    <li>every row of such a matrix is called a <strong>sample</strong> or <strong>feature vector</strong>;</li>\n",
-    "    <li>the cells in a row are <strong>feature values</strong>;</li>\n",
-    "    <li>every column name is called a <strong>feature name</strong> or <strong>attribute</strong>.</li>\n",
-    "</ul>\n",
-    "\n",
-    "Features are also commonly called <strong>variables</strong>.\n",
-    "</div>"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The table shown holds five samples.\n",
-    "\n",
-    "The feature names are `alcohol_content`, `bitterness`, `darkness`, `fruitiness` and `is_yummy`.\n",
-    "\n",
-    "<div class=\"alert alert-block alert-warning\">\n",
-    "<i class=\"fa fa-warning\"></i>&nbsp;<strong>More definitions</strong>\n",
-    "<ul>\n",
-    "    <li>The first four features have continuous numerical values within some ranges - these are called <strong>numerical features</strong>,</li>\n",
-    "    <li>the <code>is_yummy</code> feature has only a finite set of values (\"categories\"): <code>0</code> (\"no\") and <code>1</code> (\"yes\") - this is called a <strong>categorical feature</strong>.</li>\n",
-    "</ul>\n",
-    "</div>"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "A straightforward application of machine learning to the beer data set above is: **\"Can we predict `is_yummy` from the other features?\"**\n",
-    "\n",
-    "<div class=\"alert alert-block alert-warning\">\n",
-    "<i class=\"fa fa-warning\"></i>&nbsp;<strong>Even more definitions</strong>\n",
-    "\n",
-    "In the context of the question above we call:\n",
-    "<ul>\n",
-    "    <li>the <code>alcohol_content</code>, <code>bitterness</code>, <code>darkness</code>, <code>fruitiness</code> features our <strong>input features</strong>, and</li>\n",
-    "    <li>the <code>is_yummy</code> feature our <strong>target/output feature</strong> or a <strong>label</strong> of our data samples.\n",
-    "        <ul>\n",
-    "            <li>Values of categorical labels, such as <code>0</code> (\"no\") and <code>1</code> (\"yes\") here, are often called <strong>classes</strong>.</li>\n",
-    "        </ul>\n",
-    "    </li>\n",
-    "</ul>"
-   ]
-  },
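-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Separating input features from the label is a one-liner with `pandas`; a sketch using a small hand-made table (the values below are made up) instead of the full `beers.csv` file:\n",
-    "\n",
-    "```python\n",
-    "import pandas as pd\n",
-    "\n",
-    "df = pd.DataFrame({\n",
-    "    \"alcohol_content\": [3.7, 4.2, 4.7],\n",
-    "    \"bitterness\": [0.42, 0.84, 0.32],\n",
-    "    \"darkness\": [0.99, 0.93, 5.37],\n",
-    "    \"fruitiness\": [0.22, 0.38, 0.15],\n",
-    "    \"is_yummy\": [0, 0, 1],\n",
-    "})\n",
-    "\n",
-    "input_features = df.drop(columns=[\"is_yummy\"])  # the four input features\n",
-    "labels = df[\"is_yummy\"]  # the target feature / labels\n",
-    "print(input_features.shape, labels.shape)  # (3, 4) (3,)\n",
-    "```"
-   ]
-  },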
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Most machine learning algorithms require that every sample is represented as a vector of numbers. Let's now look at two examples of how one can create feature vectors from data which is not naturally given as vectors:\n",
-    "\n",
-    "1. Feature vectors from images\n",
-    "2. Feature vectors from text.\n",
-    "\n",
-    "### 1st Example: How to represent images as feature vectors?\n",
-    "\n",
-    "To simplify our explanations we only consider grayscale images in this section.\n",
-    "Computers represent such images as matrices: every cell in the matrix represents one pixel, and the numerical value in the cell is its gray value.\n",
-    "\n",
-    "So how can we represent images as vectors?\n",
-    "\n",
-    "To demonstrate this we will now load a sample dataset that is included in `scikit-learn`:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from sklearn.datasets import load_digits\n",
-    "import matplotlib.pyplot as plt\n",
-    "%matplotlib inline"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['DESCR', 'data', 'images', 'target', 'target_names']\n"
-     ]
-    }
-   ],
-   "source": [
-    "dd = load_digits()\n",
-    "print(dir(dd))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Let's plot the first ten digits from this data set:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "image/png": "iVBORw0KGgoAAAANSUhEUgAABHsAAACNCAYAAAAn1Xb5AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAGWdJREFUeJzt3X+QXXV5x/HPY4IVCGSXKtAGyhIEq+00S5NxprWaRYn4ozXbIg6iNMtMB0YHJ2mxJZ2xQ6J2DDPVLOOvJiOyabF1jMUNtQw2W1ksztSSmI0UAgysS0kKA9HdBUSJ4NM/7kVSmpjzLPfs2e+T92tmh+zm4bvP2c8959x9cu655u4CAAAAAABADi9rugEAAAAAAAB0DsMeAAAAAACARBj2AAAAAAAAJMKwBwAAAAAAIBGGPQAAAAAAAIkw7AEAAAAAAEiEYQ8AAAAAAEAiDHvazOwkM/uamf3IzB4ys0ua7gkxZnalme0ws2fMbKjpfhBnZr9kZte398EnzWzMzN7edF+IMbMbzewRM3vCzO43sz9puifMnJmdbWY/MbMbm+4FMWY22s7uqfbHfU33hJkxs4vNbE/7eeqDZvbGpntCNQftf89/PGdmn266L8SZWY+Z3WJmk2b2qJl9xszmN90XqjOz15rZN81s2sweMLM/bLqnOjHsecFnJR2QdIqk90n6vJn9RrMtIeh/JH1c0hebbgQzNl/Sw5KWS1oo6SOSvmJmPQ32hLhPSOpx9xMlvUvSx81sacM9YeY+K+nOppvAjF3p7gvaH69puhnEmdkKSddKukzSCZLeJGm80aZQ2UH73wJJp0r6saStDbeFmfmcpMck/YqkXrWer36w0Y5QWXswt03S1yWdJOlySTea2TmNNlYjhj2SzOx4SRdK+it3f8rd75B0s6RLm+0MEe5+k7sPS/pB071gZtz9R+6+zt0n3P1n7v51Sd+XxKCgIO5+t7s/8/yn7Y+zGmwJM2RmF0uakvRvTfcCHMXWS/qou/9H+9y4z933Nd0UZuRCtYYF/950I5iRMyV9xd1/4u6PSrpVEhcHlOPXJf2qpI3u/py7f1PSt5X4d36GPS3nSHrW3e8/6Gu7xc4LNMrMTlFr/7y76V4QY2afM7OnJd0r6RFJtzTcEoLM7ERJH5X0Z033gpfkE2a238y+bWZ9TTeDGDObJ2mZpFe1X3Kwt/3SkWOb7g0zskrS37m7N90IZmRQ0sVmdpyZLZL0drUGPiiXSfrNppuoC8OelgWSnnjR16bVulQWQAPM7BhJX5K0xd3vbbofxLj7B9U6hr5R0k2SnvnF/wfmoI9Jut7d9zbdCGbsakmLJS2StFnSP5sZV9mV5RRJx0h6t1rH015J56r1MmcUxMzOUOtlP1ua7gUz9i21LgZ4QtJeSTskDTfaESLuU+vKuj83s2PM7K1q7ZPHNdtWfRj2tDwl6cQXfe1ESU820Atw1DOzl0n6e7Xuo3Vlw+1ghtqXyN4h6TRJH2i6H1RnZr2Szpe0seleMHPu/h13f9Ldn3H3LWpdrv6OpvtCyI/b//20uz/i7vslfUrkWKJLJd3h7t9vuhHEtZ+b3qrWP2AdL+mVkrrVup8WCuDuP5XUL+mdkh6VdJWkr6g1uEuJYU/L/ZLmm9nZB31tiXjpCDDrzMwkXa/Wv2Ze2D4wo2zzxT17StMnqUfSf5vZo5I+LOlCM/tuk03hJXO1LllHIdx9Uq1fRA5+2Q8vASrTH4urekp2kqRfk/SZ9gD9B5JuEIPXorj799x9ubv/srtfoNbVr//ZdF91Ydij1k1h1ZrSftTMjjezN0haqdaVBSiEmc03s1dImidpnpm9grdDLNLnJb1W0h+4+4+PVIy5xcxObr9F8AIzm2dmF0h6r7jBb2k2qzWg621//K2kf5F0QZNNoToz6zKzC54/F5rZ+9R6FyfuL1GeGyR9qH187Zb0p2q9mww
KYWa/q9bLKXkXrkK1r6r7vqQPtI+pXWrdg+l7zXaGCDP7rfZ58Tgz+7Ba76w21HBbtWHY84IPSjpWrdfx/aOkD7g7V/aU5SNqXe68VtL723/mNe0Fab+e/Qq1frl81Myean+8r+HWUJ2r9ZKtvZImJf2NpDXufnOjXSHE3Z9290ef/1Dr5c4/cffHm+4NlR0j6eOSHpe0X9KHJPW/6M0oUIaPSbpTrSvR90jaJemvG+0IUask3eTu3CKibH8k6W1qHVcfkPRTtYavKMelar1xyGOS3iJpxUHvIJuOcTN4AAAAAACAPLiyBwAAAAAAIBGGPQAAAAAAAIkw7AEAAAAAAEiEYQ8AAAAAAEAitbwttZnVetfn7u7uUP2iRYsq1z7xxBOhtfft2xeqf+6550L1Ue5unVin7gyjzjnnnMq18+fHHtbRDKenp0P1M7Df3V/ViYXmWo4LFiyoXPvqV786tPbTTz8dqr///nrfkKaUffHUU08N1UeOp888E3tzgz179oTq6z6eKvG+OG/evMq1PT09obUffPDBYDf1KmVfjJznJOnAgQOVaycmJoLdzDlp98U6n9/cc8890XZqVcq+ePLJJ4fqI8fT6O8wxx57bKg+el686667ousXsy+efvrpofqurq7Ktfv37w+t/dhjj4Xq+X2x5ayzzgrVR/bFun8PmAWV9sVahj11O//880P1GzZsqFw7MjISWnvt2rWh+snJyVA9WjZv3ly5NnKwlqRrrrkmVL9t27ZQ/Qw8VPc3aMqyZcsq1w4PD4fWHhsbC9X39fWF6rNatWpVqD5yPB0fHw+tHXl8SLNyPE27L55wwgmVaz/5yU+G1u7v74+2A8XOc1JsgDMwMBBrZu5Juy/W+fymt7c32g4kXXLJJaH6SC7R4+OSJUtC9dF/kIwO86emporZF6+66qpQfSSboaGh0NqDg4Oh+qmpqVB9VtHnH5F9McHvAZX2RV7GBQAAAAAAkEilYY+Zvc3M7jOzB8wsdikL5gQyzIEcy0eGOZBj+cgwB3IsHxnmQI7lI8N8jjjsMbN5kj4r6e2SXifpvWb2urobQ+eQYQ7kWD4yzIEcy0eGOZBj+cgwB3IsHxnmVOXKntdLesDdx939gKQvS1pZb1voMDLMgRzLR4Y5kGP5yDAHciwfGeZAjuUjw4SqDHsWSXr4oM/3tr/2f5jZ5Wa2w8x2dKo5dAwZ5kCO5SPDHMixfGSYAzmWjwxzIMfykWFCHXs3LnffLGmzNPfe1hLVkGEO5Fg+MsyBHMtHhjmQY/nIMAdyLB8ZlqXKlT37JJ1+0Oentb+GcpBhDuRYPjLMgRzLR4Y5kGP5yDAHciwfGSZUZdhzp6SzzexMM3u5pIsl3VxvW+gwMsyBHMtHhjmQY/nIMAdyLB8Z5kCO5SPDhI74Mi53f9bMrpT0DUnzJH3R3e+uvTN0DBnmQI7lI8McyLF8ZJgDOZaPDHMgx/KRYU6V7tnj7rdIuqXmXlAjMsyBHMtHhjmQY/nIMAdyLB8Z5kCO5SPDfDp2g+bZtGHDhlD94sWLK9d2d3eH1v7hD38Yqn/Pe94Tqt+6dWuoPqupqanKtcuXLw+tfd5554Xqt23bFqrPrLe3N1R/2223Va6dnp4Ord3T0xOqzyp6fLzoootC9VdccUXl2k2bNoXWXrp0aah+ZGQkVI8XDAwMVK4dGxurrxH8XPQYFjnXrVq1KrT2Qw89FKrn+PuClStj71QcyXH9+vXRdjALIs9R16xZE1o7Wt/V1RWqj/Remuhz1IjIOVSS+vr6aq0vRfRcET2eRrjH7i29e/fuUH2dj7+IKvfsAQAAAAAAQCEY9gAAAAAAACTCsAcAAAAAACARhj0AAAAAAACJMOwBAAAAAABIhGEPAAAAAABAIgx7AAAAAAAAEmHYAwAAAAAAkAjDHgAAAAAAgEQY9gAAAAAAACTCsAcAAAAAACCR+U03IElLly4N1S9evDhUf9ZZZ1WuHR8fD629ffv2UH1
0W7du3RqqL0Vvb2+ovq+vr55GJI2NjdW2dnb9/f2h+t27d1euHR4eDq19zTXXhOqz2rx5c6j+2muvDdXv2LGjcm30eDoyMhKqxwu6urpC9QMDA5VrBwcHQ2v39PSE6qMmJiZqXb8pU1NTofozzjijcu309HRo7dHR0VB99PEX3daSrF+/vra1o+dFzEz0mBexbt26UH30eFrn8+XSRJ/fR84tkXOoFD/mRXOMHrObEj1XRN1+++2Va6PPJUrdt7iyBwAAAAAAIBGGPQAAAAAAAIkccdhjZqeb2W1mdo+Z3W1mq2ejMXQOGeZAjuUjwxzIsXxkmAM5lo8McyDH8pFhTlXu2fOspKvc/btmdoKknWa23d3vqbk3dA4Z5kCO5SPDHMixfGSYAzmWjwxzIMfykWFCR7yyx90fcffvtv/8pKQ9khbV3Rg6hwxzIMfykWEO5Fg+MsyBHMtHhjmQY/nIMKfQu3GZWY+kcyV95xB/d7mkyzvSFWpDhjmQY/nIMAdyLB8Z5kCO5SPDHMixfGSYR+Vhj5ktkPRPkta4+xMv/nt33yxpc7vWO9YhOoYMcyDH8pFhDuRYPjLMgRzLR4Y5kGP5yDCXSu/GZWbHqBX6l9z9pnpbQh3IMAdyLB8Z5kCO5SPDHMixfGSYAzmWjwzzqfJuXCbpekl73P1T9beETiPDHMixfGSYAzmWjwxzIMfykWEO5Fg+MsypypU9b5B0qaQ3m9lY++MdNfeFziLDHMixfGSYAzmWjwxzIMfykWEO5Fg+MkzoiPfscfc7JNks9IKakGEO5Fg+MsyBHMtHhjmQY/nIMAdyLB8Z5hR6N666dHd3h+p37twZqh8fHw/VR0R7yWrNmjWh+nXr1oXqFy5cGKqPGB0drW3t7AYHB0P1ExMTta29bdu2UH1W0ePd4sWLa6sfGRkJrR09F0xOTobqMxsYGAjV9/T0VK4dGhoKrR3dd6empkL10fNHKSLHR0lasmRJ5droOXRsbCxUH80ws66urlD97t27K9dGc0FLX19frfUR0efLUf39/aH66PG9JNFt27VrV+XayDlUih8jo+eDUtS9XZHH//DwcGjt6LF9rqh0g2YAAAAAAACUgWEPAAAAAABAIgx7AAAAAAAAEmHYAwAAAAAAkAjDHgAAAAAAgEQY9gAAAAAAACTCsAcAAAAAACARhj0AAAAAAACJMOwBAAAAAABIhGEPAAAAAABAIvObbkCSuru7Q/UjIyM1dRIX7X1ycrKmTpo1ODgYqh8aGgrV1/lz6+rqqm3t0kR/FmvWrAnV9/f3h+ojBgYGals7s/Hx8VD9SSedVLl2+/btobWj9StWrAjVl3T8XblyZah+48aNofotW7aE6iNWr14dqr/ssstq6qQs0eNjX19f5dre3t7Q2tHHU1T0OUNJoufRiYmJyrXRc+7w8HBtvZQkul3R/SWyL0ZFjwujo6P1NFKgOp/fL1++PFR/5plnhuqz7otTU1Oh+t27d4fqI8/zrrvuutDa0eNCT09PqL6uzLmyBwAAAAAAIBGGPQAAAAAAAIlUHvaY2Twz22VmX6+zIdSHDHMgx/KRYQ7kWD4yzIEcy0eGOZBj+cgwl8iVPasl7amrEcwKMsyBHMtHhjmQY/nIMAdyLB8Z5kCO5SPDRCoNe8zsNEnvlPSFettBXcgwB3IsHxnmQI7lI8McyLF8ZJgDOZaPDPOpemXPoKS/kPSzwxWY2eVmtsPMdnSkM3QaGeZAjuUjwxzIsXxkmAM5lo8McyDH8pFhMkcc9pjZ70t6zN13/qI6d9/s7svcfVnHukNHkGEO5Fg+MsyBHMtHhjmQY/nIMAdyLB8Z5lTlyp43SHqXmU1I+rKkN5vZjbV2hU4jwxzIsXxkmAM5lo8McyDH8pFhDuRYPjJM6IjDHnf/S3c/zd17JF0s6Zvu/v7aO0PHkGEO5Fg+MsyBHMtHhjmQY/nIMAdyLB8Z5hR5Ny4AAAAAAADMcfMjxe4+Kmm0lk4wK8g
wB3IsHxnmQI7lI8McyLF8ZJgDOZaPDPMIDXvqMjk5GapfunRpTZ1I3d3dofpoL1u3bg3Vo369vb2h+rGxsZo6ad66detC9atXr66nEUn9/f2h+qmpqZo6wcEix+sVK1aE1t60aVOo/uqrrw7Vr127NlTfpOnp6VrrV61aVbk2eoyMGh4ernX9rEZHR5tu4ed6enqabmHOmJiYCNUvX768cm1XV1do7Y0bN4bqzz333FB9Kc+HoplEn3+4e21rz6X9vGnRc9Ftt90Wql+/fn3l2ugxL3qeiz5Ooo/xUkQzj9TXffwaHBwM1Uczr4qXcQEAAAAAACTCsAcAAAAAACARhj0AAAAAAACJMOwBAAAAAABIhGEPAAAAAABAIgx7AAAAAAAAEmHYAwAAAAAAkAjDHgAAAAAAgEQY9gAAAAAAACTCsAcAAAAAACARhj0AAAAAAACJzG+6AUkaHx8P1S9dujRUf9FFF9VSOxPXXnttresDL8XQ0FCovq+vL1S/ZMmSyrXDw8Ohtbdt2xaqv+GGG2pdvxQbNmwI1Y+MjFSu7e7uDq19/vnnh+q3bt0aqi/J6OhoqL6rqytU39vbW1svW7ZsCdVPTU2F6rNauXJlqH56erpy7bp164LdxESP15lFz6MbN26sXDsxMRFau6enJ1Tf398fqh8bGwvVl2JwcDBUH9kXb7/99mg7aIs+/iO5SLHco/vWrl27QvUDAwOh+rqP8aWIHJOi+3k0k+jxtC5c2QMAAAAAAJAIwx4AAAAAAIBEKg17zKzLzL5qZvea2R4z+526G0NnkWEO5Fg+MsyBHMtHhjmQY/nIMAdyLB8Z5lP1nj3XSbrV3d9tZi+XdFyNPaEeZJgDOZaPDHMgx/KRYQ7kWD4yzIEcy0eGyRxx2GNmCyW9SdKAJLn7AUkH6m0LnUSGOZBj+cgwB3IsHxnmQI7lI8McyLF8ZJhTlZdxnSnpcUk3mNkuM/uCmR3/4iIzu9zMdpjZjo53iZeKDHMgx/KRYQ7kWD4yzIEcy0eGOZBj+cgwoSrDnvmSflvS5939XEk/krT2xUXuvtndl7n7sg73iJeODHMgx/KRYQ7kWD4yzIEcy0eGOZBj+cgwoSrDnr2S9rr7d9qff1WtBwLKQYY5kGP5yDAHciwfGeZAjuUjwxzIsXxkmNARhz3u/qikh83sNe0vvUXSPbV2hY4iwxzIsXxkmAM5lo8McyDH8pFhDuRYPjLMqeq7cX1I0pfad+Uel3RZfS2hJmSYAzmWjwxzIMfykWEO5Fg+MsyBHMtHhslUGva4+5gkXpdXMDLMgRzLR4Y5kGP5yDAHciwfGeZAjuUjw3yqXtlTq/Hx8VD92rX/715Rv9CGDRsq1+7cuTO09rJl7A8zMTU1Farftm1b5dqVK1eG1u7r6wvVDw0NhepLMjY2Fqrv7e2trX7dunWhtaO5T0xMhOojj8GSTE5Ohuo3bdpUUyfS1q1bQ/VXXHFFTZ3kFzkGL1y4MLR25mNknc4777xQ/erVq2vqRNqyZUuofnR0tJ5GChR9/Pf09FSuHRgYCK0dzWV4eDhUn1X0eeGqVasq10af/+IF0Z9d9PEfeT40PT0dWjv6HHJwcDBUn1X05xD5PaOrqyu0dvS4EP2dqi5VbtAMAAAAAACAQjDsAQAAAAAASIRhDwAAAAAAQCIMewAAAAAAABJh2AMAAAAAAJAIwx4AAAAAAIBEGPYAAAAAAAAkwrAHAAAAAAAgEYY9AAAAAAAAiTDsAQAAAAAASIRhDwAAAAAAQCLm7p1f1OxxSQ+96MuvlLS/499s7mpie89w91d1YqHDZCgdXTk2ta1153g0ZSixL2bAvpgD+2L52BdzYF8sH/tiDuyL5ZvT+2Itw55DfiOzHe6+bFa+2RyQdXuzbtehZN3WrNt1OFm3N+t2HUrWbc26XYeTdXuzbtehZN3WrNt1OFm3N+t2HUrWbc26XYeTdXuzbtehzPVt5WVcAAAAAAAAiTDsAQAAAAA
ASGQ2hz2bZ/F7zQVZtzfrdh1K1m3Nul2Hk3V7s27XoWTd1qzbdThZtzfrdh1K1m3Nul2Hk3V7s27XoWTd1qzbdThZtzfrdh3KnN7WWbtnDwAAAAAAAOrHy7gAAAAAAAASYdgDAAAAAACQyKwMe8zsbWZ2n5k9YGZrZ+N7NsXMJszsLjMbM7MdTffTKUdThhI5ZkCGOZBj+cgwB3IsHxnmQI7lI8McSsix9nv2mNk8SfdLWiFpr6Q7Jb3X3e+p9Rs3xMwmJC1z9/1N99IpR1uGEjlmQIY5kGP5yDAHciwfGeZAjuUjwxxKyHE2rux5vaQH3H3c3Q9I+rKklbPwfdE5ZJgDOZaPDHMgx/KRYQ7kWD4yzIEcy0eGc9BsDHsWSXr4oM/3tr+WlUv6VzPbaWaXN91MhxxtGUrkmAEZ5kCO5SPDHMixfGSYAzmWjwxzmPM5zm+6gYR+z933mdnJkrab2b3u/q2mm0IYOZaPDHMgx/KRYQ7kWD4yzIEcy0eGOcz5HGfjyp59kk4/6PPT2l9Lyd33tf/7mKSvqXVJW+mOqgwlcsyADHMgx/KRYQ7kWD4yzIEcy0eGOZSQ42wMe+6UdLaZnWlmL5d0saSbZ+H7zjozO97MTnj+z5LeKum/mu2qI46aDCVyzIAMcyDH8pFhDuRYPjLMgRzLR4Y5lJJj7S/jcvdnzexKSd+QNE/SF9397rq/b0NOkfQ1M5NaP9t/cPdbm23ppTvKMpTIMQMyzIEcy0eGOZBj+cgwB3IsHxnmUESOtb/1OgAAAAAAAGbPbLyMCwAAAAAAALOEYQ8AAAAAAEAiDHsAAAAAAAASYdgDAAAAAACQCMMeAAAAAACARBj2AAAAAAAAJMKwBwAAAAAAIJH/BbKiUL0lvDQ5AAAAAElFTkSuQmCC\n",
-      "text/plain": [
-       "<Figure size 1440x360 with 10 Axes>"
-      ]
-     },
-     "metadata": {
-      "needs_background": "light"
-     },
-     "output_type": "display_data"
-    }
-   ],
-   "source": [
-    "N = 10\n",
-    "\n",
-    "plt.figure(figsize=(2 * N, 5))\n",
-    "\n",
-    "for i, image in enumerate(dd.images[:N]):\n",
-    "    plt.subplot(1, N, i + 1).set_title(dd.target[i])\n",
-    "    plt.imshow(image, cmap=\"gray\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The data is a set of 8 x 8 matrices with values 0 to 16 (black to white). The range 0 to 16 is fixed for this specific data set. Other formats allow e.g. values 0..255 or floating point values in the range 0 to 1."
-   ]
-  },
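-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Converting between such conventions is a simple rescaling; a sketch (reloading the digits data so the cell is self-contained):\n",
-    "\n",
-    "```python\n",
-    "from sklearn.datasets import load_digits\n",
-    "\n",
-    "dd = load_digits()\n",
-    "images01 = dd.images / 16.0  # rescale the 0..16 integer range to floats in 0..1\n",
-    "print(images01.min(), images01.max())\n",
-    "```"
-   ]
-  },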
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "images.ndim: 3\n",
-      "images[0].shape: (8, 8)\n",
-      "images[0]:\n",
-      " [[ 0.  0.  5. 13.  9.  1.  0.  0.]\n",
-      " [ 0.  0. 13. 15. 10. 15.  5.  0.]\n",
-      " [ 0.  3. 15.  2.  0. 11.  8.  0.]\n",
-      " [ 0.  4. 12.  0.  0.  8.  8.  0.]\n",
-      " [ 0.  5.  8.  0.  0.  9.  8.  0.]\n",
-      " [ 0.  4. 11.  0.  1. 12.  7.  0.]\n",
-      " [ 0.  2. 14.  5. 10. 12.  0.  0.]\n",
-      " [ 0.  0.  6. 13. 10.  0.  0.  0.]]\n",
-      "images.shape: (1797, 8, 8)\n",
-      "images.size: 115008\n",
-      "images.dtype: float64\n",
-      "images.itemsize: 8\n",
-      "target.size: 1797\n",
-      "target_names: [0 1 2 3 4 5 6 7 8 9]\n",
-      "DESCR:\n",
-      " Optical Recognition of Handwritten Digits Data Set\n",
-      "===================================================\n",
-      "\n",
-      "Notes\n",
-      "-----\n",
-      "Data Set Characteristics:\n",
-      "    :Number of Instances: 5620\n",
-      "    :Number of Attributes: 64\n",
-      "    :Attribute Information: 8x8 image of integer pixels in the range 0..16.\n",
-      "    :Missing Attribute Values: None\n",
-      "    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)\n",
-      "    :Date: July; 1998\n",
-      "\n",
-      "This is a copy of the test set of the UCI ML hand-written digits datasets\n",
-      "http://archive.ics.uci.edu/ml/datas \n",
-      "[...]\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(\"images.ndim:\", dd.images.ndim) # number of dimensions of the array\n",
-    "print(\"images[0].shape:\", dd.images[0].shape) # dimensions of a first sample array\n",
-    "print(\"images[0]:\\n\", dd.images[0]) # first sample array\n",
-    "print(\"images.shape:\", dd.images.shape) # dimensions of the array of all samples\n",
-    "print(\"images.size:\", dd.images.size) # total number of elements of the array\n",
-    "print(\"images.dtype:\", dd.images.dtype) # type of the elements in the array\n",
-    "print(\"images.itemsize:\", dd.images.itemsize) # size in bytes of each element of the array\n",
-    "print(\"target.size:\", dd.target.size) # size of the target feature vector (labels of samples)\n",
-    "print(\"target_names:\", dd.target_names) # classes vector\n",
-    "print(\"DESCR:\\n\", dd.DESCR[:500], \"\\n[...]\") # description of the dataset"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "To transform such an image into a feature vector we just have to flatten the matrix by concatenating its rows into one single vector of size 64:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "image_vector.shape: (64,)\n",
-      "image_vector: [ 0.  0.  5. 13.  9.  1.  0.  0.  0.  0. 13. 15. 10. 15.  5.  0.  0.  3.\n",
-      " 15.  2.  0. 11.  8.  0.  0.  4. 12.  0.  0.  8.  8.  0.  0.  5.  8.  0.\n",
-      "  0.  9.  8.  0.  0.  4. 11.  0.  1. 12.  7.  0.  0.  2. 14.  5. 10. 12.\n",
-      "  0.  0.  0.  0.  6. 13. 10.  0.  0.  0.]\n"
-     ]
-    }
-   ],
-   "source": [
-    "image_vector = dd.images[0].flatten()\n",
-    "print(\"image_vector.shape:\", image_vector.shape)\n",
-    "print(\"image_vector:\", image_vector)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 40,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(1797, 8, 8)\n",
-      "(1797, 64)\n",
-      "[ 0.  0.  5. 13.  9.  1.  0.  0.  0.  0. 13. 15. 10. 15.  5.  0.  0.  3.\n",
-      " 15.  2.  0. 11.  8.  0.  0.  4. 12.  0.  0.  8.  8.  0.  0.  5.  8.  0.\n",
-      "  0.  9.  8.  0.  0.  4. 11.  0.  1. 12.  7.  0.  0.  2. 14.  5. 10. 12.\n",
-      "  0.  0.  0.  0.  6. 13. 10.  0.  0.  0.]\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(dd.images.shape)\n",
-    "\n",
-    "# reshape to (1797, 64):\n",
-    "images_flat = dd.images.reshape(-1, 64)\n",
-    "print(images_flat.shape)\n",
-    "print(images_flat[0])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### 2nd Example: How to represent textual data as feature vectors?"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "If we start a machine learning project for texts, we first have to choose a dictionary - a set of words relevant for this project. The final representation of a text as a feature vector depends on this dictionary.\n",
-    "\n",
-    "Such a dictionary can be very large, but for the sake of simplicity we use a very small enumerated dictionary to explain the overall procedure:\n",
-    "\n",
-    "\n",
-    "| Word     | Index |\n",
-    "|----------|-------|\n",
-    "| like     | 0     |\n",
-    "| dislike  | 1     |\n",
-    "| american | 2     |\n",
-    "| italian  | 3     |\n",
-    "| beer     | 4     |\n",
-    "| pizza    | 5     |\n",
-    "\n",
-    "To \"vectorize\" a given text we count the words in the text which also exist in the dictionary and put the counts at the given `Index`.\n",
-    "\n",
-    "E.g. `\"I dislike american pizza, but american beer is nice\"`:\n",
-    "\n",
-    "| Word     | Index | Count |\n",
-    "|----------|-------|-------|\n",
-    "| like     | 0     | 0     |\n",
-    "| dislike  | 1     | 1     |\n",
-    "| american | 2     | 2     |\n",
-    "| italian  | 3     | 0     |\n",
-    "| beer     | 4     | 1     |\n",
-    "| pizza    | 5     | 1     |\n",
-    "\n",
-    "The respective feature vector is the `Count` column, which is:\n",
-    "\n",
-    "`[0, 1, 2, 0, 1, 1]`\n",
-    "\n",
-    "In real-world scenarios the dictionary is much bigger, which often results in vectors with only a few non-zero entries (so-called **sparse vectors**)."
-   ]
-  },
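-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The counting procedure described above can be sketched in a few lines of plain Python:\n",
-    "\n",
-    "```python\n",
-    "vocabulary = {\"like\": 0, \"dislike\": 1, \"american\": 2, \"italian\": 3, \"beer\": 4, \"pizza\": 5}\n",
-    "\n",
-    "text = \"I dislike american pizza, but american beer is nice\"\n",
-    "\n",
-    "counts = [0] * len(vocabulary)\n",
-    "for word in text.lower().replace(\",\", \"\").split():\n",
-    "    if word in vocabulary:\n",
-    "        counts[vocabulary[word]] += 1\n",
-    "\n",
-    "print(counts)  # [0, 1, 2, 0, 1, 1]\n",
-    "```"
-   ]
-  },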
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Below you find a short code example that demonstrates how text feature vectors can be created with `scikit-learn`.\n",
-    "<div class=\"alert alert-block alert-info\">\n",
-    "<i class=\"fa fa-info-circle\"></i>\n",
-    "Such vectorization is usually not done manually. In practice there are improved but more complicated procedures which compute multiplicative weights for the vector entries to emphasize informative words, such as the <a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html\">\"term frequency-inverse document frequency\" vectorizer</a>.\n",
-    "</div>"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[0 1 2 0 1 1]\n"
-     ]
-    }
-   ],
-   "source": [
-    "from sklearn.feature_extraction.text import CountVectorizer\n",
-    "\n",
-    "vocabulary = {\n",
-    "    \"like\": 0,\n",
-    "    \"dislike\": 1,\n",
-    "    \"american\": 2,\n",
-    "    \"italian\": 3,\n",
-    "    \"beer\": 4,\n",
-    "    \"pizza\": 5,\n",
-    "}\n",
-    "\n",
-    "vectorizer = CountVectorizer(vocabulary=vocabulary)\n",
-    "\n",
-    "# this is how one can create a count vector for a given piece of text:\n",
-    "vector = vectorizer.fit_transform([\n",
-    "    \"I dislike american pizza. But american beer is nice\"\n",
-    "]).toarray().flatten()\n",
-    "print(vector)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## ML lingo: What are the different types of datasets?\n",
-    "\n",
-    "<div class=\"alert alert-block alert-warning\">\n",
-    "<i class=\"fa fa-warning\"></i>&nbsp;<strong>Definitions</strong>\n",
-    "\n",
-    "A subset of the data used for:\n",
-    "<ul>\n",
-    "    <li>learning (training) a model is called a <strong>training set</strong>;</li>\n",
-    "    <li>improving ML method performance by adjusting its parameters is called a <strong>validation set</strong>;</li>\n",
-    "    <li>assessing the final performance is called a <strong>test set</strong>.</li>\n",
-    "</ul>\n",
-    "</div>\n",
-    "\n",
-    "<table>\n",
-    "    <tr>\n",
-    "        <td><img src=\"./data_split.png\" width=300px></td>\n",
-    "    </tr>\n",
-    "    <tr>\n",
-    "        <td style=\"font-size:75%\"><center>Img source: https://dziganto.github.io</center></td>\n",
-    "    </tr>\n",
-    "</table>\n",
-    "\n",
-    "\n",
-    "You will learn more on how to wisely select subsets of your data and about related issues later in the course. For now just remember that:\n",
-    "1. the training and validation datasets must be disjoint during each iteration of the method improvement, and\n",
-    "2. the test dataset must be independent of the model (hence, of the other datasets), i.e. it is indeed used only for the final assessment of the method's performance (think: locked in the safe until you're done with model tweaking).\n",
-    "\n"
-   ]
-  },
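-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Such a three-way split can be sketched with `scikit-learn` (the 60/20/20 proportions and the dummy data here are arbitrary choices for illustration):\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "from sklearn.model_selection import train_test_split\n",
-    "\n",
-    "X = np.arange(100).reshape(50, 2)  # 50 dummy samples with 2 features each\n",
-    "y = np.arange(50)  # dummy labels\n",
-    "\n",
-    "# first split off the test set, then split the rest into training/validation:\n",
-    "X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n",
-    "X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)\n",
-    "\n",
-    "print(len(X_train), len(X_val), len(X_test))  # 30 10 10\n",
-    "```"
-   ]
-  },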
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Taxonomy of machine learning\n",
-    "\n",
-    "Most applications of ML belong to two categories: **supervised** and **unsupervised** learning.\n",
-    "\n",
-    "### Supervised learning \n",
-    "\n",
-    "In supervised learning the data comes with an additional target/label value that we want to predict. Such a problem can be either \n",
-    "\n",
-    "- **classification**: we want to predict a categorical value.\n",
-    "    \n",
-    "- **regression**: we want to predict a continuous numerical value.\n",
-    "    \n",
-    "  \n",
-    "\n",
-    "Examples of supervised learning:\n",
-    "\n",
-    "- Classification: predict the class `is_yummy` based on the attributes `alcohol_content`, `bitterness`, `darkness` and `fruitiness` (a standard two-class problem).\n",
-    "\n",
-    "- Classification: predict the digit shown, based on an 8 x 8 pixel image (a multi-class problem).\n",
-    "\n",
-    "- Regression: predict the temperature based on how long the sun was shining in the last 10 minutes.\n",
-    "\n",
-    "\n",
-    "\n",
-    "<table>\n",
-    "    <tr>\n",
-    "    <td><img src=\"./classification-svc-2d-poly.png\" width=400px></td>\n",
-    "    <td><img src=\"./regression-lin-1d.png\" width=400px></td>\n",
-    "    </tr>\n",
-    "    <tr>\n",
-    "        <td><center>Classification</center></td>\n",
-    "        <td><center>Linear regression</center></td>\n",
-    "    </tr>\n",
-    "</table>\n"
-   ]
-  },
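-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "As a quick preview of the methods introduced later in the course, the multi-class digits problem can be sketched with a `scikit-learn` classifier (the choice of logistic regression here is arbitrary):\n",
-    "\n",
-    "```python\n",
-    "from sklearn.datasets import load_digits\n",
-    "from sklearn.linear_model import LogisticRegression\n",
-    "from sklearn.model_selection import train_test_split\n",
-    "\n",
-    "dd = load_digits()\n",
-    "X = dd.images.reshape(-1, 64)  # flatten the 8 x 8 images to feature vectors\n",
-    "y = dd.target  # class labels 0..9\n",
-    "\n",
-    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)\n",
-    "\n",
-    "classifier = LogisticRegression(max_iter=1000)\n",
-    "classifier.fit(X_train, y_train)  # \"train\" / \"fit\" / \"learn\" the model\n",
-    "print(classifier.score(X_test, y_test))  # accuracy on unseen data\n",
-    "```"
-   ]
-  },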
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Unsupervised learning \n",
-    "\n",
-    "In unsupervised learning the training data consists of samples without any corresponding target/label values, and the aim is to find structure in the data. Some common applications are:\n",
-    "\n",
-    "- Clustering: find groups in data.\n",
-    "- Density estimation, novelty detection: find a probability distribution in your data.\n",
-    "- Dimension reduction (e.g. PCA): find latent structures in your data.\n",
-    "\n",
-    "Examples of unsupervised learning:\n",
-    "\n",
-    "- Can we split up our beer data set into sub-groups of similar beers?\n",
-    "- Can we reduce our data set because groups of features are somehow correlated?\n",
-    "\n",
-    "<table>\n",
-    "    <tr>\n",
-    "    <td><img src=\"./cluster-image.png\" width=400px></td>\n",
-    "    <td><img src=\"./nonlin-pca.png\" width=400px></td>\n",
-    "    </tr>\n",
-    "    <tr>\n",
-    "        <td><center>Clustering</center></td>\n",
-    "        <td><center>Dimension reduction: detecting 2D structure in 3D data</center></td>\n",
-    "    </tr>\n",
-    "</table>\n",
-    "\n",
-    "\n",
-    "\n",
-    "This course will only introduce concepts and methods from **supervised learning**."
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.1"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}