diff --git a/02_classification.ipynb b/02_classification.ipynb
index 084de68edc93f8bf70d1f2635a675deb9a21c216..fe541200c35791eb4b3c973958b58405f7e4e8fe 100644
--- a/02_classification.ipynb
+++ b/02_classification.ipynb
@@ -1928,7 +1928,7 @@
     }
    ],
    "source": [
-    "# SOLUTION\n",
+    "#SOLUTION\n",
     "try_dataset(\"2d_points.csv\", PolynomialFeatures(2, include_bias=False))"
    ]
   }
diff --git a/03_overfitting_and_cross_validation.ipynb b/03_overfitting_and_cross_validation.ipynb
index 5ed8ef0e03bf60fc2fc0671c1973943d3d26fd57..b0c66271a391528b398c5595cd56c7102b89893a 100644
--- a/03_overfitting_and_cross_validation.ipynb
+++ b/03_overfitting_and_cross_validation.ipynb
@@ -614,50 +614,9 @@
     "\n",
     "- split our data into a learning data set and a test data set\n",
     "- train the classifier on the learning data set\n",
-    "- assess performance of the classifier on the test data set."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Cross-validation\n",
-    "\n",
-    "The procedure called *cross-validation* goes a step further: In this procedure the full dataset is split into learn-/test-set in various ways and statistics of the achieved metrics is computed to assess the classifier.\n",
-    "\n",
-    "A common approach is **K-fold cross-validation**:\n",
-    "\n",
-    "K-fold cross-validation has an advantage that we do not leave out part of our data from training. This is useful when we do not have a lot of data. \n",
-    "\n",
-    "### Example: 4-fold cross validation\n",
-    "\n",
-    "For 4-fold cross validation we split our data set into four equal sized partitions P1, P2, P3 and P4.\n",
-    "\n",
-    "We:\n",
-    "\n",
-    "- train the classifier on `P2 + P3 + P4`, compute accuracy `m1` on `P1`.\n",
-    "\n",
-    "<img src=\"cross_val_0.svg?2\" />\n",
-    "\n",
-    "- train the classifier on `P1 + P3 + P4`, compute accuracy `m2` on `P2`.\n",
-    "\n",
-    "<img src=\"cross_val_1.svg?2\" />\n",
-    "\n",
-    "- train the classifier on `P1 + P2 + P4`, compute accuray `m3` on `P3`.\n",
-    "\n",
-    "<img src=\"cross_val_2.svg?2\" />\n",
-    "\n",
-    "- train the classifier on `P1 + P2 + P3`, compute accuracy `m4` on `P4`.\n",
-    "\n",
-    "<img src=\"cross_val_3.svg?2\" />\n",
-    "\n",
-    "Finally we can compute the average of `m1` .. `m4` as the final measure for accuracy.\n",
-    "\n",
-    "Some advice:\n",
-    "\n",
-    "- This can be done on the original data or on randomly shuffled data. It is recommended to shuffle the data first, as there might be some unknown underlying ordering in your dataset\n",
+    "- assess performance of the classifier on the test data set.\n",
     "\n",
-    "- Usually one uses 3- to 10-fold cross validation, depending on the amount of data available."
+    "**TODO**: train_test_split intro"
    ]
   },
   {
diff --git a/04_measuring_quality_of_a_classifier.ipynb b/04_measuring_quality_of_a_classifier.ipynb
index df35c485ef8f5aa4263a507dd0cd982a867380f6..e0cea68f6bdb2f33735b9078d618514a41a1267f 100644
--- a/04_measuring_quality_of_a_classifier.ipynb
+++ b/04_measuring_quality_of_a_classifier.ipynb
@@ -169,7 +169,7 @@
     "\n",
     "After applying a classifier to a data set with known labels `0` and `1`:\n",
     "\n",
-    "<div class=\"alert alert-block alert-info\">\n",
+    "<div class=\"alert alert-block alert-warning\">\n",
     "\n",
     "<h3><i class=\"fa fa-info-circle\"></i>&nbsp;Definition</h3>\n",
     "<ul>\n",
@@ -241,7 +241,7 @@
     "\n",
     "\n",
     "\n",
-    "<div class=\"alert alert-block alert-info\">\n",
+    "<div class=\"alert alert-block alert-warning\">\n",
     "<h3><i class=\"fa fa-info-circle\"></i>&nbsp;Definition</h3>\n",
     "\n",
     "This allows us to define <strong>accuracy</strong> as (<code>TP</code> + <code>TN</code>) / (<code>TP</code> + <code>FP</code> + <code>FN</code> + <code>TN</code>).\n",
@@ -395,7 +395,7 @@
     "To transfer this concept to classification, we can interpret a classifier as a filter. The classifier classifies every  document in a collection as relevant or not relevant.\n",
     "\n",
     "\n",
-    "<div class=\"alert alert-block alert-info\">\n",
+    "<div class=\"alert alert-block alert-warning\">\n",
     "\n",
     "<h3><i class=\"fa fa-info-circle\"></i>&nbsp;Definition</h3>\n",
     "\n",
@@ -438,7 +438,7 @@
     "Sometimes we want a single number instead of two numbers to compare the performace of multiple classifiers.\n",
     "\n",
     "\n",
-    "<div class=\"alert alert-block alert-info\">\n",
+    "<div class=\"alert alert-block alert-warning\">\n",
     "<h3><i class=\"fa fa-info-circle\"></i>&nbsp;Definition</h3>\n",
     "    \n",
     "The **F1 score** is computed as\n",
diff --git a/06_preprocessing_pipelines_and_hyperparameter_optimization.ipynb b/06_preprocessing_pipelines_and_hyperparameter_optimization.ipynb
index 08829103ec087ed89961c28d80b1b68d6cfdf494..d3b0918c71cbe46d715af7e74ed704c431c3a507 100644
--- a/06_preprocessing_pipelines_and_hyperparameter_optimization.ipynb
+++ b/06_preprocessing_pipelines_and_hyperparameter_optimization.ipynb
@@ -554,7 +554,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<div class=\"alert alert-block alert-info\">\n",
+    "<div class=\"alert alert-block alert-warning\">\n",
     "\n",
     "<i class=\"fa fa-info-circle\"></i>&nbsp;The benefit of using a pipeline is that you will  not mistakenly scale the full data set first, instead we follow the strategy we described above automatically.\n",
     "\n",
diff --git a/07_regression.ipynb b/07_regression.ipynb
index 87aa3896882c073fab39d8f82144ac341975c94b..11a4d35be94934a1173b316d60b89f8c02d4f7f7 100644
--- a/07_regression.ipynb
+++ b/07_regression.ipynb
@@ -132,7 +132,7 @@
     "\n",
     "Regression belongs like classification to the field of supervised learning. \n",
     "\n",
-    "<div class=\"alert alert-block alert-info\">\n",
+    "<div class=\"alert alert-block alert-warning\">\n",
     "<i class=\"fa fa-info-circle\"></i>&nbsp; \n",
     "<strong>Regression predicts numerical values</strong> \n",
     "in contrast to classification which predicts categories.\n",
@@ -383,17 +383,106 @@
    "source": [
     "In contrast to our previous examples, our data set contains a non-numerical text column `kind`.\n",
     "\n",
-    "<div class=\"alert alert-block alert-warning\">\n",
-    "<i class=\"fa fa-info-circle\"></i>&nbsp; \n",
-    "    <code>sklearn.preprocessing.LabelEncoder</code> is a preprocessor which encodes text values to according categorical numbers.\n",
-    "</div>\n",
-    "\n",
+    "<code>sklearn.preprocessing.LabelEncoder</code> is a preprocessor which encodes text values as corresponding numbers:\n",
     "\n"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>circumference</th>\n",
+       "      <th>length</th>\n",
+       "      <th>kind</th>\n",
+       "      <th>weight</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>25.5</td>\n",
+       "      <td>85.5</td>\n",
+       "      <td>0</td>\n",
+       "      <td>31.2</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>22.5</td>\n",
+       "      <td>62.5</td>\n",
+       "      <td>0</td>\n",
+       "      <td>12.4</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>29.0</td>\n",
+       "      <td>88.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>34.8</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>32.5</td>\n",
+       "      <td>85.5</td>\n",
+       "      <td>0</td>\n",
+       "      <td>62.7</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>24.5</td>\n",
+       "      <td>74.5</td>\n",
+       "      <td>0</td>\n",
+       "      <td>24.2</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   circumference  length  kind  weight\n",
+       "0           25.5    85.5     0    31.2\n",
+       "1           22.5    62.5     0    12.4\n",
+       "2           29.0    88.0     0    34.8\n",
+       "3           32.5    85.5     0    62.7\n",
+       "4           24.5    74.5     0    24.2"
+      ]
+     },
+     "execution_count": 17,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from sklearn.preprocessing import LabelEncoder\n",
+    "\n",
+    "df.iloc[:, 2] = LabelEncoder().fit_transform(df.iloc[:, 2]) \n",
+    "df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
    "metadata": {},
    "outputs": [
     {
@@ -472,16 +561,12 @@
        "99           27.5    86.5     1    43.4"
       ]
      },
-     "execution_count": 5,
+     "execution_count": 18,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "from sklearn.preprocessing import LabelEncoder\n",
-    "\n",
-    "df.iloc[:, 2] = LabelEncoder().fit_transform(df.iloc[:, 2]) \n",
-    "\n",
     "df.tail()"
    ]
   },
@@ -691,7 +776,7 @@
     "\n",
     "This is the metric we used before. Taking absolute values before adding up the deviatons assures that deviations with different signs can not cancel out.\n",
     "\n",
-    "<div class=\"alert alert-block alert-info\">\n",
+    "<div class=\"alert alert-block alert-warning\">\n",
     "    <i class=\"fa fa-info-circle\"></i>&nbsp; <strong>mean absolute error</strong> is defined as \n",
     "\n",
     "$$\n",
@@ -711,7 +796,7 @@
     "Here we replace the absolute difference by its squared difference. Squaring also insures positive differeces.\n",
     "\n",
     "\n",
-    "<div class=\"alert alert-block alert-info\">\n",
+    "<div class=\"alert alert-block alert-warning\">\n",
     "    <i class=\"fa fa-info-circle\"></i>&nbsp; <strong>mean squared error</strong> is defined as \n",
     "\n",
     "\n",
@@ -731,7 +816,7 @@
     "\n",
     "Here we replace mean calculation by median. \n",
     "\n",
-    "<div class=\"alert alert-block alert-info\">\n",
+    "<div class=\"alert alert-block alert-warning\">\n",
     "    <i class=\"fa fa-info-circle\"></i>&nbsp; <strong>median absolute error</strong> is defined as \n",
     "\n",
     "\n",
@@ -918,6 +1003,8 @@
     }
    ],
    "source": [
+    "import warnings\n",
+    "warnings.filterwarnings('ignore', category=DeprecationWarning)\n",
     "from sklearn.model_selection import GridSearchCV\n",
     "\n",
     "search = GridSearchCV(p, param_grid, scoring=\"neg_median_absolute_error\", cv=4, n_jobs=4)\n",
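The final hunk configures a `GridSearchCV` over a pipeline `p` and a `param_grid` defined earlier in the notebook. A rough standalone sketch of that search pattern, using a hypothetical polynomial-regression pipeline and synthetic data in place of the notebook's objects:

```python
import warnings

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-ins for the notebook's `p` and `param_grid`:
p = make_pipeline(PolynomialFeatures(include_bias=False), LinearRegression())
param_grid = {"polynomialfeatures__degree": [1, 2, 3]}

# Silence deprecation noise from older sklearn versions, as in the notebook.
warnings.filterwarnings("ignore", category=DeprecationWarning)

# Synthetic data with a quadratic relationship (stands in for the notebook's data set).
rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(80, 2))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=80)

# Same scoring and cv settings as in the hunk above; each candidate degree
# is evaluated with 4-fold cross-validation on negated median absolute error.
search = GridSearchCV(p, param_grid, scoring="neg_median_absolute_error", cv=4, n_jobs=1)
search.fit(X, y)
print(search.best_params_)
```

`search.best_estimator_` then holds the pipeline refit with the winning degree on the full data set.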