diff --git a/03_overfitting_and_cross_validation.ipynb b/03_overfitting_and_cross_validation.ipynb
index 290636a960d180eac4c128bf9ccffb190f82bd5c..dedacd89ffcc5bd97e6f5883f160a8ee30e8618b 100644
--- a/03_overfitting_and_cross_validation.ipynb
+++ b/03_overfitting_and_cross_validation.ipynb
@@ -612,8 +612,13 @@
     "\n",
     "\n",
     "## 2. How can we do better ?\n",
-    "\n",
-    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
     "There is no classifier which works out of the box in all situations. Depending on the \"geometry\" / \"shape\" of the data, classification algorithms and their settings can make a big difference.\n",
     "\n",
     "In our previous 2D examples we were able to visualize the data and classification results, this is not possible for higher dimensional data.\n",
@@ -621,8 +626,105 @@
     "The general way to handle this situation is as follows: \n",
     "\n",
     "- split our data into a learning data set and a test data set\n",
+    "\n",
+    "\n",
     "- train the classifier on the learning data set\n",
-    "- assess performance of the classifier on the test data set."
+    "\n",
+    "\n",
+    "- assess performance of the classifier on the test data set.\n",
+    "\n",
+    "\n",
+    "### Cross-validation\n",
+    "\n",
+    "<img src=\"https://i.imgflip.com/305azk.jpg\" title=\"made at imgflip.com\" width=40%/>\n",
+    "\n",
+    "\n",
+    "The procedure called *cross-validation* goes a step further: In this procedure the full dataset is split into learn-/test-set in various ways and statistics of the achieved metrics is computed to assess the classifier.\n",
+    "\n",
+    "A common approach is **K-fold cross-validation**:\n",
+    "\n",
+    "K-fold cross-validation has an advantage that we do not leave out part of our data from training. This is useful when we do not have a lot of data. \n",
+    "\n",
+    "### Example: 4-fold cross validation\n",
+    "\n",
+    "For 4-fold cross validation we split our data set into four equal sized partitions P1, P2, P3 and P4.\n",
+    "\n",
+    "We:\n",
+    "\n",
+    "- hold out `P1`: train the classifier on `P2 + P3 + P4`, compute accuracy `m1` on `P1`.\n",
+    "\n",
+    "<img src=\"cross_val_0.svg?2\" />\n",
+    "\n",
+    "-  hold out `P2`: train the classifier on `P1 + P3 + P4`, compute accuracy `m2` on `P2`.\n",
+    "\n",
+    "<img src=\"cross_val_1.svg?2\" />\n",
+    "\n",
+    "-  hold out `P3`: train the classifier on `P1 + P2 + P4`, compute accuray `m3` on `P3`.\n",
+    "\n",
+    "<img src=\"cross_val_2.svg?2\" />\n",
+    "\n",
+    "-  hold out `P4`: train the classifier on `P1 + P2 + P3`, compute accuracy `m4` on `P4`.\n",
+    "\n",
+    "<img src=\"cross_val_3.svg?2\" />\n",
+    "\n",
+    "Finally we can compute the average of `m1` .. `m4` as the final measure for accuracy.\n",
+    "\n",
+    "Some advice:\n",
+    "\n",
+    "- This can be done on the original data or on randomly shuffled data. It is recommended to shuffle the data first, as there might be some unknown underlying ordering in your dataset\n",
+    "\n",
+    "- Usually one uses 3- to 10-fold cross validation, depending on the amount of data available."
    ]
   },
   {