diff --git a/02_classification.ipynb b/02_classification.ipynb
index 0954765bd78660f937fe3a454452459371855101..6761f12d8dcf17c1fb0a87ca3b431e45e47df416 100644
--- a/02_classification.ipynb
+++ b/02_classification.ipynb
@@ -3,6 +3,8 @@
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n",
     "import matplotlib.pyplot as plt\n",
@@ -12,19 +14,18 @@
     "warnings.filterwarnings('ignore', category=FutureWarning)\n",
     "warnings.filterwarnings = lambda *a, **kw: None\n",
     "from IPython.core.display import HTML; HTML(open(\"custom.html\", \"r\").read())"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "# Chapter 2: Classification"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "As we have learned in the previous chapter *classification* is a machine learning problem belonging to a group of *supervised learning* problems. In classification the aim is to learn how to predict a class of a categorical label, based on set of already labelled training examples (hence, supervised). Such labels (categories) and corresponding classes can be:\n",
     "\n",
@@ -73,21 +74,20 @@
     "\n",
     "sns.pairplot(for_plot, hue=\"is_yummy\", diag_kind=\"hist\", diag_kws=dict(alpha=.7));\n",
     "beer_data.describe()"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "We can assume that the person who rated these beers has preferences such as:\n",
     "* \"I don't like too low alcohol content\",\n",
     "* \"I like more fruity beers\", etc."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "This means we could construct a score where high numbers relate to \"favorable beer\". One simple way to implement such a score is to use a weighted sum like:\n",
     "\n",
@@ -96,50 +96,51 @@
     "The actual weights here are guessed and serve as an example.\n",
     "\n",
     "The size of the numbers reflects the numerical ranges of the features: alcohol content is in the range 3 to 5.9, where as bitterness is between 0 and 1.08:"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "scores =( 1.1 * beer_data[\"alcohol_content\"] + 4 * beer_data[\"bitterness\"] \n",
     "          + 1.5 * beer_data[\"darkness\"] + 1.8 * beer_data[\"fruitiness\"])\n",
     "scores.shape"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "Now we can plot the histogram of the scores by classes:"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "scores_bad = scores[beer_data[\"is_yummy\"] == 0]\n",
     "scores_good = scores[beer_data[\"is_yummy\"] == 1]\n",
     "\n",
     "plt.hist(scores_bad, bins=25, color=\"steelblue\", alpha=.7) # alpha makes bars translucent\n",
     "plt.hist(scores_good,  bins=25, color=\"chocolate\", alpha=.7);"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "Consequence: a simple classifier could use these scores and use a threshold around 10.5 to assign a class label."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "def classify(beer_feature):\n",
     "    scores = (1.1 * beer_feature[\"alcohol_content\"] + 4 * beer_feature[\"bitterness\"] \n",
@@ -166,11 +167,11 @@
     "<i class=\"fa fa-info-circle\"></i>\n",
     "Although this seems to be a simplistic concept, linear classifiers can actually work very well, especially for problems with many features (high-dimensional problems).\n",
     "</div>\n"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "## Exercise section 1\n",
     "\n",
@@ -178,12 +179,17 @@
     "\n",
     "\n",
     "- In `scikit-learn` the weights of a trained linear classifier are availble via the `coef_` attribute as a 2 dimensional `numpy` array. Extract the weights from the `LogisticRegression` classifier example from the last script and try them out in your weighted sum scoring function."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {
+    "tags": [
+     "solution"
+    ]
+   },
+   "outputs": [],
    "source": [
     "from sklearn.linear_model import LogisticRegression\n",
     "\n",
@@ -202,17 +208,13 @@
     "\n",
     "plt.hist(scores_bad, bins=25, color=\"steelblue\", alpha=.7) # alpha makes bars translucent\n",
     "plt.hist(scores_good,  bins=25, color=\"chocolate\", alpha=.7);"
-   ],
-   "outputs": [],
-   "metadata": {
-    "scrolled": true,
-    "tags": [
-     "solution"
-    ]
-   }
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {
+    "tags": []
+   },
    "source": [
     "## Geometrical interpretation of feature vectors\n",
     "\n",
@@ -221,23 +223,24 @@
     "E.g. if a data set consists of feature vectors of length 2, you can interpret the first feature value as a x-coordinate and the second value as a y-coordinate.\n",
     "\n",
     "Classes then group points."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "### Example\n",
     "\n",
     "For sake of simplicity we restrict our beer data set to two features: `alcohol_content` and `bitterness`.\n",
     "\n",
     "The following plot shows how these reduced feature vectors can be interpreted as point clouds. For every feature vector we color points in green or red to indicate the according classes:"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "xv = beer_data[\"alcohol_content\"]\n",
     "yv = beer_data[\"bitterness\"]\n",
@@ -247,12 +250,11 @@
     "plt.scatter(xv, yv, color=colors, marker='o');\n",
     "plt.xlabel(\"alcohol_content\")\n",
     "plt.ylabel(\"bitterness\");"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "\n",
     "What do we see here ?\n",
@@ -260,11 +262,11 @@
     "1. Both point clouds overlap, this tells us that the two features lack information for a 100% separation of classes. \n",
     "2. We could draw a line to separate most points of both clouds.\n",
     "3. Later we could use this line to make a guess for classifying a new feature vector."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "<div class=\"alert alert-block alert-warning\">\n",
     "<p><i class=\"fa fa-warning\"></i>&nbsp;\n",
@@ -273,19 +275,20 @@
     "</div>\n",
     "\n",
     "<img src=\"./images/303vuc.jpg\" width=50%/>"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "Next, we illustrate how more features can support classification. We add the `darkness` feature as a third dimension.\n"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "from mpl_toolkits.mplot3d import Axes3D\n",
     "\n",
@@ -312,34 +315,33 @@
     "ax = fig.add_subplot(122, projection='3d')\n",
     "plot3d(ax)\n",
     "ax.view_init(3, 0);"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "The first view is very similar to the scatter plot before as we don't see the effects of the third feature. \n",
     "\n",
     "The second view shows the same cube rotated by 90Ëš horizontally. We see that the new dimension adds extra information which could improve separation by separating more points.\n",
     "\n",
     "Geometrically, the 1D line, which could separat samples in the previous example that used 2D samples, would be now a 2D plane. It would still look like a line in the first view, but rotating it using the third dimensions could separate more points.\n"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "### Decision surfaces\n",
     "\n",
     "The concept of decision surfaces is crucial in classification.\n",
     "\n",
     "Lets start with an easy to visualize 2D features space."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "#### Decision lines\n",
     "\n",
@@ -360,11 +362,11 @@
     "      \n",
     "\n",
     "are located on opposite sides of this line. Such a classifier thus determines a line which separates the feature space in two parts according to the two classes."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "Lets visualize this! \n",
     "\n",
@@ -376,12 +378,13 @@
     "4. split points according to their score compared to the threshold,\n",
     "5. plot samples in different colors,\n",
     "6. plot decision line.\n"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "import numpy as np\n",
     "\n",
@@ -412,34 +415,34 @@
     "y = threshold / weights[1] - weights[0] / weights[1] * x\n",
     "plt.plot(x, y, color='k', linestyle=':')\n",
     "plt.legend();"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "#### Decision (hyper)plane"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "For 3D samples a linear classifiers separates into classes by a 2D plane, and in general, for `n` dimensions we get `n-1` dimensional hyperplanes."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "Let's visualize a decision plane the same way we did visualize the line."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "import numpy as np\n",
     "import matplotlib.pyplot as plt\n",
@@ -491,12 +494,11 @@
     "# for readability, the view angles are chosen so that the 2D separation plane\n",
     "# looks like a line\n",
     "ax.view_init(20, 210);"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "### Example\n",
     "\n",
@@ -517,11 +519,11 @@
     "are located on different sides of this plane.\n",
     "\n",
     "Again: **Here, the classifier separates the 4D space into two parts; the separation boundary is a 3D hyperplane in this space.**"
-   ],
-   "metadata": {}
+   ]
   },
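+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To make this concrete, here is a minimal `numpy` sketch of such a 4D linear decision rule. The weights, the threshold and the two sample feature vectors below are made up for illustration only; they are not the result of any training."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# hypothetical weights and threshold of a 4D linear classifier (illustration only)\n",
+    "weights = np.array([1.1, 4.0, 1.5, 1.8])\n",
+    "threshold = 10.5\n",
+    "\n",
+    "# two made-up 4D feature vectors: [alcohol_content, bitterness, darkness, fruitiness]\n",
+    "samples = np.array([[4.8, 0.9, 0.6, 0.8],\n",
+    "                    [3.2, 0.2, 0.1, 0.3]])\n",
+    "\n",
+    "# the decision boundary is the 3D hyperplane where the weighted sum equals the threshold;\n",
+    "# samples on opposite sides of it get different class labels\n",
+    "scores = samples @ weights\n",
+    "print(scores, scores >= threshold)"
+   ]
+  },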
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "<div class=\"alert alert-block alert-info\">\n",
     "\n",
@@ -534,36 +536,37 @@
     "<p>The examples also might look artificial, this is because they highlight specific aspects or problems. At the end, general classifiers should work on all kind of problems.</p>\n",
     "</div>\n",
     "\n"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "## Non-linear decision surfaces"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "The next example data set can not be classified by a straight line, the decision line is curved:\n"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "df = pd.read_csv(\"data/circle.csv\")\n",
     "df.head(3)"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "xv = df[\"x\"]\n",
     "yv = df[\"y\"]\n",
@@ -573,21 +576,20 @@
     "plt.xlim([-2, 2])\n",
     "plt.ylim([-2, 2])\n",
     "plt.scatter(xv, yv, color=colors, marker=\"o\");"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "In this case a suitable decision surface is a (closed) curve - it looks like a circle. A hand-crafted classifier could classify new points based on their distance to the center.\n",
     "\n",
     "It should be clear that a **linear classifier is not suitable for this problem**!"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "## Feature engineering\n",
     "\n",
@@ -600,73 +602,74 @@
     "The general idea is to include / extract usefull information based on domain knowledge. \n",
     "\n",
     "E.g. to classify spam emails you can count the number of words written in capital letters only or group countries and add the group number."
-   ],
-   "metadata": {}
+   ]
   },
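+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As an illustration of the spam example above, here is a minimal sketch of such a hand-crafted feature (the example messages are made up):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# hand-crafted feature for the spam example: count words written only in capital letters\n",
+    "def count_all_caps_words(text):\n",
+    "    return sum(1 for word in text.split() if word.isupper())\n",
+    "\n",
+    "# made-up example messages (illustration only)\n",
+    "print(count_all_caps_words(\"BUY NOW cheap pills\"))\n",
+    "print(count_all_caps_words(\"see you at the meeting tomorrow\"))"
+   ]
+  },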
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "### An example for feature engineering"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "In the previous example we see that the distance of the origin of a point could be used to implement a classifier.\n",
     "\n",
     "Computing the distance of a point to the origin (0, 0) using the euclidian formula includes terms $x^2$ and $y^2$. \n",
     "\n",
     "Let us create a scatter plot for this transformation:"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "plt.figure(figsize=(5, 5))\n",
     "plt.scatter(xv ** 2, yv ** 2, color=colors, marker='o');\n",
     "plt.xlabel(\"$x^2$\")\n",
     "plt.ylabel(\"$y^2$\");"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "As you can see both sets can be separated by a line now!"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "### Another example for feature engineering"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "The so called \"xor-problem\" is a typical benchmark problem for machine learning. The following example illustrates this problem:"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "xor = pd.read_csv(\"data/xor.csv\")\n",
     "xor.head()"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "xv = xor[\"x\"]\n",
     "yv = xor[\"y\"]\n",
@@ -679,12 +682,11 @@
     "plt.plot([-2, 2], [0, 0], \"k:\")\n",
     "plt.title(\"Blue points are labeled False\")\n",
     "plt.scatter(xv, yv, color=colors, marker=\"o\");"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "Again, this example data set can not be separated by a line. But we see that points where the sign of x and y are the same appear to form one class, and point with different signs for x and y belong to the other class.\n",
     "\n",
@@ -693,12 +695,13 @@
     "Here we can use the fact that the product of two numbers is postive if and only if both numbers have the same sign.\n",
     "\n",
     "So lets plot a histogram over `x * y`:"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "products = xor[\"x\"] * xor[\"y\"]\n",
     "\n",
@@ -707,19 +710,18 @@
     "\n",
     "plt.hist(features_class_false,  bins=30, color=\"steelblue\", alpha=.5, histtype=\"stepfilled\")\n",
     "plt.hist(features_class_true,  bins=30, color=\"chocolate\", alpha=.5, histtype=\"stepfilled\");"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "Having such feature a simple classifier could just introduce a threshold of 0 to distinguish both classes."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "### Feature engineering HOWTO\n",
     "\n",
@@ -745,11 +747,11 @@
     "\n",
     "\n",
     "- sales data can be enhanced from a date feature by an extra feature \"is weekday\"."
-   ],
-   "metadata": {}
+   ]
   },
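+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For instance, the last example above could be implemented with a few lines of `pandas`; the dates and sales numbers below are made up for illustration:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# made-up sales data with a date feature (illustration only)\n",
+    "sales = pd.DataFrame({\"date\": [\"2021-03-01\", \"2021-03-06\", \"2021-03-07\"],\n",
+    "                      \"amount\": [120, 340, 80]})\n",
+    "\n",
+    "# engineer an extra \"is_weekday\" feature from the date\n",
+    "sales[\"date\"] = pd.to_datetime(sales[\"date\"])\n",
+    "sales[\"is_weekday\"] = sales[\"date\"].dt.dayofweek < 5\n",
+    "sales"
+   ]
+  },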
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "Most cases have higher dimensions than 2 or 3 and visual inspection can be difficult. Thus, engineering features as we did in the 2D examples becomes tricky.\n",
     "\n",
@@ -767,20 +769,20 @@
     "<div class=\"alert alert-block alert-info\"><p><i class=\"fa fa-info-circle\"></i>&nbsp;\n",
     "Adding too many features (especially redundant features) can introduce other problems, such as, for instance, <strong>overfitting</strong> (we'll learn later about that). There are methods for selection of a subset of \"good-enough\" features (cf. <a href=\"https://scikit-learn.org/stable/modules/feature_selection.html\"><code>scikit-learn</code> feature selection module</a>).\n",
     "</p></div>"
-   ],
-   "metadata": {}
+   ]
   },
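+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a minimal sketch of such automatic feature selection, here is `SelectKBest` applied to the beer data from above (the score function and `k=2` are arbitrary choices for illustration):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.feature_selection import SelectKBest, f_classif\n",
+    "\n",
+    "beer_features = beer_data[[\"alcohol_content\", \"bitterness\", \"darkness\", \"fruitiness\"]]\n",
+    "beer_labels = beer_data[\"is_yummy\"]\n",
+    "\n",
+    "# keep only the k=2 features which score best with respect to the labels\n",
+    "selector = SelectKBest(f_classif, k=2)\n",
+    "reduced_features = selector.fit_transform(beer_features, beer_labels)\n",
+    "print(beer_features.columns[selector.get_support()])"
+   ]
+  },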
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "### Engineer polynomial features using `scikit-learn`\n",
     "\n",
     "*Polynomial features* are a way to (semi-)automatically engineere new non-linear features. These are all polynomial combinations of the features (up to given degree)."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "In <code>scikit-learn</code> polynomial feature engineering is part of the `sklearn.preprocessing` module containing utilities for features preprocessing.\n",
     "\n",
@@ -797,19 +799,20 @@
     "    <li>a <strong><code>fit()</code></strong> and <strong><code>fit_transform()</code></strong> methods to learn the preprocessing from data or fit and transform in one step.</li>\n",
     "</ul>\n",
     "</div>"
-   ],
-   "metadata": {}
+   ]
   },
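+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before we return to polynomial features, here is a minimal sketch of this common API, using `StandardScaler` purely as an illustration (it is not needed for the rest of this chapter):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "from sklearn.preprocessing import StandardScaler\n",
+    "\n",
+    "data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])\n",
+    "\n",
+    "scaler = StandardScaler()\n",
+    "scaler.fit(data)                  # learn per-column mean and standard deviation\n",
+    "scaled = scaler.transform(data)   # apply the learned scaling\n",
+    "\n",
+    "# fit_transform does both steps at once:\n",
+    "scaled_again = scaler.fit_transform(data)\n",
+    "print(np.allclose(scaled, scaled_again))"
+   ]
+  },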
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "For instance, the \"sign\" feature `x * y` in the XOR dataset in the previous example is a polynomial feature of rank 2 (1+1). Let's see how to generate it among with other polynomial features up to rank 2 using `scikit-lern`:"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "import pandas as pd\n",
     "from sklearn.preprocessing import PolynomialFeatures\n",
@@ -818,23 +821,22 @@
     "df = pd.read_csv(\"data/xor.csv\")\n",
     "features = df.iloc[:, :-1]\n",
     "features.head()"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "preproc = PolynomialFeatures(degree=2, include_bias=False)\n",
     "data = preproc.fit_transform(features)\n",
     "pd.DataFrame(data).head()"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "In this case \n",
     "- columns `0` and `1` are $x$ and $y$ from the original data set.\n",
@@ -843,26 +845,27 @@
     "- column `4` is $y^2$.\n",
     "\n",
     "Setting `include_bias=False` omits the degree 0 polynomial, which is a constant column with value `1`. For a complete description see [docs for `sklearn.preprocessing.PolynomialFeatures`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html)."
-   ],
-   "metadata": {}
+   ]
   },
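+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To check which generated column corresponds to which polynomial term, recent `scikit-learn` versions offer a `get_feature_names_out()` method (a minimal sketch, reusing `preproc` and `data` from the cell above):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# label the generated columns with the polynomial terms they correspond to\n",
+    "pd.DataFrame(data, columns=preproc.get_feature_names_out()).head()"
+   ]
+  },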
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "## Exercise section 2"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "The following functions plot a 2D dataset points and a decision surface of classifier trained beforehand on that dataset."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "import numpy as np\n",
     "import matplotlib.pyplot as plt\n",
@@ -918,20 +921,22 @@
     "\n",
     "    plot_points(features_2d, labels)\n",
     "    plt.title(name)"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
+    "Note: `scikit-learn 1.0` introduced a class `sklearn.inspection.DecisionBoundaryDisplay` (see also [here](https://scikit-learn.org/stable/modules/generated/sklearn.inspection.DecisionBoundaryDisplay.html)) to plot decision surfaces.\n",
+    "\n",
     "Let's use them to plot a decision surface of a logistic regression classifier trained on a XOR dataset:"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "from sklearn.linear_model import LogisticRegression\n",
     "from sklearn.preprocessing import PolynomialFeatures\n",
@@ -947,22 +952,26 @@
     "# preproc = PolynomialFeatures(2, include_bias=False)\n",
     "\n",
     "train_and_plot_decision_surface(\"Logistic regression\", clf, features, labels, preproc=None)"
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
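+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The same kind of plot can also be produced with the `DecisionBoundaryDisplay` class mentioned above. This is only a minimal sketch: it assumes a recent `scikit-learn` version and reuses the `features` and `labels` from the previous cell:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "from sklearn.inspection import DecisionBoundaryDisplay\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "\n",
+    "clf = LogisticRegression(C=1).fit(features, labels)\n",
+    "\n",
+    "disp = DecisionBoundaryDisplay.from_estimator(clf, features, response_method=\"predict\", alpha=0.3)\n",
+    "points = np.asarray(features)\n",
+    "disp.ax_.scatter(points[:, 0], points[:, 1], c=labels, edgecolor=\"k\");"
+   ]
+  },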
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "### Logistic regression with polynomial features\n",
     "\n",
     "Train and plot decision surface for logistic regression classifier of XOR dataset with polynomial features engineered. What's the result and why?"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {
+    "tags": [
+     "solution"
+    ]
+   },
+   "outputs": [],
    "source": [
     "from sklearn.linear_model import LogisticRegression\n",
     "from sklearn.preprocessing import PolynomialFeatures\n",
@@ -977,29 +986,28 @@
     "\n",
     "preproc = PolynomialFeatures(2, include_bias=False)\n",
     "train_and_plot_decision_surface(\"Logistic regression\", clf, features, labels, preproc=preproc)"
-   ],
-   "outputs": [],
-   "metadata": {
-    "scrolled": true,
-    "tags": [
-     "solution"
-    ]
-   }
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "### Tuning classifiers in a difficult problem\n",
     "\n",
     "Load the data from `\"data/spiral.csv\"`, plot points and train both Logistic Regression classfier with polynomial features and the Support Vector Classifier `sklearn.svm.SVC` with no preprocessing. Compare the decision surfaces.\n",
     "\n",
     "Try different values of degree of the polynomial features and of the hyperparameter `C` (applicable to both classifiers).\n"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {
+    "tags": [
+     "solution"
+    ]
+   },
+   "outputs": [],
    "source": [
     "df = pd.read_csv(\"data/spiral.csv\")\n",
     "\n",
@@ -1016,7 +1024,7 @@
     "from sklearn.linear_model import LogisticRegression\n",
     "from sklearn.preprocessing import PolynomialFeatures\n",
     "\n",
-    "clf = LogisticRegression(C=1)\n",
+    "clf = LogisticRegression(C=1, max_iter=1000)\n",
     "preproc = PolynomialFeatures(4, include_bias=False)\n",
     "\n",
     "plt.figure(figsize=(6, 6))\n",
@@ -1029,26 +1037,22 @@
     "\n",
     "plt.figure(figsize=(6, 6))\n",
     "train_and_plot_decision_surface(\"SVC\", clf, features, labels, preproc=None)"
-   ],
-   "outputs": [],
-   "metadata": {
-    "tags": [
-     "solution"
-    ]
-   }
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "### Comparison of decision surfaces for different classifiers and datasets\n",
     "\n",
     "Compare decision surfaces for different classifiers listed below for both `\"data/xor.csv\"` and `\"data/circle.csv\"` (circle) datasets. For which classifiers does it help to add polynomial features? How many degrees suffice?"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "from sklearn.linear_model import LogisticRegression\n",
     "from sklearn.svm import LinearSVC, SVC\n",
@@ -1056,13 +1060,17 @@
     "from sklearn.neighbors import KNeighborsClassifier\n",
     "\n",
     "# ...."
-   ],
-   "outputs": [],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "metadata": {
+    "tags": [
+     "solution"
+    ]
+   },
+   "outputs": [],
    "source": [
     "from sklearn.linear_model import LogisticRegression\n",
     "from sklearn.svm import LinearSVC, SVC\n",
@@ -1116,30 +1124,24 @@
     "\n",
     "\n",
     "try_dataset(\"data/xor.csv\", PolynomialFeatures(2, include_bias=False))\n"
-   ],
-   "outputs": [],
-   "metadata": {
-    "scrolled": true,
-    "tags": [
-     "solution"
-    ]
-   }
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "source": [
-    "try_dataset(\"data/circle.csv\", PolynomialFeatures(4, include_bias=False))"
-   ],
-   "outputs": [],
    "metadata": {
     "tags": [
      "solution"
     ]
-   }
+   },
+   "outputs": [],
+   "source": [
+    "try_dataset(\"data/circle.csv\", PolynomialFeatures(4, include_bias=False))"
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "## But.. what if there are more than two classes?\n",
     "\n",
@@ -1175,22 +1177,35 @@
     "<div class=\"alert alert-block alert-info\"><p><i class=\"fa fa-info-circle\"></i>&nbsp;\n",
     "    In <code>scikit-learn</code> many classifiers support multi-class problems out of the box and also offer functionalities to implement <strong>one-vs-rest</strong> or <strong>one-vs-one</strong> in some cases (cf. <a href=\"https://scikit-learn.org/stable/modules/multiclass.html\"><code>scikit-learn</code> multiclass and multilabel algorithms</a>).\n",
     "</p></div>"
-   ],
-   "metadata": {}
+   ]
   },
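+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a small, self-contained sketch of the **one-vs-rest** strategy, here is an example using `OneVsRestClassifier` on the classic iris data set (three classes; not one of this script's data files):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.datasets import load_iris\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "from sklearn.multiclass import OneVsRestClassifier\n",
+    "\n",
+    "X, y = load_iris(return_X_y=True)\n",
+    "\n",
+    "# one-vs-rest trains one binary classifier per class\n",
+    "clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))\n",
+    "clf.fit(X, y)\n",
+    "\n",
+    "print(len(clf.estimators_))  # one underlying binary classifier per class\n",
+    "print(clf.predict(X[:5]))"
+   ]
+  },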
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "Copyright (C) 2019-2021 ETH Zurich, SIS ID"
-   ],
-   "metadata": {}
+   ]
   }
  ],
  "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
   "language_info": {
-   "name": "python"
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.5"
   }
  },
  "nbformat": 4,
- "nbformat_minor": 2
-}
\ No newline at end of file
+ "nbformat_minor": 4
+}