diff --git a/02_classification.ipynb b/02_classification.ipynb index 62cbbd3cb1276c3b510ce76ad6ccfcf3dc9a936b..cf21a234e47b2b2f81bfee649a526bdd2c40fd66 100644 --- a/02_classification.ipynb +++ b/02_classification.ipynb @@ -931,7 +931,45 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Feature engineering\n", + "## But... what if there are more than two classes?\n", + "\n", + "\n", + "The previous and following examples in this script consider two-class problems.\n", + "Before we dig deeper into classification, let's say a few words on how to handle more than two classes.\n", + "\n", + "\n", + "<div class=\"alert alert-block alert-warning\"><p><i class=\"fa fa-warning\"></i> \n", + " The general idea for <code>n > 2</code> classes is to build multiple 2-class classifiers and determine a winning class by applying all of them:\n", + "<ul>\n", + " <li>the <strong>one-vs-all</strong> approach builds <code>n</code> classifiers for \"label i vs. the rest\";</li>\n", + " <li>the <strong>one-vs-one</strong> approach builds a classifier for every pair <code>label i vs. label j</code> (in total <code>n x (n - 1) / 2</code> classifiers).</li>\n", + "</ul>\n", + "</p></div>\n", + "\n", + "For new incoming data all classifiers (`n` or `n x (n - 1) / 2`, respectively) are applied and the overall winning class gives the final result.\n", + "\n", + "For instance, to classify images of digits:\n", + "\n", + "- we could build 10 classifiers `is it 0 or another digit`, `is it 1 or another digit`, etc.\n", + " \n", + " A new image would then hopefully yield `True` for exactly one of the classifiers; in other situations the result is ambiguous.\n", + " \n", + " \n", + "- we could build 45 classifiers `is it 0 or 1`, `is it 0 or 2`, etc.\n", + "\n", + " For a new image we could choose the final outcome based on which class \"wins\" most often.\n", + "\n", + "\n", + "<div class=\"alert alert-block alert-info\"><p><i class=\"fa fa-info-circle\"></i> \n", + " In <code>scikit-learn</code> many classifiers support
multi-class problems out of the box, and the <code>sklearn.multiclass</code> module offers meta-estimators to apply <strong>one-vs-all</strong> or <strong>one-vs-one</strong> explicitly (cf. <a href=\"https://scikit-learn.org/stable/modules/multiclass.html\"><code>scikit-learn</code> multiclass and multilabel algorithms</a>).\n", + "</p></div>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ + "## Feature engineering\n", "\n", "To improve ML performance we can try to create new feature by transformation of existing features. This process is called **feature engineering**." ] }, @@ -940,7 +978,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### An example for feature engineering" + "### An example for feature engineering" ] }, { @@ -990,7 +1028,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Another example for feature engineering" + "### Another example for feature engineering" ] }, { @@ -1220,7 +1258,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Engineer polynomial features using `scikit-learn`\n", + "### Engineer polynomial features using `scikit-learn`\n", "\n", "*Polynomial features* are a way to (semi-)automatically engineere new non-linear features. These are all polynomial combinations of the features (up to given degree)."
] @@ -1819,44 +1857,6 @@ " print()\n" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Side note: what if there are more than two classes?\n", - "\n", - "\n", - "Previous and the following examples in this script consider two class problems.\n", - "Before we dig deeper into classification, let's say a few words on how to handle more than two classes.\n", - "\n", - "\n", - "<div class=\"alert alert-block alert-warning\"><p><i class=\"fa fa-warning\"></i> \n", - " The general idea for <code>n > 2</code> classes is to build multiple 2-class classifiers and determine a winning class by applying all of them:\n", - "<ul>\n", - " <li>in the <strong>one-vs-all</strong> approach build <code>n</code> classifiers for \"label n vs. the rest\";</li>\n", - " <li>in the <strong>one-vs-one</strong> approach builds classifiers for `label i vs label j` (in total <code>n x (n - 1) / 2</code> classifiers).</li>\n", - "</ul>\n", - "</p></div>\n", - "\n", - "For new incoming data then all classifiers (`n` or `n x (n -1) / 2`) are applied and the overall winning class gives the final result.\n", - "\n", - "For instance, to classify images of digits:\n", - "\n", - "- we could build 10 classifiers `is it 0 or other digit`, `is it 1 or other digit`, etc.\n", - " \n", - " A new image then would hopefully yield `True` for exactly one of the classifier, in other situations the result is unclear.\n", - " \n", - " \n", - "- we could build 45 classifiers `is it 0 or 1`, `is it 0 or 2`, etc.\n", - "\n", - " For a new image we could choose the final outcome based on which class \"wins\" most often.\n", - "\n", - "\n", - "<div class=\"alert alert-block alert-info\"><p><i class=\"fa fa-info-circle\"></i> \n", - " In <code>scikit-learn</code> many classifiers support multi-class problems out of the box and also offer functionalities to implement <strong>one-vs-all</strong> or <strong>one-vs-one</strong> in some cases (cf.
<a href=\"https://scikit-learn.org/stable/modules/multiclass.html\"><code>scikit-learn</code> multiclass and multilabel algorithms</a>).\n", - "</p></div>" - ] - }, { "cell_type": "code", "execution_count": 23,