From 1b74fff4748dfd13fabb208788254761f09c36de Mon Sep 17 00:00:00 2001 From: Mikolaj Rybinski <mikolaj.rybinski@id.ethz.ch> Date: Fri, 17 Sep 2021 17:03:26 +0200 Subject: [PATCH] In Notebook 04, in exercise block 2: fix reference and decimal places typo --- 04_measuring_quality_of_a_classifier.ipynb | 203 +++++++++++---------- 1 file changed, 103 insertions(+), 100 deletions(-) diff --git a/04_measuring_quality_of_a_classifier.ipynb b/04_measuring_quality_of_a_classifier.ipynb index 8f8e887..c4fcf73 100644 --- a/04_measuring_quality_of_a_classifier.ipynb +++ b/04_measuring_quality_of_a_classifier.ipynb @@ -3,6 +3,8 @@ { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !\n", "import matplotlib.pyplot as plt\n", @@ -12,57 +14,56 @@ "warnings.filterwarnings('ignore', category=FutureWarning)\n", "warnings.filterwarnings = lambda *a, **kw: None\n", "from IPython.core.display import HTML; HTML(open(\"custom.html\", \"r\").read())" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "# Chapter 4: Metrics for evaluating the performance of a classifier" - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "import sklearn.metrics as metrics\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "Up to now we used *accuracy*, the percentage of correct classifcations, to evaluate the quality of a classifier.\n", "\n", "Regrettably _accuracy_ can produce very misleading results. \n", "\n", "This chapter will discuss other metrics used to asses the quality of a classifier, including the possible pitfalls." - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## The confusion matrix" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "Before we define the **confusion matrix** we must introduce some additional terms. \n" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "After applying a classifier to a data set with known labels `0` and `1`:\n", "\n", @@ -120,11 +121,11 @@ "\n", "<img src=\"./images/305c8j.jpg\" title=\"made at imgflip.com\" width=40%/>\n", "\n" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "\n", "\n", @@ -148,11 +149,11 @@ "\n", "</div>\n", "\n" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Pitfalls\n", "\n", @@ -190,22 +191,26 @@ "2. Does our test predict people as infected which are actually not: How many positive diagnoses are correct ?\n", "\n", "We come back to this example later." - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Exercise block 1\n", "\n", "1. A classifier predicts labels `[0, 1, 0, 1, 1, 0, 1, 0]` whereas true labels are `[0, 0, 1, 1, 1, 0, 1, 1]`. First write these values as a two columned table using pen & paper and assign `FP`, `TP`, ... to each row. Now create the confusion matrix and compute accuracy.\n", "\n", "2. A random classfier just assign a randomly chosen label `0` or `1` to a given sample. What is the average accuracy of such a classifier?" 
- ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": { + "tags": [ + "solution" + ] + }, "source": [ "SOLUTION 1.1 \n", "<pre>\n", @@ -228,27 +233,27 @@ "SOLUTION 1.2 \n", "\n", "On average all fields of the confusion matrix should contain same values, thus the accuracy would be 50 %." - ], - "metadata": { - "tags": [ - "solution" - ] - } + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### Optional exercise\n", "\n", "Assume the previously described test also produces wrong results on not-infected people, such that 5% will be diagnosed as infected. Compute the confusion matrix and the accuracy of this test.\n", "\n" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": { + "tags": [ + "solution" + ] + }, "source": [ - "\n", + "SOLUTION Optional exercise\n", "\n", "This is the new situation:\n", "- On average 10 out of 10000 people are infected with a disease `X`. \n", @@ -264,15 +269,11 @@ "\n", "accuracy = 9495.5 / 10000 = 94.96 %\n", "</pre>" - ], - "metadata": { - "tags": [ - "solution" - ] - } + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Precision and Recall\n", "\n", @@ -291,11 +292,11 @@ "<img src=\"./images/precision-recall-1.png\" width=90% />\n", "\n", "\n" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### How to compute precision and recall for a classifier\n", "\n", @@ -382,21 +383,26 @@ "</div>\n", "\n", "For the medical test `Z` the `F1` score is `1 / 1.5 = 0.6666..`." - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Exercise block 2\n", "\n", - "Use your results from exercise 1.1 to compute precision, recall and F1 score." - ], - "metadata": {} + "Use your results from Exercise block 1.1 to compute precision, recall and F1 score." + ] }, { "cell_type": "markdown", + "metadata": { + "tags": [ + "solution" + ] + }, "source": [ + "SOLUTION 2\n", "<pre>\n", "TP = 3 FP = 1\n", "FN = 2 TN = 2\n", @@ -405,68 +411,67 @@ "recall = 3 / (3 + 2) = 60 %\n", "F1 = 2 * (0.6 * 0.75) / (0.6 + 0.75) = 66.66%\n", "</pre>" - ], - "metadata": { - "tags": [ - "solution" - ] - } + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### Optional exercise:\n", "\n", - "Compute precision, recall and F1-score for the test described in exercise 1.2." - ], - "metadata": {} + "Compute precision, recall and F1-score for the test described in Exercise block 1 Optional exercise." + ] }, { "cell_type": "markdown", + "metadata": { + "tags": [ + "solution" + ] + }, "source": [ + "SOLUTION 2 Optional exercise\n", + "\n", "<pre>\n", "TP = 5 FP = 499.5\n", "FN = 5 TN = 9490.5\n", "\n", "precision = 5 / (5 + 499.5) = 0.0099\n", "recall = 5 / (5 + 5) = 0.5\n", - "F1 = 2 * (0.099 * 0.5) / (0.0099 + 0.5) = 0.194\n", + "F1 = 2 * (0.0099 * 0.5) / (0.0099 + 0.5) = 0.0194\n", "</pre>" - ], - "metadata": { - "tags": [ - "solution" - ] - } + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Other metrics\n", "\n", "The discussion above was just a quick introduction to measuring the accuracy of a classifier. 
We skipped other metrics such as `ROC` and `AUC` amongst others.\n", "\n", "A good introduction to `ROC` <a href=\"https://classeval.wordpress.com/introduction/introduction-to-the-roc-receiver-operating-characteristics-plot/\">can be found here.</a>" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Metrics in scikit-learn" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "`sklearn.metrics` contains all introduced above metrics, as well as the previously-used classification accuracy:" - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score\n", "\n", @@ -483,60 +488,60 @@ "print(\"{:20s} {:.3f}\".format(\"recall\", recall_score(labels, predicted)))\n", "print(\"{:20s} {:.3f}\".format(\"f1\", f1_score(labels, predicted)))\n", "print(\"{:20s} {:.3f}\".format(\"accuracy\", accuracy_score(labels, predicted)))\n" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### Classification report\n", "\n", "`scikit-learn` also offers a function to print a classification report, which is an overview table of precision, recall and F1 metrics:" - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from sklearn.metrics import classification_report\n", "\n", "print(classification_report(labels, predicted, ))" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "1. The `support` column lists the number of samples in each class and in total.\n", "2. The `macro average` row lists unweighted mean of a metric for each label. This does NOT take classes imbalance into account.\n", "3. The `weighted average` row lists weighted by support mean of a metric for each label. This does take classes imbalance into account.\n", "\n", "Note: normally the precision, recall and F1 metrics are only the \"Positive\" (`1`) class metrics (cf. results above)." - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### Confusion matrix" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "The `sklearn.metrics` module contains also `confusion_matrix` utility which returns the confusion matrix.\n", "\n", "Beware: the matrix is transposed with respect to the conventional notation; actual (true) classes are given in rows, whereas predicted in columns." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from sklearn.metrics import confusion_matrix\n", "\n", @@ -553,20 +558,20 @@ "print()\n", "\n", "#\n" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "Having a classifier object, the confusion matrix can also be visualized using a `plot_confusion_matrix` utility function." 
- ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from sklearn.linear_model import LogisticRegression\n", "from sklearn.metrics import plot_confusion_matrix\n", @@ -598,12 +603,11 @@ "\n", "cm_disp.ax_.set_title('Confusion matrix: \"beer\" dataset + LR classfier')\n", "plt.show()" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "<div class=\"alert alert-block alert-info\">\n", "<p>\n", @@ -614,11 +618,11 @@ "<img src=\"./images/confusion_matrix-iris_svc.png\" width=\"50%\" />\n", "\n", "</div>" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### Case-study: an imbalanced dataset\n", "\n", @@ -628,12 +632,13 @@ "\n", "- the beer data samples in which labels distribution is almost 50:50, and\n", "- an unbalanced subset of the beer data samples." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "import pandas as pd\n", "\n", @@ -651,13 +656,13 @@ "print(\"unbalanced data\")\n", "print(beer_data_unbalanced.shape)\n", "print(\"#class 1:\", sum(beer_data_unbalanced.iloc[:,-1] == 1))" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from sklearn.model_selection import cross_val_score\n", "from sklearn.linear_model import LogisticRegression\n", @@ -688,16 +693,14 @@ "\n", "print(\"unbalanced data\")\n", "assess(classifier, beer_data_unbalanced)" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "You can see that for the balanced data set the values for `f1` and for `accuracy` are almost equal, but differ significantly for the unbalanced data set. The `f1` metric captures the `precision` and `recall` trade off which is visible for imbalanced datasets." - ], - "metadata": {} + ] }, { "cell_type": "markdown", -- GitLab
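As a quick sanity check of the corrected figures in Exercise block 2 above (the 0.0099 precision term and the F1 value of 0.0194), the metrics can be recomputed directly from the confusion-matrix counts quoted in the two solution cells. The standalone Python sketch below is not part of the patched notebook; it only restates the arithmetic from those solutions, and the helper function name prf1 is introduced here purely for illustration.

def prf1(tp, fp, fn):
    """Precision, recall and F1 computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Exercise block 2 solution: TP=3, FP=1, FN=2 -> precision 0.75, recall 0.60, F1 ~0.667
print("exercise 2:        precision={:.2f}  recall={:.2f}  f1={:.4f}".format(*prf1(3, 1, 2)))

# Optional exercise solution: expected counts TP=5, FP=499.5, FN=5
# -> precision ~0.0099, recall 0.5, F1 ~0.0194 (the decimal-places value fixed by this patch)
print("optional exercise: precision={:.4f}  recall={:.2f}  f1={:.4f}".format(*prf1(5, 499.5, 5)))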