{
"name": "stdout",
"output_type": "stream",
"text": [
"225 examples\n",
"187 labeled correctly\n"
]
}
],
"source": [
"print(len(labels), \"examples\")\n",
"print(sum(predicted_labels == labels), \"labeled correctly\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\">\n",
"<i class=\"fa fa-info-circle\"></i>\n",
"<code>predicted_labels == labels</code> evaluates to a vector of <code>True</code> or <code>False</code> Boolean values. When used as numbers, Python handles <code>True</code> as <code>1</code> and <code>False</code> as <code>0</code>. So, <code>sum(...)</code> simply counts the correctly predicted labels.\n",
"</div>\n",
"\n"
]
},
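{
"cell_type": "markdown",
"metadata": {},
"source": [
"The counting trick can be tried on a tiny example. This is a minimal sketch: the arrays `toy_labels` and `toy_predicted` are made up for illustration and are not part of the beers data. Taking `np.mean` of the same Boolean vector gives the fraction of correct predictions instead of the count.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical toy data, NOT taken from the beers data set\n",
"toy_labels = np.array([1, 0, 1, 1, 0])\n",
"toy_predicted = np.array([1, 0, 0, 1, 1])\n",
"\n",
"# Element-wise comparison gives a Boolean array\n",
"matches = toy_predicted == toy_labels\n",
"print(matches)\n",
"\n",
"# True counts as 1 and False as 0, so sum() counts the matches\n",
"print(sum(matches), \"labeled correctly\")\n",
"\n",
"# The mean of the same array is the fraction of matches\n",
"print(np.mean(matches), \"fraction labeled correctly\")"
]
},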
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What happened ?\n",
"\n",
"Why were not not all labels where predicted correctly ?\n",
"\n",
"Neither `Python` nor `scikit-learn` is broken. What we observed above is very typical for machine-learning applications.\n",
"\n",
"The reason here is that we have incomplete information: other features of beer which also contribute to the rating (like \"maltiness\") where not measured or can not be measured. So even the best algorithm can not predict the target values reliably.\n",
"\n",
"Another reason might be mistakes in the input data, e.g. some labels are assigned incorrectly.\n",
"* Finding good features is crucial for the performance of ML algorithms !\n",
"\n",
"\n",
"* Another important issue is make sure that you have clean data: input-features might be corrupted by flawed entries, feeding such data into a ML algorithm will usually lead to reduced performance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-danger\">\n",
"<strong>TODO:</strong> I propose to start excercise session 2 here, and ask to do re-classification with SVM, and only then play w/ regularization param.\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, train a different `scikit-learn` classifier - the so called **Support Vector Classifier** `SVC`, and evaluate its \"re-classification\" performance again.\n",
"\n",
"<div class=\"alert alert-block alert-info\">\n",
"<i class=\"fa fa-info-circle\"></i>\n",
"<code>SVC</code> belongs to a class of algorithms named \"Support Vector Machines\" (SVMs). We will discuss available ML algorithms in more detail in the following scripts.\n",
"</div>"
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"225 examples\n",
"205 labeled correctly\n"
"from sklearn.svm import SVC\n",
"# ...\n",
"# REMOVE the following lines in the target script\n",
"classifier = SVC()\n",
"classifier.fit(input_features, labels)\n",
"\n",
"predicted_labels = classifier.predict(input_features)\n",
"\n",
"assert(predicted_labels.shape == labels.shape)\n",
"print(len(labels), \"examples\")\n",
"print(sum(predicted_labels == labels), \"labeled correctly\")"
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\">\n",
"<i class=\"fa fa-info-circle\"></i>\n",
"Better re-classification does not indicate here that <code>SVC</code> is better than <code>LogisticRegression</code>. At most it seems to fit better to our training data. We will learn later that this may actually not necessarily be a good thing.\n",
"</div>\n",
"Note that both `LogisticRegression` and `SVC` classifiers have a parameter `C` which allows to enforce simplification (know also as regularization) of the resulting model. Test the beers data \"re-classification\" with different values of this parameter."
"outputs": [],
"source": [
"?LogisticRegression\n",
"# ..."
]
},
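{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to approach this exercise is sketched below, under stated assumptions: `input_features` and `labels` are still defined from the cells above, and the list of `C` values is an arbitrary choice for illustration. Comparing the counts across values shows how the regularization strength affects the fit to the training data.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.svm import SVC\n",
"\n",
"# Assumption: input_features and labels come from the cells above.\n",
"# The C values below are arbitrary picks for illustration.\n",
"for C in [0.01, 0.1, 1.0, 10.0, 100.0]:\n",
"    for classifier in (LogisticRegression(C=C), SVC(C=C)):\n",
"        # Train and re-classify on the same data, as in the cells above\n",
"        classifier.fit(input_features, labels)\n",
"        n_correct = sum(classifier.predict(input_features) == labels)\n",
"        print(type(classifier).__name__, \"C =\", C, \":\", n_correct, \"labeled correctly\")"
]
},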
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/mikolajr/Workspace/SSDM/machinelearning-introduction-workshop/.venv/lib/python3.7/site-packages/ipykernel_launcher.py:9: UserWarning: get_ipython_dir has moved to the IPython.paths module since IPython 4.0.\n",
" if __name__ == '__main__':\n"
]
},
{
"data": {
"text/html": [
"<style>\n",
" \n",
" @import url('http://fonts.googleapis.com/css?family=Source+Code+Pro');\n",
" \n",
" @import url('http://fonts.googleapis.com/css?family=Kameron');\n",
" @import url('http://fonts.googleapis.com/css?family=Crimson+Text');\n",
" \n",
" @import url('http://fonts.googleapis.com/css?family=Lato');\n",
" @import url('http://fonts.googleapis.com/css?family=Source+Sans+Pro');\n",
" \n",
" @import url('http://fonts.googleapis.com/css?family=Lora'); \n",
"\n",
" \n",
" body {\n",
" font-family: 'Lora', Consolas, sans-serif;\n",
" \n",
" -webkit-print-color-adjust: exact important !;\n",
" \n",
" }\n",
" .rendered_html code\n",
" {\n",
" color: black;\n",
" background: #eaf0ff;\n",
" padding: 1pt;\n",
" font-family: 'Source Code Pro', Consolas, monocco, monospace;\n",
" }\n",
" \n",
" p {\n",
" line-height: 140%;\n",
" }\n",
" \n",
" strong code {\n",
" background: red;\n",
" }\n",
" \n",
" .rendered_html strong code\n",
" {\n",
" background: #f5f5f5;\n",
" }\n",
" \n",
" .CodeMirror pre {\n",
" font-family: 'Source Code Pro', monocco, Consolas, monocco, monospace;\n",
" }\n",
" \n",
" .cm-s-ipython span.cm-keyword {\n",
" font-weight: normal;\n",
" }\n",
" \n",
" strong {\n",
" background: #f5f5f5;\n",
" margin-top: 4pt;\n",
" margin-bottom: 4pt;\n",
" padding: 2pt;\n",
" border: 0.5px solid #a0a0a0;\n",
" font-weight: bold;\n",
" color: darkred;\n",
" }\n",
" \n",
" \n",
" div #notebook {\n",
" # font-size: 10pt; \n",
" line-height: 145%;\n",
" }\n",
" \n",
" li {\n",
" }\n",
"\n",
" div.output_area pre {\n",
" background: #fff9d8 !important;\n",
" padding: 5pt;\n",
" \n",
" -webkit-print-color-adjust: exact; \n",
" \n",
" }\n",
" \n",
" \n",
" \n",
" h1, h2, h3, h4 {\n",
" font-family: Kameron, arial;\n",
" }\n",
" \n",
" div#maintoolbar {display: none !important;}\n",
" </style>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#REMOVEBEGIN\n",
"# THE LINES BELOW ARE JUST FOR STYLING THE CONTENT ABOVE !\n",
"\n",
"from IPython import utils\n",
"from IPython.core.display import HTML\n",
"import os\n",
"def css_styling():\n",
" \"\"\"Load default custom.css file from ipython profile\"\"\"\n",
" base = utils.path.get_ipython_dir()\n",
" styles = \"\"\"<style>\n",
" \n",
" @import url('http://fonts.googleapis.com/css?family=Source+Code+Pro');\n",
" \n",
" @import url('http://fonts.googleapis.com/css?family=Kameron');\n",
" @import url('http://fonts.googleapis.com/css?family=Crimson+Text');\n",
" \n",
" @import url('http://fonts.googleapis.com/css?family=Lato');\n",
" @import url('http://fonts.googleapis.com/css?family=Source+Sans+Pro');\n",
" \n",
" @import url('http://fonts.googleapis.com/css?family=Lora'); \n",
"\n",
" \n",
" body {\n",
" font-family: 'Lora', Consolas, sans-serif;\n",
" \n",
" -webkit-print-color-adjust: exact important !;\n",
" \n",
" }\n",
" .rendered_html code\n",
" {\n",
" color: black;\n",
" background: #eaf0ff;\n",
" padding: 1pt;\n",
" font-family: 'Source Code Pro', Consolas, monocco, monospace;\n",
" }\n",
" \n",
" p {\n",
" line-height: 140%;\n",
" }\n",
" \n",
" strong code {\n",
" background: red;\n",
" }\n",
" \n",
" .rendered_html strong code\n",
" {\n",
" background: #f5f5f5;\n",
" }\n",
" \n",
" .CodeMirror pre {\n",
" font-family: 'Source Code Pro', monocco, Consolas, monocco, monospace;\n",
" }\n",
" \n",
" .cm-s-ipython span.cm-keyword {\n",
" font-weight: normal;\n",
" }\n",
" \n",
" strong {\n",
" background: #f5f5f5;\n",
" margin-top: 4pt;\n",
" margin-bottom: 4pt;\n",
" padding: 2pt;\n",
" border: 0.5px solid #a0a0a0;\n",
" font-weight: bold;\n",
" color: darkred;\n",
" }\n",
" \n",
" \n",
" div #notebook {\n",
" # font-size: 10pt; \n",
" line-height: 145%;\n",
" }\n",
" \n",
" li {\n",
" }\n",
"\n",
" div.output_area pre {\n",
" background: #fff9d8 !important;\n",
" padding: 5pt;\n",
" \n",
" -webkit-print-color-adjust: exact; \n",
" \n",
" }\n",
" \n",
" \n",
" \n",
" h1, h2, h3, h4 {\n",
" font-family: Kameron, arial;\n",
" }\n",
" \n",
" div#maintoolbar {display: none !important;}\n",
" </style>\"\"\"\n",
" return HTML(styles)\n",
"css_styling()\n",
"#REMOVEEND"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",