"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>\n",
" \n",
" @import url('http://fonts.googleapis.com/css?family=Source+Code+Pro');\n",
" \n",
" @import url('http://fonts.googleapis.com/css?family=Kameron');\n",
" @import url('http://fonts.googleapis.com/css?family=Crimson+Text');\n",
" \n",
" @import url('http://fonts.googleapis.com/css?family=Lato');\n",
" @import url('http://fonts.googleapis.com/css?family=Source+Sans+Pro');\n",
" \n",
" @import url('http://fonts.googleapis.com/css?family=Lora'); \n",
"\n",
" \n",
" body {\n",
" font-family: 'Lora', Consolas, sans-serif;\n",
" \n",
" -webkit-print-color-adjust: exact important !;\n",
" \n",
" \n",
" \n",
" }\n",
" \n",
" .alert-block {\n",
" width: 95%;\n",
" margin: auto;\n",
" }\n",
" \n",
" .rendered_html code\n",
" {\n",
" color: black;\n",
" background: #eaf0ff;\n",
" background: #f5f5f5; \n",
" padding: 1pt;\n",
" font-family: 'Source Code Pro', Consolas, monocco, monospace;\n",
" }\n",
" \n",
" p {\n",
" line-height: 140%;\n",
" }\n",
" \n",
" strong code {\n",
" background: red;\n",
" }\n",
" \n",
" .rendered_html strong code\n",
" {\n",
" background: #f5f5f5;\n",
" }\n",
" \n",
" .CodeMirror pre {\n",
" font-family: 'Source Code Pro', monocco, Consolas, monocco, monospace;\n",
" }\n",
" \n",
" .cm-s-ipython span.cm-keyword {\n",
" font-weight: normal;\n",
" }\n",
" \n",
" strong {\n",
" background: #f5f5f5;\n",
" margin-top: 4pt;\n",
" margin-bottom: 4pt;\n",
" padding: 2pt;\n",
" border: 0.5px solid #a0a0a0;\n",
" font-weight: bold;\n",
" color: darkred;\n",
" }\n",
" \n",
" \n",
" div #notebook {\n",
" # font-size: 10pt; \n",
" line-height: 145%;\n",
" }\n",
" \n",
" li {\n",
" line-height: 145%;\n",
" }\n",
"\n",
" div.output_area pre {\n",
" background: #fff9d8 !important;\n",
" padding: 5pt;\n",
" \n",
" -webkit-print-color-adjust: exact; \n",
" \n",
" }\n",
" \n",
" \n",
" \n",
" h1, h2, h3, h4 {\n",
" font-family: Kameron, arial;\n",
"\n",
"\n",
" }\n",
" \n",
" div#maintoolbar {display: none !important;}\n",
"</style>\n",
" <script>\n",
"IPython.OutputArea.prototype._should_scroll = function(lines) {\n",
" return false;\n",
"}\n",
" </script>\n"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK!\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"%config InlineBackend.figure_format = 'retina'\n",
"import warnings\n",
"warnings.filterwarnings('ignore', category=FutureWarning)\n",
"from IPython.display import HTML; HTML(open(\"custom.html\", \"r\").read())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chapter 1: General Introduction to machine learning (ML)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A \"model\" allows us to explain observations and to answer questions. For example:\n",
"\n",
" 1. Where will my car, travelling at a given velocity, stop if I apply the brakes now?\n",
" 2. Where in the night sky will I see the moon tonight?\n",
" 3. Is the email I received spam?\n",
"\n",
"- The first two questions can be answered based on existing physical models (formulas). \n",
"\n",
"- For questions 3 and 4 it is difficult to develop explicitly formulated models. \n",
"- We have a vague understanding of the problem domain, e.g. we know that some words are specific to spam emails and others are specific to my personal and work-related emails.\n",
"- We have enough example data, as my mailbox is full of both spam and non-spam emails.\n",
"\n",
"\n",
"We could handcraft a personal spam classifier by hard-coding rules, such as _\"mail contains 'no prescription' and comes from Russia or China\"_, plus some statistics. This would be very tedious.\n",
"\n",
"<div class=\"alert alert-block alert-info\">\n",
"<i class=\"fa fa-info-circle\"></i>\n",
" Systems with such hard-coded rules are called <strong>expert systems</strong>.\n",
"</div>\n",
"\n",
"In such cases, machine learning is a better approach.\n",
"<div class=\"alert alert-block alert-warning\">\n",
"<i class=\"fa fa-info-circle\"></i>\n",
"<strong>Machine learning</strong> offers approaches to automatically build predictive models based on example data.\n",
"</div>\n",
"<div class=\"alert alert-block alert-info\">\n",
"<i class=\"fa fa-info-circle\"></i>\n",
"The closely related concept of <strong>data mining</strong> usually refers to using predictive machine learning models to explicitly discover previously unknown knowledge in a specific data set, for instance association rules between customer and article types in Problem 4 above.\n",
"</div>\n",
"\n",
"\n",
"\n",
"## ML: what is \"learning\"?\n",
"\n",
"To create a predictive model, we must first **train** such a model on given data. \n",
"<div class=\"alert alert-block alert-info\">\n",
"<i class=\"fa fa-info-circle\"></i>\n",
"Alternative names for \"to train\" a model are \"to <strong>fit</strong>\" or \"to <strong>learn</strong>\" a model.\n",
"</div>\n",
"All ML algorithms have in common that they rely on internal data structures and/or parameters.\n",