{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is machine learning?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- A discipline at the intersection of computer science and statistics\n",
    "- A subfield of artificial intelligence\n",
    "- Learns models from data\n",
    "- The term \"machine learning\" was first used in 1959 by AI pioneer Arthur Samuel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What does \"learn from data\" mean?\n",
    "\n",
    "- Model examples:\n",
    "\n",
    "   - Is the email I received spam?\n",
    "   - Does an image show a cat?\n",
    "   - What can I recommend to my customers?\n",
    "   - What will the stock market look like tomorrow?\n",
    "\n",
    "Learning from data:\n",
    "\n",
    "- No exact model is known, or an exact model cannot be implemented in practice.\n",
    "- Instead, example data is used to build an (approximate) model.\n",
    "- This requires data with sufficient \"encoded information\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Some history\n",
    "\n",
    "Rough timeline:\n",
    "\n",
    "    1812: Bayes' theorem (in Laplace's formulation)\n",
    "    1913: Markov chains\n",
    "    1951: First neural network\n",
    "    1969: Book \"Perceptrons\": limitations of neural networks\n",
    "    1986: Backpropagation for training neural networks\n",
    "    1995: Random forests and support vector machines\n",
    "    1998: Naive Bayes classifier for spam detection\n",
    "    2000+: Deep learning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Features\n",
    "\n",
    "(Almost) all machine learning algorithms require numerical input data.\n",
    "\n",
    "A collection of such data is organized as a feature matrix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "features = pd.read_csv(\"beers.csv\")\n",
    "features.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Columns are called **features**.\n",
    "- Rows are called **samples** or **feature vectors**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Other examples: images"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_digits\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dd = load_digits()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N = 9\n",
    "plt.figure(figsize=(2 * N, 5))\n",
    "for i, image in enumerate(dd.images[:N], 1):\n",
    "    plt.subplot(1, N, i)\n",
    "    plt.imshow(image, cmap=\"gray\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(dd.images[0])\n",
    "print(dd.images[0].shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, creating a feature vector just means \"flattening\" the matrix by concatenating its rows into one long vector:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(dd.images[0].flatten())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Other examples: text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0 1 2 0 1 1]\n"
     ]
    }
   ],
   "source": [
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "from itertools import count\n",
    "\n",
    "# map each vocabulary word to its index in the count vector:\n",
    "vocabulary = [\"like\", \"dislike\", \"american\", \"italian\", \"beer\", \"pizza\"]\n",
    "\n",
    "vectorizer = CountVectorizer(vocabulary=dict(zip(vocabulary, count())))\n",
    "\n",
    "# create the count vector for a piece of text:\n",
    "vector = vectorizer.fit_transform([\"I dislike american pizza. But american beer is nice\"]).toarray()[0]\n",
    "print(vector)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Machine learning taxonomy"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}