    {
     "cells": [
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": []
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "# Introduction"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## What is machine learning ?"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "- Discipline in the overlap of computer science and statistics\n",
        "- Subset of Artificla Intelligence\n",
        "- Learn models from data\n",
        "- Term \"Machine Learning\" was first used in 1959 by AI pioneer Arthur Samuel\n",
        " "
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## What is \"learn from data\" ?\n",
        "\n",
        "- Model examples: \n",
        "\n",
        "   - Is the email I receied spam ? \n",
        "   - Does an image show a cat ? \n",
        "   - What can I recommend my customers ?\n",
        "   - How will the stock market look like tomorrow ?\n",
        "   \n",
        "Learn from data: \n",
        "\n",
        "- No exact model known or implementable\n",
        "- example data should contain sufficient information to build (approximated) models from this.\n",
        "- Requires data with sufficient \"encoded information\""
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Some history\n",
        "\n",
        "Rough Time Line\n",
        "\n",
        " \n",
        "    1812: Bayes Theorem\n",
        "    1913: Markov Chains\n",
        "    1951: First neural network\n",
        "    1969: Book \"Perceptrons\": Limitations of Neural Networks\n",
        "    1986: Backpropagation to learn neural networks\n",
        "    1995: Randomized Forests and Support Vector Machines\n",
        "    1998: Naive Bayes Classifier for Spam detection\n",
        "    2000+: Deep learning"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Features\n",
        "\n",
        "(Almost) all machine learning algorithms require that your data is numerical.\n",
        "\n",
        "A collection of such data is organized as a feature matrix:"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "import pandas as pd\n",
        "\n",
        "features = pd.read_csv(\"beers.csv\")\n",
        "features.head()"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "- columns are called a **features**\n",
        "- rows are called a **sampled** or **feature vectors**."
       ]
      },
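      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "A quick sketch (assuming `features` still holds the `beers.csv` table loaded above): each row is one sample, each column is one feature."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# assumes `features` is the beers.csv table loaded above\n",
        "print(features.shape)             # (number of samples, number of features)\n",
        "print(features.columns.tolist())  # feature names\n",
        "print(features.iloc[0])           # one sample, i.e. one feature vector"
       ]
      },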
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Other examples: images"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "from sklearn.datasets import load_digits\n",
        "import matplotlib.pyplot as plt\n",
        "%matplotlib inline"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "dd = load_digits()\n"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "N = 9\n",
        "plt.figure(figsize=(2 * N, 5))\n",
        "for i, image in enumerate(dd.images[:N], 1):\n",
        "    plt.subplot(1, N, i)\n",
        "    plt.imshow(image, cmap=\"gray\")"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "print(dd.images[0])\n",
        "print(dd.images[0].shape)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Here creating a feature vector is just \"flattening\" the matrix by concatenating the rows to one long vector:"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "print(dd.images[0].flatten())"
       ]
      },
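      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Doing this for all images gives the full feature matrix, one row per image. A minimal sketch (note that `load_digits` also ships this flattened version as `dd.data`):"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "import numpy as np\n",
        "\n",
        "# flatten every 8 x 8 image into a length-64 feature vector;\n",
        "# stacking them row by row yields the feature matrix\n",
        "feature_matrix = dd.images.reshape(len(dd.images), -1)\n",
        "print(feature_matrix.shape)\n",
        "\n",
        "# load_digits provides the same flattened representation as dd.data\n",
        "print(np.allclose(feature_matrix, dd.data))"
       ]
      },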
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Other examples: text"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 14,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "[0 1 2 0 1 1]\n"
         ]
        }
       ],
       "source": [
        "from sklearn.feature_extraction.text import CountVectorizer\n",
        "from itertools import count\n",
        "\n",
        "# map words to index in created vector:\n",
        "vocabulary = [\"like\", \"dislike\", \"american\", \"italian\", \"beer\", \"pizza\"]\n",
        "\n",
        "vectorizer = CountVectorizer(vocabulary=dict(zip(vocabulary, count())))\n",
        "\n",
        "# crate count vector for a pice of text:\n",
        "vector = vectorizer.fit_transform([\"I dislike american pizza. But american beer is nice\"]).toarray()[0]\n",
        "print(vector)"
       ]
      },
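      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Applying the same vectorizer to several texts (a small sketch with made-up sentences) yields a feature matrix with one row, i.e. one feature vector, per text:"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# made-up example texts, vectorized with the fixed vocabulary from above\n",
        "texts = [\n",
        "    \"I like italian pizza\",\n",
        "    \"I dislike american beer\",\n",
        "    \"I like beer and pizza\",\n",
        "]\n",
        "\n",
        "feature_matrix = vectorizer.fit_transform(texts).toarray()\n",
        "print(feature_matrix)"
       ]
      },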
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Machine learning taxonomy"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": []
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": []
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": []
      }
     ],
     "metadata": {
      "kernelspec": {
       "display_name": "Python 3",
       "language": "python",
       "name": "python3"
      },
      "language_info": {
       "codemirror_mode": {
        "name": "ipython",
        "version": 3
       },
       "file_extension": ".py",
       "mimetype": "text/x-python",
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython3",
       "version": "3.6.6"
      }
     },
     "nbformat": 4,
     "nbformat_minor": 2
    }