"# Recall: ?LogisticRegression\n",
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"classifier.fit(input_features, labels)\n",
"predicted_labels = classifier.predict(input_features)\n",
"print(len(labels), \"examples\")\n",
"print(sum(predicted_labels == labels), \"labeled correctly\")\n",
"print(sum(predicted_labels == labels) / len(labels) * 100, \"% labeled correctly\")"
},
{
"cell_type": "markdown",
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<i class=\"fa fa-warning\"></i> <strong>Classifiers have hyper-parameters</strong>\n",
"All classifiers have hyper-parameters, e.g. the `C` we have seen before. It is an incident that both, `LogisticRegression` and `SVC`, have parameter named `C`. Beyond that some classifiers have more than one parameter, e.g. `SVC` also has a parameter `gamma`. But more about these details later.\n",
},
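{
"cell_type": "markdown",
"source": [
"A minimal sketch of how such hyper-parameters are passed to the constructors; the values `C=10` and `gamma=0.1` below are arbitrary illustration values, not recommendations.\n"
]
},
{
"cell_type": "code",
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.svm import SVC\n",
"\n",
"# hyper-parameters are set when constructing the classifier\n",
"# (C=10 and gamma=0.1 are arbitrary example values):\n",
"classifier_lr = LogisticRegression(C=10)\n",
"classifier_svc = SVC(C=10, gamma=0.1)\n",
"\n",
"print(classifier_lr.C)\n",
"print(classifier_svc.C, classifier_svc.gamma)"
]
},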
{
"cell_type": "markdown",
"source": [
},
{
"cell_type": "markdown",
"source": [
"Load and inspect the cannonical Fisher's \"Iris\" data set, which is included in `scikit-learn`: see [docs for `sklearn.datasets.load_iris`](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html). What's conceptually diffferent?\n",
"Apply `LogisticRegression` or `SVC` classifiers. Is it easier or more difficult than classification of the beers data?\n",
{
"cell_type": "code",
"source": [
"from sklearn.datasets import load_iris\n",
"\n",
"data = load_iris()\n",
"\n",
"# labels as text\n",
"\n",
"# (rows, columns) of the feature matrix:\n",
},
{
"cell_type": "code",
"source": [
"# transform the scikit-learn data structure into a data frame:\n",
"df = pd.DataFrame(data.data, columns=data.feature_names)\n",
"df[\"class\"] = data.target\n",
"df.head()"
},
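{
"cell_type": "markdown",
"source": [
"A minimal sketch (not part of the original exercise) for checking the class distribution; it hints at what is conceptually different here: there are three classes (species).\n"
]
},
{
"cell_type": "code",
"source": [
"# how many examples does each of the three classes have?\n",
"print(df[\"class\"].value_counts())"
]
},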
{
"cell_type": "code",
},
{
"cell_type": "code",
"metadata": {
"tags": [
"solution"
]
},
"outputs": [],
"source": [
"import seaborn as sns\n",
"sns.set(style=\"ticks\")\n",
"\n",
"for_plot = df.copy()\n",
"\n",
"def transform_label(class_):\n",
" return data.target_names[class_]\n",
"\n",
"# seaborn does not work here if we use numeric values in the class\n",
"# column, or strings which represent numbers. To fix this we\n",
"# create textual class labels\n",
"for_plot[\"class\"] = for_plot[\"class\"].apply(transform_label)\n",
"sns.pairplot(for_plot, hue=\"class\", diag_kind=\"hist\");"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"solution"
]
},
"outputs": [],
"features = df.iloc[:, :-1]\n",
"labels = df.iloc[:, -1]\n",
"classifier.fit(features, labels)\n",
"predicted_labels = classifier.predict(features)\n",
"\n",
"print(len(labels), \"examples\")\n",
"print(sum(predicted_labels == labels), \"labeled correctly\")"
},
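{
"cell_type": "markdown",
"source": [
"A minimal sketch of the `SVC` alternative mentioned in the exercise, using default hyper-parameters; the variable names below are illustrative only.\n"
]
},
{
"cell_type": "code",
"source": [
"from sklearn.svm import SVC\n",
"\n",
"# same features and labels as above, different classifier:\n",
"svc_classifier = SVC()\n",
"svc_classifier.fit(features, labels)\n",
"svc_predicted = svc_classifier.predict(features)\n",
"\n",
"print(sum(svc_predicted == labels), \"of\", len(labels), \"labeled correctly\")"
]
},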
{
"cell_type": "markdown",
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",