"# Recall: ?LogisticRegression\n",
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"classifier.fit(input_features, labels)\n",
"predicted_labels = classifier.predict(input_features)\n",
"print(len(labels), \"examples\")\n",
"print(sum(predicted_labels == labels), \"labeled correctly\")\n",
"print(sum(predicted_labels == labels) / len(labels) * 100, \"% labeled correctly\")"
},
{
"cell_type": "markdown",
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<i class=\"fa fa-warning\"></i> <strong>Classifiers have hyper-parameters</strong>\n",
"All classifiers have hyper-parameters, e.g. the `C` we have seen before. It is an incident that both, `LogisticRegression` and `SVC`, have parameter named `C`. Beyond that some classifiers have more than one parameter, e.g. `SVC` also has a parameter `gamma`. But more about these details later.\n",
},
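{
"cell_type": "markdown",
"source": [
"A minimal sketch of how such hyper-parameters are passed to the constructors; the values `C=10` and `gamma=0.1` below are arbitrary illustration values, not recommendations.\n"
]
},
{
"cell_type": "code",
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.svm import SVC\n",
"\n",
"# hyper-parameters are set when constructing the classifier\n",
"# (C=10 and gamma=0.1 are arbitrary example values):\n",
"classifier_lr = LogisticRegression(C=10)\n",
"classifier_svc = SVC(C=10, gamma=0.1)\n",
"\n",
"print(classifier_lr.C)\n",
"print(classifier_svc.C, classifier_svc.gamma)"
]
},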
{
"cell_type": "markdown",
"source": [
},
{
"cell_type": "markdown",
"source": [
"Load and inspect the cannonical Fisher's \"Iris\" data set, which is included in `scikit-learn`: see [docs for `sklearn.datasets.load_iris`](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html). What's conceptually diffferent?\n",
"Apply `LogisticRegression` or `SVC` classifiers. Is it easier or more difficult than classification of the beers data?\n",
{
"cell_type": "code",
"source": [
"from sklearn.datasets import load_iris\n",
"\n",
"data = load_iris()\n",
"\n",
"# labels as text\n",
"\n",
"# (rows, columns) of the feature matrix:\n",
},
{
"cell_type": "code",
"source": [
"# transform the scikit-learn data structure into a data frame:\n",
"df = pd.DataFrame(data.data, columns=data.feature_names)\n",
"df[\"class\"] = data.target\n",
"df.head()"
},
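{
"cell_type": "markdown",
"source": [
"A minimal sketch (not part of the original exercise) for checking the class distribution; it hints at what is conceptually different here: there are three classes (species).\n"
]
},
{
"cell_type": "code",
"source": [
"# how many examples does each of the three classes have?\n",
"print(df[\"class\"].value_counts())"
]
},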
{
"cell_type": "code",
},
{
"cell_type": "code",
"metadata": {
"tags": [
"solution"
]
},
"outputs": [],
"source": [
"import seaborn as sns\n",
"sns.set(style=\"ticks\")\n",
"\n",
"for_plot = df.copy()\n",
"\n",
"def transform_label(class_):\n",
" return data.target_names[class_]\n",
"\n",
"# seaborn does not work here if we use numeric values in the class\n",
"# column, or strings which represent numbers. To fix this we\n",
"# create textual class labels\n",
"for_plot[\"class\"] = for_plot[\"class\"].apply(transform_label)\n",
"sns.pairplot(for_plot, hue=\"class\", diag_kind=\"hist\");"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"solution"
]
},
"outputs": [],
"features = df.iloc[:, :-1]\n",
"labels = df.iloc[:, -1]\n",
"classifier.fit(features, labels)\n",
"predicted_labels = classifier.predict(features)\n",
"\n",
"print(len(labels), \"examples\")\n",
"print(sum(predicted_labels == labels), \"labeled correctly\")"
},
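{
"cell_type": "markdown",
"source": [
"A minimal sketch of the `SVC` alternative mentioned in the exercise, using default hyper-parameters; the variable names below are illustrative only.\n"
]
},
{
"cell_type": "code",
"source": [
"from sklearn.svm import SVC\n",
"\n",
"# same features and labels as above, different classifier:\n",
"svc_classifier = SVC()\n",
"svc_classifier.fit(features, labels)\n",
"svc_predicted = svc_classifier.predict(features)\n",
"\n",
"print(sum(svc_predicted == labels), \"of\", len(labels), \"labeled correctly\")"
]
},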
{
"cell_type": "markdown",
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",