diff --git a/05_classifiers_overview.ipynb b/05_classifiers_overview.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..19d31bf1ad147ed9e2f28dd2b7d7f8739c62e6c5
--- /dev/null
+++ b/05_classifiers_overview.ipynb
@@ -0,0 +1,39 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Chapter 5: An overview of classifiers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/05_preprocessing_pipelines_and_hyperparameter_optimization.ipynb b/06_preprocessing_pipelines_and_hyperparameter_optimization.ipynb
similarity index 98%
rename from 05_preprocessing_pipelines_and_hyperparameter_optimization.ipynb
rename to 06_preprocessing_pipelines_and_hyperparameter_optimization.ipynb
index de86c6fb8f08b01236ebfac58a9acbd57d38101c..15f0dfc051b99ad0ae18ebd802eccede6735edfc 100644
--- a/05_preprocessing_pipelines_and_hyperparameter_optimization.ipynb
+++ b/06_preprocessing_pipelines_and_hyperparameter_optimization.ipynb
@@ -1,5 +1,12 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Chapter 6: Preprocessing pipelines and hyperparameter optimization"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -341,7 +348,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.6"
+   "version": "3.7.2"
   }
  },
  "nbformat": 4,
diff --git a/course_layout.md b/course_layout.md
index 2c6f3eaf0722fc4f8b2fd7c86b4dfe5dedbd868b..fca5da46b96562473c68690e052bfb295ad5a7c4 100644
--- a/course_layout.md
+++ b/course_layout.md
@@ -39,7 +39,7 @@ TBD: prepare coding session
 ## Part 1: Introduction (UWE)
 
 - What is machine learning ?
-  
+
   - learning from examples
   - working with hard to understand data.
   - automatation
@@ -47,39 +47,39 @@
 - What are features / samples / feature matrix ?
 
   - always numerical / categorical vectors
   - examples: beer, movies, images, text to numerical examples
-  
-- Learning problems: 
-  
+
+- Learning problems:
+
   - unsupervised:
-  
+
     - find structure in set of features
     - beers: find groups of beer types
-  
+
   - supervised:
-  
+
     - classification: do I like this beer ? example: draw decision tree
-  
-  
+
+
 ## Part 2a: supervised learning: classification
 
 Intention: demonstrate one / two simple examples of classifiers, also
 introduce the concept of decision boundary
-  
+
 - idea of simple linear classifier: take features, produce real value
   ("uwes beer score"), use threshold to decide
   -> simple linear classifier (linear SVM e.g.)
   -> beer example with some weights
-  
+
 - show code example with logistic regression for beer data, show weights,
   plot decision function
 
 ### Coding session:
 
 - change given code to use a linear SVM classifier
-  
+
 - use different data (TBD) set which can not be classified well with a linear classifier
 - tell to transform data and run again (TBD: how exactly ?)
-  
+
 
 ## Part 2b: supervised learning: regression (TBD: skip this ?)
@@ -130,7 +130,7 @@ Intention: accuracy is usefull but has pitfalls
 - how to measure accuracy ?
 
   - (TDB: skip ?) regression accuracy
-  - 
+  -
   - classifier accuracy:
     - confusion matrix
     - accurarcy
@@ -138,7 +138,7 @@ Intention: accuracy is usefull but has pitfalls
       e.g. diagnose HIV
   - precision / recall
   - ROC ?
-  
+
 - exercise: do cross val with other metrics
 
 ### Coding session
@@ -152,29 +152,7 @@ Intention: accuracy is usefull but has pitfalls
 # Day 2
 
-
-## Part 5: pipelines / parameter tuning with scikit-learn
-
-- Scicit learn api: recall what we have seen up to now.
-- pipelines, preprocessing (scaler, PCA)
-- cross validatioon
-- parameter tuning: grid search / random search.
-
-
-### Coding session
-
-- build SVM and LinearRegression crossval pipelines for previous examples
-- use PCA in pipeline for (+) to improve performance
-- find optimal SVM parameters
-- find optimal pca components number
-
-### Coding par
-
-Planning: stop here, make time estimates.
-
-
-
-## Part 6: classifiers overview
+## Part 5: classifiers overview
 
 Intention: quick walk throught throug reliable classifiers, give some background
 idea if suitable, let them play withs some incl. modification of parameters.
 
@@ -188,7 +166,14 @@ diagram.
 - Random forests
 - Gradient Tree Boosting
 
-show decision surfaces of these classifiers on 2d examples.
+topics to include:
+
+- interpretability of results (in terms of feature importance, e.g. SVM w/ high deg poly
+  kernel)
+- some rules of thumb: don't use KNN classifiers for 10 or more dimensions (why? paper
+  link)
+- show decision surfaces for diff classifiers (extend exercise in sec 3 using
+  hyperparams)
 
 
 ### Coding session
@@ -197,10 +182,26 @@ show decision surfaces of these classifiers on 2d examples.
 
 - MNIST example
 
 
-## Part 7: Start with neural networks. .5 day
+## Part 6: pipelines / parameter tuning with scikit-learn
+
+- Scikit-learn API: recall what we have seen up to now.
+- pipelines, preprocessing (scaler, PCA)
+- cross validation
+- parameter tuning: grid search / random search.
+
+### Coding session
+- build SVM and LinearRegression crossval pipelines for previous examples
+- use PCA in pipeline for (+) to improve performance
+- find optimal SVM parameters
+- find optimal pca components number
+
+
+## Part 7: Start with neural networks. .5 day
+## Planning
+Stop here, make time estimates.
 
 
 
 
 
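The Part 6 coding session described in the layout above (pipeline with scaler and PCA, grid search over SVM parameters and the PCA component count) could be sketched as follows. This is an illustrative sketch only: the dataset (`load_iris`) and the parameter ranges are placeholders, not material from the course.

```python
# Sketch: preprocessing pipeline + joint hyperparameter search with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),  # standardize features first
    ("pca", PCA()),               # then reduce dimensionality
    ("svm", SVC()),               # finally classify
])

# Grid search tunes the PCA component count and the SVM
# hyperparameters together; names use the "step__param" convention.
param_grid = {
    "pca__n_components": [2, 3, 4],
    "svm__C": [0.1, 1, 10],
    "svm__gamma": ["scale", 0.1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Because the whole pipeline sits inside the cross-validation loop, the scaler and PCA are re-fit on each training fold, which avoids leaking information from the validation fold.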
diff --git a/index.ipynb b/index.ipynb
index eeab403a24bb7ef60c074dbb9d7fa686b146540f..2f2e67ea8cf903ce67753f3e022f49ab8a9575e5 100644
--- a/index.ipynb
+++ b/index.ipynb
@@ -1,5 +1,17 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<div class=\"alert alert-block alert-danger\"><p>\n",
+    "<strong>TODOs</strong>\n",
+    "<ol>\n",
+    "<li>Write script which removes the solution proposals (cells starting with <code>#SOLUTION</code>) and creates a new notebook.</li>\n",
+    "</ol>\n",
+    "</p></div>\n"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -13,12 +25,34 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<div class=\"alert alert-block alert-danger\"><p>\n",
-    "<strong>TODOs</strong>\n",
-    "<ol>\n",
-    "<li>Write script which removes the solution proposals (cells starting with <code>#SOLUTION</code>) and creates a new notebook.</li>\n",
-    "</ol>\n",
-    "</p></div>\n"
+    "# Course: Introduction to Machine Learning with Python\n",
+    "\n",
+    "<div class=\"alert alert-block alert-warning\">\n",
+    "  <p><i class=\"fa fa-warning\"></i> <strong>Goal</strong></p>\n",
+    "  <p>Quickly get your hands dirty with Machine Learning and know what you're doing.</p>\n",
+    "</div>\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## What will you learn?\n",
+    "\n",
+    "* Basic concepts of Machine Learning (ML).\n",
+    "* General overview of supervised learning and related methods.\n",
+    "* How to quickly start with ML using the `scikit-learn` Python library."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## What will you NOT learn?\n",
+    "\n",
+    "* How to program with Python.\n",
+    "* How exactly ML methods work.\n",
+    "* Unsupervised learning methods."
   ]
  },
  {
@@ -30,6 +64,10 @@
    "<ol>\n",
    "  <li><a href=\"01_introduction.ipynb\">Introduction</a></li>\n",
    "  <li><a href=\"02_classification.ipynb\">Classification</a></li>\n",
+    "  <li><a href=\"03_overfitting_and_cross_validation.ipynb\">Overfitting and cross-validation</a></li>\n",
+    "  <li><a href=\"04_measuring_quality_of_a_classifier.ipynb\">Metrics for evaluating the performance of a classifier</a></li>\n",
+    "  <li><a href=\"05_classifiers_overview.ipynb\">An overview of classifiers</a></li>\n",
+    "  <li><a href=\"06_preprocessing_pipelines_and_hyperparameter_optimization.ipynb\">Preprocessing pipelines and hyperparameter optimization</a></li>\n",
    "  <li>...</li>\n",
    " \n",
    "</ol>"
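The "classifiers overview" chapter added by this patch (Part 5 in the course layout) compares several standard classifiers on a 2-D toy problem. A minimal sketch of such a comparison could look like this; the `make_moons` dataset and the untuned default hyperparameters are assumptions for illustration, not content taken from the course notebooks.

```python
# Sketch: fit a few standard classifiers on a 2-D toy set and compare
# cross-validated accuracy; a decision-surface plot would reuse these fits.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

classifiers = {
    "logistic regression": LogisticRegression(),
    "k-NN": KNeighborsClassifier(),
    "SVM (RBF)": SVC(),
    "random forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, clf in classifiers.items():
    # 5-fold cross-validated accuracy for each classifier
    results[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {results[name]:.2f}")
```

On a non-linear dataset like the two moons, the linear model should lag behind the kernel and tree-based classifiers, which is the point the chapter's decision-surface exercise makes visually.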