In classifiers overview: fixes and updates for scikit-learn developments

  • mention in boosting: histogram-based gradient boosting trees (HistGradientBoostingClassifier/HistGradientBoostingRegressor); these estimators are much faster than GradientBoostingClassifier/GradientBoostingRegressor on big datasets (n_samples >= 10,000); see the sketch below.
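A minimal sketch, assuming scikit-learn >= 0.24 (where the estimator is no longer experimental); the dataset and its size are illustrative, not from the note:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic dataset large enough for the histogram-based estimator to pay off.
X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Drop-in faster alternative to GradientBoostingClassifier on big data.
clf = HistGradientBoostingClassifier(random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```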
  • update classifiers info on DT pruning: minimal cost-complexity pruning via the ccp_alpha keyword argument (e.g. ccp_alpha=0.05); add an example (possible sketch below).
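A possible example: ccp_alpha=0.05 is taken from the note above, the dataset is illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Same tree with and without minimal cost-complexity pruning.
unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.05, random_state=0).fit(X, y)

# Pruning collapses subtrees whose cost-complexity improvement is below
# ccp_alpha, so the pruned tree has fewer nodes.
print(unpruned.tree_.node_count, pruned.tree_.node_count)
```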
  • rm n_jobs if used in the KMeans estimator (re-worked: significantly faster and more stable, but now parallelized via OpenMP, so n_jobs has no effect; set the OMP_NUM_THREADS env var instead).
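A minimal sketch of the replacement pattern; the thread count is illustrative, and setting OMP_NUM_THREADS from inside the process only takes effect if done before the OpenMP runtime is loaded, so exporting it in the shell before launching Python is the safer route:

```python
import os

# Illustrative: must run before sklearn (and its OpenMP runtime) is imported.
os.environ["OMP_NUM_THREADS"] = "4"

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10_000, centers=5, random_state=0)

# No n_jobs argument; parallelism is controlled by OMP_NUM_THREADS.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
print(km.inertia_)
```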
  • mention stacking in ensemble methods
    • StackingClassifier and StackingRegressor: a stack of estimators with a final classifier or regressor. Stacking exploits the strength of each individual estimator by using their outputs as input to a final estimator. Base estimators are fitted on the full X, while the final estimator is trained on cross-validated predictions of the base estimators obtained via cross_val_predict. See the sketch after this list.
    • good overview/intro: https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205
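A minimal stacking sketch, assuming scikit-learn >= 0.22; the choice of base and final estimators is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base estimators, fitted on the full training X.
estimators = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ("svc", make_pipeline(StandardScaler(), LinearSVC(random_state=42))),
]

# The final estimator is trained on cross-validated predictions of the
# base estimators (internally via cross_val_predict).
clf = StackingClassifier(estimators=estimators,
                         final_estimator=LogisticRegression())
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```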