In classifiers overview: fixes and updates for scikit-learn developments
-
mention in boosting: histogram-based gradient boosting classification/regression trees, `HistGradientBoostingClassifier`/`HistGradientBoostingRegressor`; these estimators are much faster than `GradientBoostingClassifier`/`GradientBoostingRegressor` on big datasets (`n_samples >= 10 000`).
-
update classifiers info on DT pruning (add example?): tree pruning based on minimal cost-complexity, via the `ccp_alpha=0.05` kwarg
-
rm `n_jobs` if used in the `KMeans` estimator (re-worked: significantly faster and more stable, but it now uses OpenMP, so `n_jobs` has no effect; instead, set the `OMP_NUM_THREADS=...` env var)
-
mention stacking in ensemble methods
-
`StackingClassifier` and `StackingRegressor`: a stack of estimators with a final classifier or regressor. Stacking exploits the strengths of the individual estimators by using their outputs as the input of a final estimator. Base estimators are fitted on the full `X`, while the final estimator is trained on cross-validated predictions of the base estimators, obtained with `cross_val_predict`.
-
good overview/intro: https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205
-