When & Where
Wed, April 20, 2016 - 10:00 AM to 12:00 PM
D-Lab: Convening Room (356 Barrows Hall)

Ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. Practitioners may prefer ensemble algorithms when model performance is valued above other factors such as model complexity and training time. The Super Learner algorithm, a generalization of "stacking", uses cross-validation to learn the optimal combination of the base learner fits. We will present several examples of how to improve model performance using the SuperLearner, subsemble and h2oEnsemble R packages.
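To make "learning the optimal combination of the base learner fits" concrete, here is a minimal from-scratch sketch in Python. It is not the API of the SuperLearner R package, and every function name in it (`fit_mean`, `fit_linear`, `cv_predictions`, `super_learner`) is hypothetical. It assumes exactly two toy base learners on one numeric predictor and uses a grid search over a single convex weight as the metalearner, a simplification of the weighting step used in practice. The core idea is the same: each base learner's k-fold cross-validated predictions form the "level-one" data, the metalearner picks the weight that minimizes cross-validated error, and the base learners are then refit on the full data.

```python
def fit_mean(xs, ys):
    """Toy base learner 1: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Toy base learner 2: ordinary least squares with one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_predictions(fit, xs, ys, k):
    """Out-of-fold predictions: each point is predicted by a model that never saw it."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, k):  # indices i with i % k == fold
            preds[i] = model(xs[i])
    return preds


def super_learner(xs, ys, learners, k=5, grid=101):
    """Stack exactly two base learners with a convex weight chosen by CV error."""
    # Level-one data: one column of cross-validated predictions per base learner.
    z1, z2 = (cv_predictions(fit, xs, ys, k) for fit in learners)
    # Metalearner: grid-search the weight w in [0, 1] that minimizes CV mean squared error.
    best_w, best_mse = 0.0, float("inf")
    for step in range(grid):
        w = step / (grid - 1)
        mse = sum((w * a + (1 - w) * b - y) ** 2 for a, b, y in zip(z1, z2, ys)) / len(ys)
        if mse < best_mse:
            best_w, best_mse = w, mse
    # Refit every base learner on the full data and combine them with the learned weight.
    f1, f2 = (fit(xs, ys) for fit in learners)
    return (lambda x: best_w * f1(x) + (1 - best_w) * f2(x)), best_w


# Usage on synthetic, roughly linear data: the ensemble should put most weight on fit_linear.
xs = [float(i) for i in range(50)]
ys = [2 * x + 1 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
predict, w_linear = super_learner(xs, ys, [fit_linear, fit_mean])
print(round(w_linear, 2), round(predict(10.0), 2))
```

Because the weight is chosen on out-of-fold predictions rather than in-sample fits, a flexible learner that merely memorizes the training data does not automatically win; this is the property that makes cross-validated stacking worthwhile.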

The SuperLearner R package is the original Super Learner implementation and probably still the most widely used; however, it does not scale easily to big datasets. The h2oEnsemble R package is an implementation that uses distributed base learning algorithms (including Random Forest and Deep Neural Nets) from the open-source machine learning platform H2O. The subsemble R package implements a general subset ensemble prediction method: it partitions the training data into subsets, fits base learning algorithms on each subset, and uses a unique form of k-fold cross-validation to output a prediction function that combines the subset-specific fits.
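The subset-plus-cross-validation scheme can be sketched from scratch in Python as follows. This is not the subsemble R package's API; the function names (`fit_linear`, `subsemble`) are hypothetical. The sketch also assumes just two subsets (assigned round-robin by index), a single toy base learner (simple linear regression) and a grid-searched convex weight as the metalearner. The detail it aims to show is the cross-validation structure: each subset is split into v folds, fold f of the full data is the union of fold f from every subset, and each subset-specific fit trained without fold f predicts every observation in fold f, so every subset-specific fit yields one column of level-one predictions for the whole dataset.

```python
def fit_linear(xs, ys):
    """Toy base learner: ordinary least squares with one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: (my - slope * mx) + slope * x


def subsemble(xs, ys, v=5, grid=101):
    """Subsemble sketch with two subsets and one base learner."""
    n = len(xs)
    # Partition the data into two subsets (round-robin by index, an arbitrary choice).
    subset_of = [i % 2 for i in range(n)]
    # Split each subset into v folds; fold f of the full data is the union of
    # fold f from every subset.
    fold_of, seen = [0] * n, [0, 0]
    for i in range(n):
        s = subset_of[i]
        fold_of[i] = seen[s] % v
        seen[s] += 1
    # Level-one data: z[j][i] is the prediction for observation i from a fit
    # trained on subset j with observation i's fold held out.
    z = [[0.0] * n, [0.0] * n]
    for f in range(v):
        held_out = [i for i in range(n) if fold_of[i] == f]
        for j in (0, 1):
            train = [i for i in range(n) if subset_of[i] == j and fold_of[i] != f]
            model = fit_linear([xs[i] for i in train], [ys[i] for i in train])
            for i in held_out:
                z[j][i] = model(xs[i])
    # Metalearner: convex weight on the two subset-specific columns, by grid search.
    best_w, best_mse = 0.0, float("inf")
    for step in range(grid):
        w = step / (grid - 1)
        mse = sum((w * a + (1 - w) * b - y) ** 2 for a, b, y in zip(z[0], z[1], ys)) / n
        if mse < best_mse:
            best_w, best_mse = w, mse
    # Final subset-specific fits are trained on all of each subset.
    fits = [
        fit_linear([xs[i] for i in range(n) if subset_of[i] == j],
                   [ys[i] for i in range(n) if subset_of[i] == j])
        for j in (0, 1)
    ]
    return (lambda x: best_w * fits[0](x) + (1 - best_w) * fits[1](x)), best_w


# Usage: the two subsets are offset in opposite directions, so the learned
# combination should weight both subset-specific fits roughly equally.
xs = [float(i) for i in range(60)]
ys = [2 * x + 1 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
predict, w_first = subsemble(xs, ys)
print(w_first, round(predict(10.0), 3))
```

Since each subset-specific fit only ever sees its own subset, the fits can be trained independently (for example in parallel), which is the motivation for partitioning the data in the first place.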

R scripts demonstrating how to use each of these packages will be made available on GitHub for attendees to follow along on their laptops.

Prior knowledge: Familiarity with R is recommended, as is some basic familiarity with machine learning topics such as classification, regression, training and test sets, cross-validation and ensembles. Some of the basic concepts are explained in this tutorial.

Technology requirements:

  • Any operating system: Linux, OS X or Windows
  • Java 7 or 8 is required for h2oEnsemble
  • Install R
  • The SuperLearner and subsemble packages can be installed from CRAN, and we recommend installing the "h2o" R package from here (although it can also be installed from CRAN)


Training Host:
D-Lab Facilitator: Trinetta Chong
Format Detail: Lecture and hands-on follow-along