When & Where
Mon, October 31, 2016 - 10:00 AM to 1:00 PM
Barrows 356: D-Lab Convening Room

The focus of this workshop is machine learning using the h2o and h2oEnsemble R packages. H2O is an open-source distributed machine learning platform designed for big data that is also easy to run on a laptop, not only on a multi-node Hadoop or Spark cluster.

The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala, and REST/JSON, as well as through a web interface. Because H2O's algorithm implementations are distributed, the software can scale to very large datasets that may not fit into RAM on a single machine.

H2O currently features distributed implementations of generalized linear models, gradient boosting machines, random forest, deep neural nets, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others. The ability to create stacked ensembles, or "super learners," from a collection of supervised base learners is provided via the h2oEnsemble R package.

R scripts with H2O machine learning code examples will be demoed live and made available on GitHub for attendees to follow along on their laptops.

Prior knowledge: Familiarity with R is recommended, as is basic familiarity with machine learning topics such as classification, regression, training and test sets, and cross-validation.

D-Lab Facilitator: Susan Grand
Participant Technology Requirement: Java 7 or 8 and the h2o R package are required.