When & Where
Wed, April 6, 2016 - 10:00 AM to 12:00 PM
D-Lab: Convening Room (356 Barrows Hall)

The focus of this workshop is machine learning using the H2O R and Python packages. H2O is an open source distributed machine learning platform designed for big data, with the added benefit that it's easy to use on a laptop (in addition to a multi-node Hadoop or Spark cluster).

The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala, and REST/JSON, as well as through a web interface. Because H2O's algorithm implementations are distributed, the software can scale to very large datasets that may not fit into RAM on a single machine.

H2O currently features distributed implementations of generalized linear models, gradient boosting machines, random forest, deep neural nets, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others. The ability to create stacked ensembles, or "super learners," from a collection of supervised base learners is provided via the h2oEnsemble R package.

R and Python Jupyter notebooks with H2O machine learning code examples will be demoed live and made available on GitHub for attendees to follow along on their laptops.

Prior knowledge:

Familiarity with R or Python is recommended, as is some basic familiarity with machine learning topics such as classification, regression, training sets, test sets, and cross-validation. Some of the basic concepts are explained in this tutorial.
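
For attendees who want a quick refresher on training sets, test sets, and cross-validation, here is a minimal sketch in plain Python. It is not H2O code and is not taken from the workshop materials; the function name `k_fold_indices` and its parameters are hypothetical, chosen only to illustrate how k-fold cross-validation partitions rows so that each row is held out for testing exactly once.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration only; not part of H2O.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")

    # Shuffle the row indices reproducibly so folds are not tied to row order.
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one row.
    folds = [indices[i::k] for i in range(k)]

    for i, test_idx in enumerate(folds):
        # The training set is every row that is not in the held-out fold.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    for train_idx, test_idx in k_fold_indices(n_rows=10, k=5):
        print(len(train_idx), "training rows,", len(test_idx), "test rows")
```

In practice, a model would be fit on the rows in `train_idx` and scored on the rows in `test_idx` for each fold, and the scores averaged to estimate performance on unseen data.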

Technology requirements: 

Interested in learning more? Check out our related upcoming H2O workshop: How to Use the H2O Web GUI

Training Host: 
D-Lab Facilitator: 
Trinetta Chong
Format Detail: 
Lecture and hands-on follow-along