When & Where
Fri, November 4, 2016 - 2:00 PM to 4:00 PM
Barrows 356: D-Lab Convening Room

The focus of this workshop is machine learning using the h2o Python module. H2O is an open source, distributed machine learning platform designed for big data, with the added benefit that it is easy to use on a laptop as well as on a multi-node Hadoop or Spark cluster.

The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala, and REST/JSON, as well as through a web interface. Because H2O's algorithm implementations are distributed, the software can scale to very large datasets that may not fit into the RAM of a single machine.

H2O currently features distributed implementations of generalized linear models, gradient boosting machines, random forest, deep neural nets, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others. The ability to create stacked ensembles, or "super learners," from a collection of supervised base learners is provided via the h2oEnsemble R package.

Python scripts with H2O machine learning code examples will be demoed live and made available on GitHub for attendees to follow along on their laptops.
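For a sense of what such a script might look like, here is a minimal sketch. It is not the workshop material: the function name train_gbm, its csv_path and response arguments, and the choice of 50 trees with an 80/20 split are all assumptions made for illustration. It uses the h2o module's init, import_file, split_frame, and H2OGradientBoostingEstimator calls, and needs Java plus the h2o module when it is actually run.

```python
def train_gbm(csv_path, response, seed=1):
    """Train an H2O gradient boosting classifier and score it on held-out data.

    csv_path and response are hypothetical placeholders: the path to a CSV
    file and the name of the column to predict.
    """
    # Imported inside the function so the definition itself loads
    # even on a machine where h2o is not installed yet.
    import h2o
    from h2o.estimators.gbm import H2OGradientBoostingEstimator

    # Start (or connect to) a local H2O cluster; this step requires Java.
    h2o.init()

    # Load the data into a distributed H2OFrame.
    frame = h2o.import_file(csv_path)

    # Treat the response as categorical so the model does classification.
    frame[response] = frame[response].asfactor()

    # Assumed 80/20 split into a training set and a test set.
    train, test = frame.split_frame(ratios=[0.8], seed=seed)

    # Use every column except the response as a predictor.
    predictors = [col for col in frame.columns if col != response]

    # Assumed hyperparameters, chosen only to keep the example small.
    model = H2OGradientBoostingEstimator(ntrees=50, seed=seed)
    model.train(x=predictors, y=response, training_frame=train)

    # Evaluate on the test set, which the model has not seen.
    return model.model_performance(test_data=test)
```

Calling train_gbm("my_data.csv", "label") with a file and column of your own would return an H2O performance object for the test set.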

Prior knowledge: Familiarity with Python is recommended, as is basic familiarity with machine learning topics such as classification, regression, training sets, test sets, and cross-validation.


D-Lab Facilitator: Scott McGinnis
Participant Technology Requirement: Java 7 or 8 and the h2o Python module are required.
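One way to check a laptop against these requirements ahead of time is sketched below. The helper name check_requirements is hypothetical, and the check is deliberately shallow: it only looks for a java executable on the PATH and an importable h2o module, and does not verify the Java version.

```python
import importlib.util
import shutil


def check_requirements():
    """Report whether a Java executable and the h2o module can be found."""
    return {
        # True if a `java` executable is on the PATH (version is not checked).
        "java": shutil.which("java") is not None,
        # True if Python can locate an installed `h2o` module.
        "h2o": importlib.util.find_spec("h2o") is not None,
    }


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If h2o is reported missing, it can typically be installed with pip install h2o.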