When & Where
Fri, November 14, 2014 - 4:00 PM to 5:15 PM
Evans 1011

The Statistics Department is offering a two-session workshop on distributed computing using Spark. Spark is the Berkeley AMPLab's variant on Hadoop: it allows MapReduce calculations to be done in memory when possible, which speeds up computation.
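
For readers new to the idea, here is a rough sketch of the map and reduce steps in plain Python. It does not use Spark or its API; the toy data and the `combine` helper are made up for illustration. In Spark the data would be spread across the machines in a cluster.

```python
from functools import reduce

# Hypothetical toy data; in Spark this would be a distributed dataset.
values = [1.0, 2.0, 3.0, 4.0]

# Map step: turn each value into a (sum, sum of squares, count) triple.
mapped = map(lambda x: (x, x * x, 1), values)


def combine(a, b):
    """Reduce step: add two triples element by element.

    The operation is associative, so partial results computed on
    different workers could be merged in any order.
    """
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


total, total_sq, n = reduce(combine, mapped)
mean = total / n
variance = total_sq / n - mean ** 2  # population variance
```

Because each step only needs one record or one pair of partial results at a time, the same pattern can be split across many machines.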

This is the second of the two sessions. It will cover basic model fitting using Spark, including linear models, GLMs, and Lasso, as well as simulation in Spark, with time for collective discussion.

The instructor will be setting up an Amazon account with free credits that participants can use to start up their own virtual Linux cluster on which to try Spark. If you want to get an account, please fill out this form.

Materials will be available at https://github.com/berkeley-scf/spark-workshop-2014 (under construction).

I will assume no prior knowledge. Some familiarity with Python will be helpful as we'll run Spark via Python, but I think you'll get something out of it even if you're not familiar with Python syntax.

No need to register in advance!

Training Host: