When & Where
Fri, November 7, 2014 - 4:00 PM to 5:15 PM
Evans 1011

The Statistics Department is offering a two-session workshop on distributed computing using Spark. Spark, developed at Berkeley's AMPLab, is a Hadoop-like framework that lets MapReduce calculations run in memory when possible, speeding up computation.

This first session provides an introduction to distributed file systems, MapReduce, and basic data processing using Spark.
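
For a rough feel of the MapReduce idea before the session, here is a minimal word-count sketch in plain Python. It is not taken from the workshop materials and does not use Spark itself; the sample lines and variable names are made up for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input: in a real job these lines would come from a distributed file system.
lines = [
    "spark keeps data in memory",
    "hadoop writes data to disk",
    "spark and hadoop both use map reduce",
]

# Map step: emit a (word, 1) pair for every word in every line.
pairs = [(word, 1) for line in lines for word in line.split()]

# Shuffle step: group the emitted counts by key (the word).
grouped = defaultdict(list)
for word, count in pairs:
    grouped[word].append(count)

# Reduce step: combine each word's counts into a single total.
word_counts = {
    word: reduce(lambda a, b: a + b, counts)
    for word, counts in grouped.items()
}
```

In PySpark the same pattern is usually written with `flatMap`, `map`, and `reduceByKey` on an RDD, with the framework handling the shuffle across machines.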

The instructor will be setting up an Amazon account with free credits that participants can use to start up their own virtual Linux cluster and try Spark on it.

If you want to get an account, please fill out this form: https://docs.google.com/a/berkeley.edu/forms/d/1HP8LUXtLHqedkrgmqMeQ7RSt...

Materials will be available at https://github.com/berkeley-scf/spark-workshop-2014 (under construction). No prior knowledge is assumed. Some familiarity with Python will be helpful as we'll run Spark via Python, but I think you'll get something out of it even if you're not familiar with Python syntax.

D-lab Facilitator: Jon Stiles