Python Web Scraping

March 5, 2025, 10:00am to 12:00pm

REGISTRATION NOTES

Click register and then use your @berkeley.edu or @lbl.gov email address.
If you have trouble, you may need to log out of Zoom and log back in.
For help, read more here: https://dlab.berkeley.edu/zoom-troubleshooting-tips


Location: Remote via Zoom. 

Recordings: This D-Lab workshop will be recorded and made available to UC Berkeley participants for a limited time. Your registration for the event indicates your consent to having any images, comments and chat messages included as part of the video recording materials that are made available.

Start Time: D-Lab workshops start 10 minutes after the scheduled start time (“Berkeley Time”). We will admit all participants from the waiting room at that time.

Date & Time: This workshop runs from 10am-12pm on:

  • Wednesday, March 5

Description

In this workshop, we cover how to scrape data from the web using Python. Web scraping involves downloading a webpage's source code and sifting through the material to extract desired data.
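As a rough illustration of the "sifting" step described above, here is a minimal sketch that pulls links out of a page's source using only Python's standard library. The sample HTML, the `LinkExtractor` class, and the `extract_links` helper are hypothetical names made up for this example; the workshop materials may well use different libraries and a different approach.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href and visible text of every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []            # list of (href, text) tuples
        self._current_href = None  # href of the <a> tag we are inside, if any
        self._current_text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._current_text).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


# Hypothetical page source. In practice this string would come from
# downloading a page, for example with urllib.request.urlopen(url).
SAMPLE_HTML = """
<html><body>
  <h1>Workshops</h1>
  <ul>
    <li><a href="/python-fundamentals">Python Fundamentals</a></li>
    <li><a href="/python-web-apis">Python Web APIs</a></li>
  </ul>
</body></html>
"""


def extract_links(html):
    """Parse an HTML string and return its links as (href, text) tuples."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
for href, text in links:
    print(f"{text}: {href}")
```

The sample string stands in for a downloaded page so the sketch runs without network access; the extraction logic is the same once real page source is supplied.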

Web scraping is typically done only when web APIs are not available. Platforms like Twitter, Reddit, and The New York Times offer APIs to retrieve data. If you want to learn how to use web APIs in Python, see D-Lab's Python Web APIs workshop.

We assume a basic knowledge of Python.

Prerequisites: We recommend attending D-Lab's Python Fundamentals and Python Data Wrangling prior to this workshop. We additionally recommend a basic understanding of HTML and CSS.

Workshop Materials: https://github.com/dlab-berkeley/Python-Web-Scraping

Software Requirements: Installation Instructions for Python Anaconda

Is Python not working on your laptop? Attend the workshop anyway; we can provide you with a cloud-based solution until you resolve the problems with your local installation.

Questions? Email: dlab-frontdesk@berkeley.edu