Python Web Scraping

April 5, 2022, 1:00pm to 4:00pm

Trying to register, but not affiliated with the UCB campus? If you are from Berkeley Lab (LBL), UCSF, or CZ Biohub, please register via our partner portals here.

If you are from the UCB campus, there is no longer a waitlist. After registering above, please fill out the affiliations form if you have not done so at least once before: https://dlab.berkeley.edu/affiliations

Location: Remote via Zoom. Link will be sent on the morning of the event.

Recordings: This D-Lab workshop will be recorded and made available to UC Berkeley participants for a limited time. Your registration for the event indicates your consent to having any images, comments and chat messages included as part of the video recording materials that are made available.

Date & Time: This workshop runs from 1pm-4pm on Tuesday, April 5.

Start Time: D-Lab workshops start 10 minutes after the scheduled start time (“Berkeley Time”). We will admit all participants from the waiting room at that time.

Description

In this workshop, we cover how to extract data from the web using Python. We focus on two approaches to extracting data from the web: leveraging application programming interfaces (APIs) and web scraping.

APIs are often official services offered by companies and other entities, which allow one to query their servers directly in order to retrieve their data. When APIs are not available, one can turn to web scraping, which requires downloading a webpage's source code and sifting through the material to extract the desired data. This workshop demonstrates both approaches, their advantages and disadvantages, and how to use both responsibly.
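To make the contrast concrete, here is a minimal sketch rather than workshop material. It assumes a made-up API response and a made-up page that both hold the same book list; the data, the `TitleParser` class, and the `title` CSS class are all hypothetical. To stay runnable without extra installs, it parses HTML with the standard library's `html.parser` instead of Beautiful Soup, and it uses inline strings instead of real network requests.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response body: structured JSON, ready to parse.
api_response = '{"books": [{"title": "Dune", "year": 1965}, {"title": "Emma", "year": 1815}]}'
api_titles = [book["title"] for book in json.loads(api_response)["books"]]

# Hypothetical page source carrying the same data, buried in markup.
page_source = """
<html><body>
  <h1>Catalog</h1>
  <ul>
    <li class="book"><span class="title">Dune</span> (1965)</li>
    <li class="book"><span class="title">Emma</span> (1815)</li>
  </ul>
</body></html>
"""


class TitleParser(HTMLParser):
    """Collect the text inside every <span class="title"> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "span" and ("class", "title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears while inside a title span.
        if self.in_title:
            self.titles.append(data.strip())


parser = TitleParser()
parser.feed(page_source)
scraped_titles = parser.titles

print(api_titles)
print(scraped_titles)
```

The API path is a single `json.loads` call because the server already returns structured data. The scraping path has to know how the page is laid out, so it is more likely to break when the site's markup changes.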

Basic familiarity with Python is assumed. Understanding the material in the Python Fundamentals and Python Data Wrangling workshops is highly recommended. We additionally recommend a basic understanding of HTML and CSS.

Topics Covered

  • How the web works
  • Accessing databases via RESTful APIs
  • Using a third-party Python package to query data from an API
  • HTML / CSS
  • Web scraping with Beautiful Soup

Requirements

We will assume a basic knowledge of Python. If you've taken the D-Lab's Python Intensive, that should be sufficient.

Prerequisites: D-Lab’s Python Fundamentals introductory series or equivalent knowledge.

GitHub Repository: https://github.com/dlab-berkeley/Python-Web-Scraping

Software Requirements: Installation Instructions for Python Anaconda

Is Python not working on your laptop? Attend the workshop anyway; we can provide you with a cloud-based solution until you resolve the problems with your local installation.

Feedback: After completing the workshop, please provide us with feedback using this form.

Questions? Email: dlab-frontdesk@berkeley.edu