Data Sources

CANCELED: Python Data Wrangling and Manipulation with Pandas

November 29, 2022, 3:00pm
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It supports practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.

Python Web Scraping

June 26, 2023, 2:00pm
In this workshop, we cover how to scrape data from the web using Python. Web scraping involves downloading a webpage's source code and sifting through the material to extract desired data.
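As a rough illustration of the "sift through the source" step, here is a minimal sketch that uses only Python's standard library. The page source is a hypothetical hard-coded string so the example runs offline, and the names `PAGE_SOURCE`, `LinkExtractor`, and `extract_links` are made up for this sketch; they are not taken from the workshop materials.

```python
from html.parser import HTMLParser

# Hypothetical page source. In practice this string would come from
# downloading a page, for example with urllib.request.urlopen(url).
PAGE_SOURCE = """
<html><body>
  <h1>Workshops</h1>
  <ul>
    <li><a href="/workshops/pandas">Pandas</a></li>
    <li><a href="/workshops/scraping">Web Scraping</a></li>
  </ul>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs from every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def extract_links(source):
    parser = LinkExtractor()
    parser.feed(source)
    return parser.links


links = extract_links(PAGE_SOURCE)
for href, text in links:
    print(href, text)
```

Third-party parsers are a common alternative for messier pages; the standard-library parser is used here only to keep the sketch self-contained.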

Covidence: Getting Started

February 29, 2024, 12:00pm
Covidence, a web-based tool licensed by the UC Berkeley Library, supports systematic reviews and other literature reviews, which are widely used to summarize and synthesize the literature on a topic of interest. Covidence helps you organize and track progress on your review, from search results to extraction. This interactive workshop will take you through how to use Covidence, including how to add reviewers or make changes mid-review, how to develop exclusion criteria, and how to get help. There will be plenty of time for Q&A during this session; you are welcome to raise questions about your specific review or review process.

Excel Data Analysis: Introduction

May 23, 2022, 9:00am
This is a three-hour introductory workshop that will provide an overview of Excel, with no prior experience assumed. Attendees will learn how to use functions for handling data and making calculations, how to build charts and pivot tables, and more.

Python Web APIs

March 14, 2023, 2:00pm
In this workshop, we cover how to extract data from the web with APIs using Python. APIs are often official services offered by companies and other entities, which allow you to directly query their servers in order to retrieve their data. Platforms like The New York Times, Twitter and Reddit offer APIs to retrieve data.
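To make "directly query their servers" concrete, the sketch below builds a query URL and parses a JSON response using only the standard library. The endpoint `https://api.example.com/v1/articles`, the parameter names, and the response shape are all hypothetical; real APIs document their own URLs, parameters, and authentication. The response is a hard-coded string so the example runs without network access.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.com/v1/articles"


def build_request_url(base_url, params):
    """Append URL-encoded query parameters to the base URL."""
    return f"{base_url}?{urlencode(params)}"


def parse_titles(response_body):
    """Pull the title of each item out of an assumed JSON response shape."""
    payload = json.loads(response_body)
    return [item["title"] for item in payload["results"]]


url = build_request_url(BASE_URL, {"q": "sand mining", "page": 1})
print(url)

# Stand-in for the JSON text a server might send back.
sample_response = '{"results": [{"title": "First article"}, {"title": "Second article"}]}'
titles = parse_titles(sample_response)
print(titles)
```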

Propensity Score Matching for Causal Inference: Creating Data Visualizations to Assess Covariate Balance in R

June 10, 2024
by Sharon Green. Although some people consider randomized experiments the gold standard, in many cases it would be highly unethical to assign individuals to harmful exposures to measure their effects. Modern causal inference techniques help scientists estimate treatment effects using observational data. In particular, propensity score matching pairs individuals so that the “treatment” and “control” groups are balanced on measured covariates. After implementing propensity score matching, data visualizations make it easier to assess the quality of the matches before estimating effects. This blog post is a tutorial for implementing propensity score matching and creating data visualizations to assess covariate balance, that is, visually assessing whether the matched individuals are balanced with respect to measured covariates.
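The matching and balance-checking idea can be sketched in a few lines of Python (the blog post itself works in R). This is a simplified illustration, not the post's method: it assumes propensity scores have already been estimated, uses greedy 1:1 nearest-neighbor matching without replacement, and summarizes balance with a standardized mean difference rather than a plot. All data values, the `caliper` default, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical, already-estimated propensity scores (unit id -> score).
treated_scores = {"t1": 0.80, "t2": 0.55}
control_scores = {"c1": 0.20, "c2": 0.52, "c3": 0.78, "c4": 0.95}

# Hypothetical measured covariate (age) for every unit.
age = {"t1": 60, "t2": 45, "c1": 30, "c2": 47, "c3": 58, "c4": 35}


def match_nearest(treated, control, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching on the score, without replacement.

    A treated unit is left unmatched if no remaining control unit has a
    score within `caliper` of its own.
    """
    available = dict(control)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


def standardized_mean_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(x_treated) + variance(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd


pairs = match_nearest(treated_scores, control_scores)

# Balance on age before matching: all treated vs. all control units.
smd_before = standardized_mean_difference(
    [age[t] for t in treated_scores], [age[c] for c in control_scores]
)
# Balance on age after matching: only the matched pairs.
smd_after = standardized_mean_difference(
    [age[t] for t, _ in pairs], [age[c] for _, c in pairs]
)

print(pairs)
print(abs(smd_before), abs(smd_after))
```

A standardized mean difference closer to zero after matching suggests better balance on that covariate; balance plots typically display this quantity for every measured covariate at once.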

Sand Mining - Plugging a Critical Data Gap

May 14, 2024
by Suraj Nair. Excessive sand mining is causing a global ecological crisis. In this blog post, I explain why sand mining is one of the most pressing challenges facing the planet, and why persistent data gaps hinder accountability and monitoring. I also discuss one of my ongoing research projects, in which we combine freely available satellite imagery and machine learning models to build open-source sand mine detection tools that can plug some of these data gaps.

Tactics for Text Mining non-Roman Scripts

April 15, 2024
by Hilary Faxon, Ph.D. & Win Moe. Non-Roman scripts pose particular challenges for text mining. Here, we reflect on a project that used text mining alongside qualitative coding to understand the politicization of online content following Myanmar’s 2021 military coup.

What Are Vowels Made Of? Graphing a Classic Dataset with R

February 13, 2024
by Anna Björklund. Vowels are all around us. Mainstream US English has around twelve unique vowels. How can our brains tell these sounds apart? This blog post will help you answer this question by plotting vowel data from a classic American English dataset by Peterson and Barney (1952).

Creating the Ultimate Sweet

January 30, 2024
by Emma Turtelboom. What is the best Halloween candy? In this blog post, we will identify attributes of popular sweets and create a model to understand how these attributes influence the popularity of the sweet. We’ll discuss alternative model approaches and potential drawbacks, as well as caveats to interpreting the predictions of our model.