Data Sources

Seeing Behavior in Everyday Data

December 10, 2025
by Skyler Chen. This post discusses how my training in data science changed the way I think about behavioral research. I share how simply exploring everyday datasets and noticing small, unexpected patterns can spark new research questions, and how archival data and experiments each offer distinct yet complementary insights into how people make judgments and decisions. I also highlight the growing set of tools that help us understand behavior in richer ways.

Digitization of Historical Maps in the Age of AI

December 3, 2025
by Elena Stacy. Researchers today increasingly have access to a wealth of tools that streamline or automate labor-intensive data processing and generation tasks. When it comes to mapping, however, progress has been slower. This post details the author's experience digitizing a historical map in the age of AI.

A Practical Guide to Shift-Share Instruments (and What I Learned Replicating the China Shock)

November 26, 2025
by Jiayu Lai. Shift-share instruments are among the most widely used tools in applied economics, appearing in labor, trade, immigration, and policy evaluation research. Despite their popularity, many researchers still use them as black boxes and risk invalid instruments as a result. In this blog post, I unpack how shift-share IVs actually work, why their validity depends on both the “shifts” and the “shares,” and what practical steps researchers should take to check the underlying assumptions. I also walk through how I used the Borusyak–Hull–Jaravel (2022, 2025) framework to reproduce the seminal Autor, Dorn, and Hanson (2013) China shock analysis.
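To make the "shifts" and "shares" concrete: a shift-share instrument for location l is commonly built as z_l = Σ_n s_ln · g_n, where s_ln is location l's baseline exposure share in industry n and g_n is an industry-level shock. The sketch below is a minimal illustration in plain Python, assuming made-up regions, industries, and values (all hypothetical, not the China shock data). It also reports each location's total share, because when shares do not sum to one the Borusyak–Hull–Jaravel framework recommends controlling for that sum.

```python
# Minimal sketch of building a shift-share (Bartik-style) instrument.
# All regions, industries, and numbers are hypothetical illustration data.

def shift_share(shares, shifts):
    """Return z_l = sum over n of s_ln * g_n for every location l."""
    return {
        loc: sum(s * shifts[ind] for ind, s in row.items())
        for loc, row in shares.items()
    }

def share_sums(shares):
    """Return each location's total exposure share (below 1 if shares are incomplete)."""
    return {loc: sum(row.values()) for loc, row in shares.items()}

# s_ln: baseline share of location l's employment in industry n
shares = {
    "A": {"textiles": 0.6, "electronics": 0.2},
    "B": {"textiles": 0.1, "electronics": 0.5},
    "C": {"textiles": 0.3, "electronics": 0.3},
}
# g_n: industry-level shock (e.g., growth in import competition)
shifts = {"textiles": 2.0, "electronics": 1.0}

z = shift_share(shares, shifts)   # approximately {"A": 1.4, "B": 0.7, "C": 0.9}
totals = share_sums(shares)       # approximately {"A": 0.8, "B": 0.6, "C": 0.6}
```

Because every z_l mixes both ingredients, a check of instrument validity has to consider how the shares and the shifts were each generated; the sketch only shows the construction, not those checks.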

Teng-Jui (Owen) Lin

Consulting Drop-In Hours: By appointment only

Consulting Areas: Bionanotechnology, Chemistry, Data Curation, Data Sources, Data Visualization, Databases and SQL, HTML / CSS, Javascript, LaTeX, Machine Learning, MATLAB, Meta-Analysis, Python, Regression Analysis, SQL, Web Scraping

Quick-tip: the fastest way to speak to a consultant is to first ...

John Cherry

Consulting Drop-In Hours: By appointment only

Consulting Areas: ArcGIS Desktop - Online or Pro, Cluster Analysis, Data Sources, Data Visualization, Excel, Geospatial Data: Maps and Spatial Analysis, GIS (ArcGIS Pro, QGIS); spatial data analysis and visualization, Google Earth Engine, Mixed Methods, Public health data analysis; infectious disease mapping; rural and global health applications of GIS, Experimental Design, Spatial Statistics, Survey Sampling

Carl Illustrisimo

Consulting Drop-In Hours: By appointment only

Consulting Areas: Bash or Command Line, Cluster Analysis, Data Sources, Data Visualization, Digital Humanities, Excel, Git or GitHub, Javascript, LaTeX, Machine Learning, Natural Language Processing (NLP), Python, Regression Analysis, RStudio, SQL, Text Analysis

Aidan Lee

Consulting Drop-In Hours: By appointment only

Consulting Areas: ArcGIS Desktop - Online or Pro, Bayesian Methods, Causal Inference, Cluster Analysis, Data Sources, Data Visualization, Databases and SQL, Digital Health, Excel, Experimental Design, Geospatial Data: Maps and Spatial Analysis, Git or GitHub, LaTeX, Machine Learning, Means Tests, Mixed Methods, Natural Language Processing (NLP), OCR, Python, Qualtrics, R, Regression Analysis, Research Design, Research Planning, RStudio, RStudio Cloud, SAS, Software Output Interpretation, SPSS, SQL,...

John Louis-Strakes Lopez

Postdoctoral Scholar
Berkeley School of Education

John Louis-Strakes Lopez is a Data Science Education postdoctoral scholar. He recently received his PhD in Education from the University of California, Irvine. John’s work looks at student epistemological development within data science contexts. He is also interested in designing and studying artificial intelligence and playful technologies for learning. John serves as a co-chair for the International Learning Sciences Student Association.

Beyond work, you will find John reading at a local coffee shop or eating a warm bowl of Pho.

In Silico Approach to Mining Viral Sequences from Bulk RNA-Seq Data

October 28, 2025
by Carly Karrick. Viruses play important roles in evolution and influence ecosystems and host health. However, isolating and studying them can be difficult. Rather than relying on resource-intensive methods that concentrate viruses into a “virome,” bulk sequencing captures data from all biological entities present in a sample. In this tutorial, we explore an approach to mine viral sequences from publicly available bulk RNA-Seq data. The output from this analysis paves the way for future statistical analyses comparing viral communities in different contexts. The approach can also be applied to other datasets, including studies of human health.

Python Data Wrangling and Manipulation with Pandas

October 19, 2021, 10:00am
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It enables practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.
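As a taste of the kind of preparation steps involved, here is a minimal sketch using real pandas methods (`rename`, `dropna`, `assign`, `pd.to_numeric`, `merge`, `groupby`). The tables, column names, and values are hypothetical and are not the workshop's example data.

```python
import pandas as pd

# Hypothetical survey responses with messy column names and missing values.
responses = pd.DataFrame({
    "resp_id": [1, 2, 3, 4],
    "Region Code": ["N", "S", "S", None],
    "income": ["52000", "61000", None, "48000"],
})
# Hypothetical lookup table mapping codes to readable region names.
regions = pd.DataFrame({"region_code": ["N", "S"], "region_name": ["North", "South"]})

clean = (
    responses
    .rename(columns={"Region Code": "region_code"})  # consistent column names
    .dropna(subset=["region_code"])                   # drop rows missing the join key
    .assign(income=lambda d: pd.to_numeric(d["income"], errors="coerce"))  # text -> numbers, missing -> NaN
    .merge(regions, on="region_code", how="left")     # attach readable labels
)

# Mean income per region; NaN values are skipped by default.
summary = clean.groupby("region_name")["income"].mean()
print(summary)
```

Chaining the steps keeps each transformation on its own line, which makes it easy to comment out or inspect one step at a time while cleaning data.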