Data Manipulation and Cleaning

Python Web APIs

February 8, 2024, 10:00am
In this workshop, we cover how to extract data from the web with APIs using Python. APIs are often official services offered by companies and other entities, which allow you to directly query their servers in order to retrieve their data. Platforms like The New York Times, Twitter and Reddit offer APIs to retrieve data.
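As a rough illustration of what querying an API involves, the sketch below builds a request URL from query parameters and pulls fields out of a JSON response. The endpoint `https://api.example.com/v1/articles`, the parameter names, and the response layout are hypothetical placeholders, not any real platform's API, and the response is a hard-coded sample string so the example runs without network access.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, params):
    """Return base_url with params appended as a URL-encoded query string."""
    return f"{base_url}?{urlencode(params)}"


def extract_titles(response_text):
    """Parse a JSON response body and return the title of each result."""
    payload = json.loads(response_text)
    return [item["title"] for item in payload.get("results", [])]


# Hypothetical endpoint and parameters, for illustration only.
url = build_query_url(
    "https://api.example.com/v1/articles",
    {"q": "housing policy", "page": 1},
)

# Stand-in for the body a server might send back (assumed layout).
sample_response = '{"results": [{"title": "First article"}, {"title": "Second article"}]}'
titles = extract_titles(sample_response)

print(url)
print(titles)
```

In practice the URL would be fetched with something like `urllib.request.urlopen` or the `requests` library, and most real APIs also require an API key and enforce rate limits.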

Python Web Scraping

February 15, 2024, 10:00am
In this workshop, we cover how to scrape data from the web using Python. Web scraping involves downloading a webpage's source code and sifting through the material to extract desired data.
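The "sifting" step can be sketched with Python's standard-library `html.parser`. This is a minimal example under stated assumptions: the HTML is a made-up sample string rather than a downloaded page, and the `LinkExtractor` class is a hypothetical helper written for this illustration. It may not match the tools the workshop uses.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        # Start collecting text when an anchor tag opens.
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        # Store the finished link when the anchor closes.
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


# Made-up page source standing in for a downloaded webpage.
sample_html = """
<html><body>
  <h1>Workshops</h1>
  <ul>
    <li><a href="/workshops/apis">Python Web APIs</a></li>
    <li><a href="/workshops/scraping">Python Web Scraping</a></li>
  </ul>
</body></html>
"""

parser = LinkExtractor()
parser.feed(sample_html)
links = parser.links
print(links)
```

For real pages, the source would first be downloaded (for example with `urllib.request.urlopen`), and a site's terms of service and `robots.txt` are worth checking before scraping.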

Tonya D. Lindsey, Ph.D.

Data Science Fellow
Institute of Governmental Studies (IGS)

Tonya D. Lindsey is a visiting scholar at the Institute of Governmental Studies and the project director of CRB Nexus: Where Policy Meets Research, an initiative of the California Research Bureau (CRB) at the California State Library. As project director of CRB Nexus, she is developing a community of practice space for California’s policy staff and public scholars. As a CRB senior researcher she uses her expertise in research methods to analyze a wide variety of policy questions at the request of legislators, the governor’s office, and their staff. She received her PhD in sociology...

Laura Schmahmann

Instructor
City and Regional Planning

I am a PhD Candidate within the Department of City and Regional Planning at UC Berkeley. My dissertation explores the political economy of warehouse development across California, focusing on two case studies - the Inland Empire and North San Joaquin Valley. I am also a Graduate Student Researcher within the Labor Management Partnerships team at the UC Berkeley Labor Center. I hold a Bachelor of Planning (Honours Class 1) and Master of Philosophy (Planning and Urban Development) both from the University of New South Wales.

Exploratory Data Analysis in Social Science Research

November 14, 2023
by Kamya Yadav. Causal inference has become the dominant endeavor for many political scientists, often at the expense of good research questions and theory building. Returning to descriptive inference – the process of describing the world as it exists – can help formulate research questions worth asking and theory that is grounded in reality. Exploratory data analysis is one method of conducting descriptive inference. It can help social science researchers find empirical patterns and puzzles that motivate their research questions, test correlations between variables, and engage with the existing literature on a topic. In this blog post, I walk through results from exploratory data analysis I conducted for my dissertation project on political ambition of women.

Python Data Wrangling and Manipulation with Pandas

November 15, 2023, 9:00am
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It supports practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.

Introduction to Item Response Theory

October 24, 2023
by Mingfeng Xue. Measurements (e.g., tests, surveys, questionnaires) inevitably involve various sources of error. Among the many psychometric theories, item response theory (IRT) stands out for its capacity for detailed item-level analysis and its potential to reduce some measurement error. This post first discusses the limitations of conventional sum and average scores, which motivate IRT models, and then introduces a basic form of the Rasch model, including the model's expression, its underlying assumptions, some of its advantages, and software packages. Some code is also provided.
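For readers unfamiliar with the Rasch model, the sketch below computes the standard dichotomous form, in which the probability of a correct (or endorsed) response depends only on the difference between a person's ability and an item's difficulty. It is not the code from the post. The names `theta` (ability) and `b` (difficulty) follow common convention and are assumptions here, as are the example difficulty values.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    P(X = 1) = exp(theta - b) / (1 + exp(theta - b)),
    written here in the equivalent logistic form.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_score(theta, difficulties):
    """Expected number of correct responses across a set of items."""
    return sum(rasch_probability(theta, b) for b in difficulties)


# Illustrative item difficulties (made up), from easy to hard.
item_difficulties = [-1.0, 0.0, 1.0]

for ability in (-1.0, 0.0, 1.0):
    print(ability, round(expected_score(ability, item_difficulties), 3))
```

When ability equals difficulty the probability is exactly 0.5, and it rises toward 1 as ability exceeds difficulty. Estimating `theta` and `b` from response data is a separate step handled by dedicated IRT software.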

María Martín López

Data Science Fellow
Psychology

María Martín López is a PhD student in the Cognition area within the Department of Psychology. Her research relates to cognitive computational and quantitative models of individual differences in behaviors, thoughts, and emotions. She is particularly interested in how we can create and leverage novel algorithms to understand, measure, and predict processes relating to externalizing psychopathology (e.g. impulsivity, aggression, substance use). She answers these questions using a range of computational and quantitative models including AI, NLP, SEM, time series analysis, multi-level...

Using Forest Plots to Report Regression Estimates: A Useful Data Visualization Technique

October 17, 2023
by Sharon Green. Regression models help us understand relationships between two or more variables. In many cases, results are summarized in tables that present coefficients, standard errors, and p-values. Reading these can be a slog. Figures such as forest plots can help us communicate results more effectively and may lead to a better understanding of the data. This blog post is a tutorial on two different approaches to creating high-quality and reproducible forest plots, one using ggplot2 and one using the forestplot package.

Python Data Wrangling and Manipulation with Pandas

October 10, 2023, 10:00am
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It supports practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.