Data Sources

Enrique Valencia López

Data Science Fellow
Graduate School of Education

Enrique Valencia López is a PhD student in the Policy, Politics and Leadership cluster at the Graduate School of Education. His research interests relate to three broad areas: the stratification of education by gender, immigration status and ethnicity; the measurement of teacher working conditions and well-being; and education in Latin America.

Before coming to Berkeley, Enrique worked for Mexico’s National Institute for Educational Evaluation and Assessment (INEE) in both the Policy and the Indicators areas. During that time, he co-authored Mexico’s first report on the educational...

Chirag Manghani

Consultant
School of Information

Chirag is a second-year graduate student at the I-School. Proficient in Python, Java, R, and SQL, he works across software application development, machine learning, and data science. He is particularly interested in data analysis and statistical methods, which drive him to connect theory with practice. Chirag's dedication to excellence, adaptable mindset, and curiosity make him a dynamic problem solver in a fast-changing tech landscape.

Violet Davis

Data Science for Social Justice Senior Fellow 2024
MIDS

I am a Master's student studying Data Science at the School of Information. My research involves computational social science projects focused on social justice and equity.

Minding the Gaps: Pay Equity in California

July 9, 2024
By Tonya D. Lindsey, Ph.D. The gender pay gap shows that, on average, men continue to outearn women. California is among the states with the smallest pay gaps (outperforming the national figure, at 13%) and is unique in having enacted legislation aimed at eliminating pay gaps by sex and by race. This blog post reflects on California’s pay gap as students study it in an undergraduate social statistics course. The independent variables correspond to three theoretical frameworks: 1) human capital, 2) occupational segregation, and 3) discrimination. While the students' work is rigorous, drawing on a representative sample of full-time, year-round California workers, there is still work to be done, and the data and analyses carry caveats.
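As a rough illustration of the outcome such an analysis starts from, the sketch below computes a raw (unadjusted) gap in one common form: one minus the ratio of women's to men's average earnings. The earnings figures are invented, and `pay_gap` is a hypothetical helper. A model like the one described would go on to add independent variables for human capital, occupational segregation, and discrimination, which this sketch does not attempt.

```python
from statistics import mean, median

# Invented annual earnings (USD) for full-time, year-round workers.
# These are illustrative numbers only, not survey data.
earnings = {
    "men": [52000, 61000, 75000, 48000, 90000],
    "women": [47000, 55000, 66000, 43000, 79000],
}


def pay_gap(men, women, stat=mean):
    """Raw gap as a share of men's earnings: 1 - stat(women) / stat(men)."""
    return 1 - stat(women) / stat(men)


gap_mean = pay_gap(earnings["men"], earnings["women"])
gap_median = pay_gap(earnings["men"], earnings["women"], stat=median)
print(f"mean-based gap: {gap_mean:.1%}, median-based gap: {gap_median:.1%}")
```

Comparing the mean-based and median-based gaps is a quick check on how much a few high earners drive the result.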

Python Web APIs

October 26, 2023, 2:00pm
In this workshop, we cover how to extract data from the web with APIs using Python. APIs are often official services offered by companies and other entities, which allow you to directly query their servers in order to retrieve their data. Platforms like The New York Times, Twitter and Reddit offer APIs to retrieve data.
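A minimal sketch of the basic request pattern follows, using only Python's standard library. It builds a URL with query parameters, sends a GET request, and decodes the JSON response. The endpoint is assumed to be the New York Times Article Search API. Treat the path, the `q` and `api-key` parameter names, and the placeholder key as assumptions to confirm against the provider's current documentation. `build_query_url` and `fetch_json` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint (NYT Article Search API, v2); verify against current docs.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_query_url(base_url, params):
    """Append URL-encoded query parameters (spaces become '+') to an endpoint."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body into Python objects."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder; most APIs require a registered key.
    url = build_query_url(BASE_URL, {"q": "education", "api-key": "YOUR_API_KEY"})
    print(url)
    # data = fetch_json(url)  # uncomment once you have a valid key
```

The third-party `requests` library offers the same pattern through `requests.get(url, params=...)`, which handles the encoding step for you.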

Finding Health Statistics and Data

March 17, 2022, 2:00pm
Participants in this workshop will learn about some of the issues surrounding the collection of health statistics, and will also learn about authoritative sources of health statistics and data. We will look at tools that let you create custom tables of vital statistics (birth, death, etc.), disease statistics, health behavior statistics, and more.
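Vital statistics tables commonly report crude rates, meaning the number of events per 100,000 people in the population. The sketch below turns raw counts into a small custom table of that kind. The counties, years, and counts are hypothetical placeholders, not data from any official source, and `crude_rate` and `build_rate_table` are illustrative names.

```python
# Hypothetical counts for illustration only; real figures would come from an
# authoritative vital statistics source.
records = [
    {"county": "County A", "year": 2020, "deaths": 1200, "population": 150_000},
    {"county": "County A", "year": 2021, "deaths": 1300, "population": 152_000},
    {"county": "County B", "year": 2020, "deaths": 450, "population": 90_000},
]


def crude_rate(events, population, per=100_000):
    """Events per `per` people; multiplying first limits float rounding."""
    return events * per / population


def build_rate_table(rows):
    """Map (county, year) to a crude death rate rounded to one decimal place."""
    return {
        (r["county"], r["year"]): round(crude_rate(r["deaths"], r["population"]), 1)
        for r in rows
    }


if __name__ == "__main__":
    for (county, year), rate in sorted(build_rate_table(records).items()):
        print(f"{county}  {year}  {rate:>7.1f} per 100,000")
```

Crude rates are not age-adjusted. Comparing populations with different age structures calls for age-adjusted rates, which such tools often also report.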

Finding Health Statistics and Data

November 2, 2022, 1:00pm
Participants in this workshop will learn about some of the issues surrounding the collection of health statistics, and will also learn about authoritative sources of health statistics and data. We will look at tools that let you create custom tables of vital statistics (birth, death, etc.), disease statistics, health behavior statistics, and more.

Excel Data Analysis: Introduction

June 21, 2023, 9:30am
This is a three-hour introductory workshop that will provide an overview of Excel, with no prior experience assumed. Attendees will learn how to use functions for handling data and making calculations, how to build charts and pivot tables, and more.
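Because pivot tables are central to this workshop, seeing the same idea in Python may help for later comparison. The sketch below uses pandas `pivot_table` on made-up sales data; the `region`, `quarter`, and `sales` columns are hypothetical. Setting `margins=True` adds totals, much like Excel's Grand Total row and column. This is an outside illustration, not part of the workshop itself.

```python
import pandas as pd

# Made-up transactions: one row per sale.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "sales": [100, 150, 80, 20, 60],
})

# Rows = region, columns = quarter, values = summed sales,
# with totals labeled like Excel's "Grand Total".
pivot = pd.pivot_table(
    sales,
    values="sales",
    index="region",
    columns="quarter",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="Grand Total",
)
print(pivot)
```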

Excel Data Analysis: Introduction

January 29, 2024, 9:00am
This is a three-hour introductory workshop that will provide an overview of Excel, with no prior experience assumed. Attendees will learn how to use functions for handling data and making calculations, how to build charts and pivot tables, and more.

Python Data Wrangling and Manipulation with Pandas

October 19, 2021, 10:00am
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It is built for practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.
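Typical preparation steps include tidying column names, standardizing text, converting types, removing duplicates, and handling missing values. The sketch below chains these steps on a small invented DataFrame. The column names and the `prepare` function are hypothetical, and the workshop's own example data and steps may differ.

```python
import pandas as pd

# Invented messy extract: untidy column names, inconsistent case,
# numbers stored as text, a missing value, and a duplicated row.
raw = pd.DataFrame({
    "Respondent ID": [1, 2, 2, 3, 4, 5],
    "State ": ["CA", "ca", "ca", "NY", None, "ny"],
    "Annual Income": ["52000", "61000", "61000", "n/a", "48000", "70000"],
})


def prepare(df):
    """Return a cleaned copy of df ready for analysis."""
    out = df.copy()
    # 1. Normalize column names: strip whitespace, lowercase, snake_case.
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    # 2. Standardize text values so "ca" and "CA" match.
    out["state"] = out["state"].str.upper()
    # 3. Convert numeric text; unparseable entries such as "n/a" become NaN.
    out["annual_income"] = pd.to_numeric(out["annual_income"], errors="coerce")
    # 4. Drop exact duplicate rows (after standardizing, so they compare equal).
    out = out.drop_duplicates()
    # 5. Drop rows missing values the analysis needs.
    out = out.dropna(subset=["state", "annual_income"])
    return out.reset_index(drop=True)


clean = prepare(raw)
income_by_state = clean.groupby("state")["annual_income"].mean()
print(clean)
print(income_by_state)
```

The order matters: standardizing text before `drop_duplicates` lets rows that differ only in case collapse into one.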