Data Science

CANCELED: Python Data Wrangling and Manipulation with Pandas

November 29, 2022, 3:00pm
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It supports practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.

CANCELED: Python Text Analysis: Word Embeddings

November 17, 2022, 12:00pm
How can we use neural networks to create meaningful representations of words? The bag-of-words model is limited in its ability to characterize text because it does not use word context.
Registration is unavailable.

CANCELED: Python Text Analysis: Topic Modeling

November 15, 2022, 12:00pm
In this part, we study unsupervised learning of text data. This is a standalone workshop that builds on the two-part text analysis series.
Registration is unavailable.

CANCELED: Python Machine Learning Fundamentals: Parts 1-2

November 14, 2022, 4:00pm
This workshop introduces students to scikit-learn, the popular machine learning library in Python, as well as the auto-ML library built on top of scikit-learn, TPOT. The focus will be on scikit-learn syntax and available tools to apply machine learning algorithms to datasets. No theory instruction will be provided.
Registration is unavailable.

Exploring Population Data with IPUMS

November 8, 2022

Last month, demographer and historian Steve Ruggles was awarded a prestigious MacArthur Foundation Fellowship for his work developing IPUMS—a harmonized database of individual and family...

Institutional Review Board (IRB) Fundamentals

November 7, 2022, 12:00pm
Are you starting a research project at UC Berkeley that involves human subjects? If so, one of the first steps you will need to take is getting IRB approval.

Python Fundamentals: Parts 1-4

November 7, 2022, 11:00am
This four-part, interactive workshop series is a complete introduction to programming in Python for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.

Python Text Analysis Fundamentals: Parts 1-2

November 1, 2022, 12:00pm
This two-part workshop series will prepare participants to move forward with research that uses text analysis, with a special focus on humanities and social science applications.

Twitter Text Analysis: A Friendly Introduction

October 25, 2022

Text analysis techniques, including sentiment analysis, topic modeling, and named entity recognition, are increasingly used to probe patterns in text-based documents such as books and social media posts. This blog post introduces Twitter text analysis, but is not intended to cover all of the aforementioned topics. The tutorial is broken down into two parts. In this first post, I will give a step-by-step guide to using Python and Pandas to explore Twitter data. In the second post, I will introduce Language...

Disaggregating Race and Ethnicity Categories in Census Data

November 1, 2022

The collection of race and ethnicity data by the United States Census Bureau has a long, complex, and problematic history. The Census claims that their racial categories generally reflect a social definition of race recognized in America, adhering to guidelines set by the U.S. Office of Management and Budget. In 1900, the Census recognized five racial categories: White, Black, Chinese, Japanese, and American Indian. Today, the Census collects more...