Data Science

Python Text Analysis Fundamentals: Parts 1-2

November 1, 2022, 12:00pm
This two-part workshop series will prepare participants to move forward with research that uses text analysis, with a special focus on humanities and social science applications.

Disaggregating Race and Ethnicity Categories in Census Data

November 1, 2022

The collection of race and ethnicity data by the United States Census Bureau has a long, complex, and problematic history. The Census claims that its racial categories generally reflect a social definition of race recognized in America, adhering to guidelines set by the U.S. Office of Management and Budget. In 1900, the Census recognized five racial categories: White, Black, Chinese, Japanese, and American Indian. Today, the Census collects more...

Python Web Scraping & APIs

November 2, 2022, 3:00pm
In this workshop, we cover how to extract data from the web using Python. We focus on two approaches to extracting data from the web: leveraging application programming interfaces (APIs) and web scraping.
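The contrast between the two approaches can be sketched with the standard library alone. This is a minimal illustration, not the workshop's own material: the JSON payload, the HTML snippet, and the `TitleParser` class are hypothetical stand-ins for a real API response and a real web page, and no network request is made.

```python
import json
from html.parser import HTMLParser

# API approach: the server hands back structured data (here, JSON) directly.
# Hypothetical payload standing in for a real API response.
api_response = '{"results": [{"title": "First post"}, {"title": "Second post"}]}'
api_titles = [item["title"] for item in json.loads(api_response)["results"]]


# Scraping approach: recover the same information from HTML markup.
class TitleParser(HTMLParser):
    """Collect the text found inside <h2> elements."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        # Keep only non-empty text that appears inside an <h2>.
        if self.in_h2 and data.strip():
            self.titles.append(data.strip())


# Hypothetical page standing in for downloaded HTML.
page = "<html><body><h2>First post</h2><p>text</p><h2>Second post</h2></body></html>"
parser = TitleParser()
parser.feed(page)
scraped_titles = parser.titles
```

Both routes end with the same list of titles; the API route needs far less parsing code, which is one reason to prefer an API when a site offers one.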

Stata Fundamentals: Parts 1-3

October 26, 2022, 9:00am
This workshop is a three-part introductory series that will teach you Stata from scratch with clear introductions, concise examples, and support documents. You will learn how to download and install the Stata software, understand data and basic manipulations, import and subset data, explore and visualize data, and understand the basics of automation in the form of loops and functions. After completing this workshop, you will have the foundational understanding needed to create, organize, and use workflows for your own research.

Python Data Wrangling and Manipulation with Pandas

October 24, 2022, 3:00pm
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It supports practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.

R Fundamentals: Parts 1-4

October 18, 2022, 1:00pm
This workshop is a four-part introductory series that will teach you R from scratch with clear introductions, concise examples, and support documents. You will learn how to download and install the open-source RStudio software, understand data and basic manipulations, import and subset data, explore and visualize data, and understand the basics of automation in the form of loops and functions. After completing this workshop, you will have the foundational understanding needed to create, organize, and use workflows for your own research.

Getting Started with Surveys

October 18, 2022

Surveys can be an extremely useful tool for gathering information from individuals and groups. They are used across disciplines and industries to help researchers learn more about populations and gain actionable insights. I’ve done extensive survey research in nonprofit and industry settings, using these data to improve programs, make changes to technology, design communications plans, make content acquisition decisions, and much more. In this blog post, I’ll focus on one of the most important parts of survey research — planning....

Reduce, Reuse, Recycle: Practical strategies for working with large datasets

October 12, 2022

When the size of your datasets starts to approach the size of your computer’s available memory, even the simplest data wrangling tasks can become frustrating. Suddenly, reading in a .csv or calculating a simple average becomes time-consuming or impossible. For students and researchers, additional computing resources can be costly or simply unavailable. Here are some principles and strategies for reducing the overhead of your dataset while keeping the momentum going. The code mainly focuses on reading csv files - a very common data format - into Python...
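As one possible illustration of the idea (an assumption about the kind of strategy meant, not code taken from the post), a simple average can be computed by streaming a csv file row by row instead of loading it whole. The function name `streaming_mean`, the column name `value`, and the in-memory sample are all hypothetical.

```python
import csv
import io


def streaming_mean(lines, column):
    """Average one numeric column while holding only one row in memory."""
    total = 0.0
    count = 0
    for row in csv.DictReader(lines):
        cell = row[column]
        if cell == "":  # skip missing values
            continue
        total += float(cell)
        count += 1
    return total / count if count else float("nan")


# Hypothetical sample data; in practice, pass an open file handle instead.
sample = io.StringIO("id,value\n1,10\n2,\n3,20\n")
mean_value = streaming_mean(sample, "value")
```

Because the reader yields one row at a time, memory use stays roughly constant no matter how large the file is, at the cost of a single sequential pass over it.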

Python Fundamentals: Parts 1-4

October 11, 2022, 3:00pm
This four-part, interactive workshop series is a complete introduction to programming in Python for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.

Python Machine Learning Fundamentals: Parts 1-2

October 4, 2022, 2:00pm
This workshop introduces students to scikit-learn, the popular machine learning library in Python, as well as the auto-ML library built on top of scikit-learn, TPOT. The focus will be on scikit-learn syntax and available tools to apply machine learning algorithms to datasets. No theory instruction will be provided.