Quantitative Analysis

R Machine Learning with tidymodels: Parts 1-2

February 24, 2025, 3:00pm

Machine learning often evokes images of Skynet, self-driving cars, and computerized homes. However, these ideas are less science fiction than tangible phenomena built on description, classification, prediction, and pattern recognition in data. During this two-part workshop, we will discuss basic features of supervised machine learning algorithms, including k-nearest neighbors, linear regression, decision trees, random forests, boosting, and ensembling, using the tidymodels framework. For social scientists, such methods can be critical for investigating evolutionary relationships, global health patterns, voter turnout in local elections, or individual psychological diagnoses.
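
The workshop itself uses R's tidymodels. As a language-neutral illustration of the first algorithm in the list, here is a minimal pure-Python sketch of k-nearest-neighbor classification. The function name `knn_predict` and the toy data are hypothetical and chosen for illustration. This is not tidymodels code.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every labeled training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors (smallest distances first)
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric features per observation, two classes
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(X, y, (1.1, 1.0)))  # → low
print(knn_predict(X, y, (5.1, 5.0)))  # → high
```

The choice of k involves a tradeoff. A small k is sensitive to noisy points, and a large k blurs the boundaries between classes. In tidymodels, the same model is specified declaratively (for example, with parsnip's `nearest_neighbor()`) rather than written by hand.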

Python Data Wrangling and Manipulation with Pandas: Parts 1-2

February 10, 2025, 2:00pm

Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It enables practical, real-world data analysis in Python. In this workshop, we'll work with example data and walk through the steps you might need to take to prepare data for analysis.
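
The following minimal sketch illustrates the kind of preparation steps such a workshop typically covers, applied to a small hypothetical table. It converts text columns to numbers, drops rows with missing values, and summarizes the data by group. The column names and values are invented for this example.

```python
import pandas as pd

# Hypothetical raw data with common problems: numbers stored as text, an "n/a" entry
raw = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "year": ["2022", "2023", "2022", "2023"],
    "value": ["2", "3", "n/a", "4"],
})

clean = (
    raw
    # Convert text columns to numeric types; unparseable entries become NaN
    .assign(
        year=lambda d: pd.to_numeric(d["year"]),
        value=lambda d: pd.to_numeric(d["value"], errors="coerce"),
    )
    # Drop rows where the variable we want to analyze is missing
    .dropna(subset=["value"])
)

# Summarize: mean value per group
summary = clean.groupby("group", as_index=False)["value"].mean()
print(summary)
```

Setting `errors="coerce"` turns bad entries into missing values so they can be handled explicitly in the `dropna` step. Otherwise, `to_numeric` would raise an error.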

What are Time Series Made of?

December 10, 2024

Bruno Smaniotto

Trend-cycle decompositions are statistical tools that help us understand the components of a time series: trend, cycle, seasonal, and error. In this blog post, we introduce these methods, focusing on the intuition behind the definition of each component, providing real-life examples, and discussing applications.
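
To make these components concrete, here is a minimal sketch of one standard method, classical additive decomposition, which models the series as trend-cycle + seasonal + error. In this method, trend and cycle are estimated together by a centered moving average, so they are not separated. The function names are hypothetical, and this may not be the specific method used in the post.

```python
def moving_average_trend(series, period):
    """Centered moving average; None where the window does not fit."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 1:
            # Odd period: simple centered average of `period` values
            trend[t] = sum(window) / period
        else:
            # Even period: 2 x m moving average, end points get half weight
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend


def classical_additive_decomposition(series, period):
    """Split series into trend-cycle, seasonal, and remainder (error) parts."""
    trend = moving_average_trend(series, period)
    # Average the detrended values for each position in the seasonal cycle
    sums, counts = [0.0] * period, [0] * period
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            sums[t % period] += y - tr
            counts[t % period] += 1
    raw_idx = [s / c for s, c in zip(sums, counts)]
    # Center the seasonal indices so they sum to zero over one cycle
    mean_idx = sum(raw_idx) / period
    seasonal_idx = [s - mean_idx for s in raw_idx]
    seasonal = [seasonal_idx[t % period] for t in range(len(series))]
    # Remainder: whatever trend-cycle and seasonality do not explain
    remainder = [
        None if tr is None else y - tr - s
        for y, tr, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, remainder


# Synthetic quarterly series: linear trend plus a fixed seasonal pattern
pattern = [1.0, -1.0, 2.0, -2.0]
y = [t + pattern[t % 4] for t in range(12)]
trend, seasonal, remainder = classical_additive_decomposition(y, period=4)
print(seasonal[:4])  # → [1.0, -1.0, 2.0, -2.0]
```

Because the synthetic series contains no noise, the recovered seasonal pattern matches the one that was built in, and the remainder is zero wherever the trend is defined.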

A Recipe for Reliable Discoveries: Ensuring Stability Throughout Your Data Work

November 19, 2024

Jaewon Saw

Imagine perfecting a favorite recipe, then sharing it with others, only to find their results differ because of small changes in tools or ingredients. How do you ensure the dish still reflects your original vision? This challenge captures the principle of stability in data science: achieving acceptable consistency in outcomes relative to reasonable perturbations of conditions and methods. In this blog post, I reflect on my research journey and share why grounding data work in stability is essential for reproducibility, adaptability, and trust in the final results.
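
One common way to probe stability is to perturb the data by bootstrap resampling and observe how much an estimate moves. The sketch below uses a hypothetical helper, `stability_check`, and hypothetical data. It shows only one possible perturbation scheme and is not the specific approach described in the post.

```python
import random
import statistics


def stability_check(data, estimator, n_perturbations=200, seed=0):
    """Re-run `estimator` on bootstrap resamples of `data`; return (min, median, max)."""
    rng = random.Random(seed)  # fixed seed makes the check itself reproducible
    estimates = []
    for _ in range(n_perturbations):
        # One perturbation: resample the observations with replacement
        resample = [rng.choice(data) for _ in data]
        estimates.append(estimator(resample))
    return min(estimates), statistics.median(estimates), max(estimates)


# Hypothetical measurements containing one extreme value
data = [10, 11, 9, 10, 12, 11, 10, 95]

for name, estimator in [("mean", statistics.mean), ("median", statistics.median)]:
    low, mid, high = stability_check(data, estimator)
    print(f"{name}: min={low:.1f}, median={mid:.1f}, max={high:.1f}")
```

An estimate whose range across perturbations is wide relative to the differences you care about is a warning sign. The same loop can also perturb other choices, such as preprocessing steps or model settings, instead of the data.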

Exploring Rental Affordability in the San Francisco Bay Area Neighborhoods with R

November 5, 2024

Taesoo Song

Many American cities continue to face severe rental burdens. However, we rarely examine rental affordability through the lens of quantitative data. In this blog post, I demonstrate how to download and visualize rental affordability data for the San Francisco Bay Area using R packages such as `tidycensus` and `sf`. This exercise shows that mapping census data can be a straightforward and powerful way to understand spatial patterns in housing dynamics and can offer valuable insights for research, policy, and advocacy.

Python Web Scraping

October 24, 2024, 2:00pm

In this workshop, we cover how to scrape data from the web using Python. Web scraping involves downloading a webpage's source code and sifting through it to extract the data you want.
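
The "sifting" step can be sketched with Python's standard-library `html.parser`. The `LinkExtractor` class and the inline page below are hypothetical and used only for illustration. A small page embedded as a string keeps the example self-contained. In practice, the source would first be downloaded (for example, with `urllib.request.urlopen`), and many scrapers use third-party parsers instead.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that sits inside a link
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            self.links.append((self._current_href, "".join(self._text_parts).strip()))
            self._current_href = None


# Hypothetical page source standing in for a downloaded webpage
page_source = """
<html><body>
  <h1>Example page</h1>
  <a href="https://example.com/first">First item</a>
  <p>Some text that is not a link.</p>
  <a href="https://example.com/second">Second item</a>
</body></html>
"""

parser = LinkExtractor()
parser.feed(page_source)
print(parser.links)
# → [('https://example.com/first', 'First item'), ('https://example.com/second', 'Second item')]
```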

Python Web APIs

October 22, 2024, 2:00pm

In this workshop, we cover how to extract data from the web with APIs using Python. APIs are often official services offered by companies and other organizations that let you query their servers directly to retrieve their data. Platforms like The New York Times, Twitter, and Reddit offer APIs for retrieving data.
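
A typical API workflow has two steps. First, encode the query as a request URL. Then, parse the structured (usually JSON) response. The sketch below shows both steps with the standard library. The endpoint, the parameter names (`q`, `page`, `api_key`), and the response shape are hypothetical, and each real API documents its own. The network call is left as a comment so the example runs offline.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/v1/search"  # hypothetical endpoint


def build_query_url(base_url, **params):
    """Encode query parameters (spaces, special characters) into a request URL."""
    return f"{base_url}?{urlencode(params)}"


def extract_titles(response_text):
    """Parse a JSON response body and keep only the fields we need."""
    payload = json.loads(response_text)
    return [item["title"] for item in payload["results"]]


url = build_query_url(BASE_URL, q="voter turnout", page=1, api_key="YOUR_KEY")
print(url)
# → https://api.example.com/v1/search?q=voter+turnout&page=1&api_key=YOUR_KEY

# A real request would look like:
#   with urllib.request.urlopen(url) as resp:
#       body = resp.read().decode("utf-8")
# Here a hypothetical response body stands in for the server's reply.
sample_response = '{"results": [{"title": "First result"}, {"title": "Second result"}]}'
print(extract_titles(sample_response))  # → ['First result', 'Second result']
```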

Leveraging Large Language Models for Analyzing Judicial Disparities in China

October 8, 2024

Nanqin Ying

This study analyzes over 50 million judicial decisions from China’s Supreme People’s Court to examine disparities in legal representation and their impact on sentencing across provinces. Focusing on 290,000 drug-related cases, it employs large language models to distinguish private attorneys from public defenders and to assess their sentencing outcomes. The methodology combines advanced text processing with statistical analysis. Clustering categorizes cases by province and type of representation, and regression models isolate the effect of legal representation from factors such as drug quantity and regional policies. Findings reveal significant regional disparities in legal access driven by economic conditions. These disparities highlight the need for reforms in China’s legal aid system to ensure equitable representation for marginalized groups, and for more transparent judicial data to support systemic improvements.

R Machine Learning with tidymodels: Parts 1-2

October 14, 2024, 1:00pm

Python Data Wrangling and Manipulation with Pandas

October 10, 2024, 2:00pm