Data Sources

Leveraging Large Language Models for Analyzing Judicial Disparities in China

October 8, 2024
by Nanqin Ying. This study analyzes over 50 million judicial decisions from China’s Supreme People’s Court to examine disparities in legal representation and their impact on sentencing across provinces. Focusing on 290,000 drug-related cases, it uses large language models to distinguish cases handled by private attorneys from those handled by public defenders and to compare their sentencing outcomes. The methodology combines text processing with statistical analysis: clustering groups cases by province and type of representation, and regression models isolate the effect of legal representation from factors such as drug quantity and regional policy. The findings reveal significant regional disparities in access to legal representation, driven by economic conditions. They highlight the need to reform China’s legal aid system so that marginalized groups receive equitable representation, and the need for more transparent judicial data to support systemic improvements.
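The abstract does not spell out the regression specification. As a minimal sketch, assuming a linear model in which sentence length depends on a private-attorney indicator, drug quantity, and a province fixed effect, the snippet below estimates the representation effect by demeaning every variable within its province. This is the within estimator, which by the Frisch-Waugh-Lovell theorem yields the same slopes as OLS with province dummies. The function name `within_province_ols` and the tuple layout are hypothetical, and the toy data are noise-free numbers constructed for illustration, not data from the study.

```python
from collections import defaultdict


def within_province_ols(rows):
    """Estimate sentence ~ private + quantity + province fixed effects.

    rows: iterable of (province, private, quantity, sentence) tuples, where
    `private` is 1 for a private attorney and 0 for a public defender
    (hypothetical layout). Returns (beta_private, gamma_quantity).
    """
    rows = list(rows)
    # Group observations by province to compute province-level means.
    groups = defaultdict(list)
    for prov, private, qty, sentence in rows:
        groups[prov].append((private, qty, sentence))
    means = {
        prov: tuple(sum(col) / len(obs) for col in zip(*obs))
        for prov, obs in groups.items()
    }
    # Demean each variable within its province; this absorbs the fixed effects.
    sxx = sxz = szz = sxy = szy = 0.0
    for prov, private, qty, sentence in rows:
        mx, mz, my = means[prov]
        x, z, y = private - mx, qty - mz, sentence - my
        sxx += x * x
        sxz += x * z
        szz += z * z
        sxy += x * y
        szy += z * y
    # Solve the 2x2 normal equations [[sxx, sxz], [sxz, szz]] @ [b, g] = [sxy, szy].
    det = sxx * szz - sxz * sxz
    if abs(det) < 1e-12:
        raise ValueError("private and quantity are collinear after demeaning")
    beta = (szz * sxy - sxz * szy) / det
    gamma = (sxx * szy - sxz * sxy) / det
    return beta, gamma


# Toy, noise-free data (not from the study): sentence = FE - 2*private + 0.5*quantity.
province_effects = {"A": 3.0, "B": -1.0, "C": 5.0}
data = []
for prov, fe in province_effects.items():
    for private, qty in [(0, 10.0), (1, 12.0), (0, 30.0), (1, 5.0), (1, 20.0)]:
        data.append((prov, private, qty, fe - 2.0 * private + 0.5 * qty))

beta, gamma = within_province_ols(data)
print(round(beta, 6), round(gamma, 6))  # -2.0 0.5
```

Because the toy data contain no noise, the estimator recovers the built-in coefficients exactly. A real analysis would also need standard errors (for example, clustered by province); this sketch only shows how province fixed effects are absorbed.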

Anna Björklund

Senior Data Science Fellow 2024-2025, Data Science Fellow 2023-2024
Linguistics

I am a fifth-year PhD student in the Department of Linguistics with an areal interest in the Wintuan languages, traditionally spoken in the northern Sacramento Valley and now undergoing revitalization. My primary research interests are in leveraging archival recordings for the phonetic analysis of these under-documented languages, as well as designing tools to assist in their revitalization. I have worked as a linguistic consultant for the Paskenta Band of Nomlaki Indians since 2020 and the Wintu Tribe of Northern California since 2022. I received my MA in linguistics from UC...

Alex Ramiller

Senior Data Science Fellow 2024-2025, Data Science Fellow 2023-2024
City and Regional Planning

I am a PhD candidate in City and Regional Planning. My research focuses on using large administrative datasets to study residential mobility, neighborhood change, and housing access. I received a master's degree in Geography from the University of Washington and a bachelor's degree in Economics and Geography from Macalester College. I have also consulted on analytical projects for several organizations, including the Federal Reserve Bank of San Francisco, PolicyLink, and the City of Seattle.

Python Web Scraping

October 24, 2024, 2:00pm
In this workshop, we cover how to scrape data from the web using Python. Web scraping involves downloading a webpage's source code and parsing it to extract the desired data.
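The workshop's exact libraries are not listed here. A minimal sketch of the two steps of web scraping, using only Python's standard library (`urllib.request` to download the page and `html.parser` to sift through it), might look like this. The helper names `fetch_html`, `LinkExtractor`, and `extract_links` are hypothetical, and the User-Agent string is a placeholder. The example parses an inline HTML string so it runs without network access.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download a page's source code as text (requires network access)."""
    req = Request(url, headers={"User-Agent": "workshop-example/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<p>See <a href="/events">upcoming <b>events</b></a> and <a href="/about">About</a>.</p>'
print(extract_links(sample))  # [('/events', 'upcoming events'), ('/about', 'About')]
```

On a live site, `extract_links(fetch_html(url))` would chain the two steps. Third-party libraries such as Requests and Beautiful Soup are common alternatives that make the parsing step less verbose.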

Python Web APIs

October 22, 2024, 2:00pm
In this workshop, we cover how to extract data from the web with APIs using Python. APIs are often official services offered by companies and other organizations that let you query their servers directly to retrieve data. Platforms such as The New York Times, Twitter, and Reddit offer APIs for retrieving their data.
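A minimal sketch of querying a web API with Python's standard library: build a URL with query parameters, send a GET request, and decode the JSON response. The endpoint shown is the New York Times Article Search API as we understand it from its public documentation; treat the URL and the `q` and `api-key` parameter names as assumptions to verify, and note that most APIs require registering for a key. `build_query_url` and `get_json` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint, for illustration only; check the provider's documentation.
NYT_SEARCH = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_query_url(base, params):
    """Append URL-encoded query parameters (a dict) to an API endpoint."""
    return f"{base}?{urlencode(params)}"


def get_json(url, timeout=10):
    """Send a GET request and decode the JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# "YOUR_KEY" is a placeholder; a real request needs a registered API key.
url = build_query_url(NYT_SEARCH, {"q": "court data", "api-key": "YOUR_KEY"})
print(url)
# https://api.nytimes.com/svc/search/v2/articlesearch.json?q=court+data&api-key=YOUR_KEY
```

Calling `get_json(url)` with a valid key would return the parsed response as nested Python dicts and lists. Passing parameters as a dict rather than keyword arguments allows names like `api-key` that are not valid Python identifiers.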

Excel Data Analysis: Introduction

October 2, 2024, 2:00pm
This is a three-hour introductory workshop that will provide an overview of Excel, with no prior experience assumed. Attendees will learn how to use functions for handling data and making calculations, how to build charts and pivot tables, and more.
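As a rough illustration of what a pivot table computes (not part of the Excel workshop itself), the sketch below sums a value column for each combination of a row label and a column label, much like placing fields in the Rows, Columns, and Values areas of an Excel pivot table. The function `pivot_sum` and the sample records are hypothetical.

```python
from collections import defaultdict


def pivot_sum(records, row_key, col_key, value_key):
    """Mimic a basic pivot table: sum `value_key` for each (row, column) pair."""
    table = defaultdict(lambda: defaultdict(float))
    for rec in records:
        table[rec[row_key]][rec[col_key]] += rec[value_key]
    # Convert nested defaultdicts to plain dicts for readable output.
    return {row: dict(cols) for row, cols in table.items()}


sales = [
    {"region": "North", "quarter": "Q1", "amount": 100},
    {"region": "North", "quarter": "Q2", "amount": 150},
    {"region": "South", "quarter": "Q1", "amount": 80},
    {"region": "North", "quarter": "Q1", "amount": 20},
]
print(pivot_sum(sales, "region", "quarter", "amount"))
# {'North': {'Q1': 120.0, 'Q2': 150.0}, 'South': {'Q1': 80.0}}
```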

Excel Data Analysis: Charts, Pivot Tables, and VLOOKUP

October 7, 2024, 2:00pm
This three-hour workshop will cover charts in more detail, review pivot tables, and introduce the widely used VLOOKUP function. We recommend first taking the introductory workshop Excel Data Analysis: Introduction.
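To make VLOOKUP's behavior concrete, here is a hedged Python approximation of `VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])`. With `range_lookup` set to FALSE, Excel returns the value from the first row whose first column matches exactly; with TRUE (the default), it expects the first column sorted in ascending order and uses the row with the largest key less than or equal to the lookup value. The helper `vlookup` and the sample tables are hypothetical, and unlike Excel, text matching here is case-sensitive.

```python
from bisect import bisect_right


def vlookup(lookup_value, table, col_index, range_lookup=True):
    """Approximate Excel's VLOOKUP on a list of rows.

    col_index is 1-based, as in Excel. Returns the string "#N/A" when nothing
    matches, mirroring Excel's error value. Text comparison is case-sensitive,
    unlike Excel.
    """
    if not range_lookup:
        # Exact match: first row whose first column equals lookup_value.
        for row in table:
            if row[0] == lookup_value:
                return row[col_index - 1]
        return "#N/A"
    # Approximate match: assumes the first column is sorted ascending.
    keys = [row[0] for row in table]
    pos = bisect_right(keys, lookup_value)
    if pos == 0:
        return "#N/A"  # lookup_value is smaller than every key
    return table[pos - 1][col_index - 1]


grades = [[0, "F"], [60, "D"], [70, "C"], [80, "B"], [90, "A"]]
products = [["A100", "Widget", 2.5], ["B200", "Gadget", 4.0]]
print(vlookup(85, grades, 2))                 # B
print(vlookup("B200", products, 3, False))    # 4.0
print(vlookup(75, grades, 2, False))          # #N/A
```

The approximate mode is what makes VLOOKUP handy for banded lookups such as grade thresholds, while the exact mode suits ID-based lookups such as product codes.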

Stephanie Andrews

Availability: By appointment only

Consulting Areas: Python, SQL, HTML / CSS, JavaScript, APIs, Databases & SQL, Data Manipulation and Cleaning, Data Science, Data Sources, Data Visualization, Digital Humanities, Machine Learning, Natural Language Processing, Software Tools, Text Analysis, Web Scraping, Bash or Command Line, Excel, Git or GitHub, Tableau

Emma Lasky

Availability: By appointment only

Consulting Areas: Python Programming, R Programming, Data Manipulation and Cleaning, Data Science, Data Sources, Data Visualization, Geospatial Data, Maps & Spatial Analysis, Mixed Methods, Regression Analysis, ArcGIS Desktop, Online or Pro, Excel, Git or GitHub, QGIS, RStudio, RStudio Cloud

Anusha Bishop

Availability: By appointment only

Consulting Areas: Python, R, Cloud & HPC Computing, Data Sources, Data Visualization, Geospatial Data, Maps & Analysis, Machine Learning, Research Design, Cluster analysis, Experimental design, Hierarchical Models, High dimensional statistics, Means Tests, Nonparametric methods, Regression Analysis, Software Output Interpretation, Spatial statistics, Bash or Command Line, Excel, Git or GitHub, RStudio