Data Science

Python Machine Learning Fundamentals: Parts 1-2

November 19, 2024, 1:00pm
This workshop introduces students to scikit-learn, the popular machine learning library in Python, as well as TPOT, an auto-ML library built on top of scikit-learn. The focus will be on scikit-learn syntax and the tools available for applying machine learning algorithms to datasets. No theory instruction will be provided.

Python Fundamentals: Parts 4-6

November 4, 2024, 8:00am
This three-part interactive workshop series teaches intermediate Python programming to people with previous programming experience equivalent to our Python Fundamentals workshop. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.

The Case for Including Disability in Social Science Demographics

October 15, 2024
by Mango Jane Angar. As we celebrate Disability Awareness Month at the D-Lab alongside the UC Berkeley scholarly community, how can we, as social scientists, individually promote accessibility and inclusion? To advance accessibility, we should focus on addressing the barriers faced by individuals with disabilities, using our research to provide insights for effective policy recommendations. Although most of us do not focus on disability-related issues, including disability as a demographic characteristic in our data collection can greatly enhance our understanding of diverse populations and improve the comprehensiveness of our analyses. This small step can contribute to broader efforts toward inclusion and social equity.

Qualtrics Fundamentals

October 25, 2024, 10:00am
Qualtrics is a powerful online tool available to Berkeley community members that can be used for a range of data collection activities. Primarily, Qualtrics is designed to make web surveys easy to write, test, and implement, but the software can be used for data entry, training, quality control, evaluation, market research, pre/post-event feedback, and other uses with some creativity.

Institutional Review Board (IRB) Fundamentals

October 17, 2024, 3:00pm
Are you starting a research project at UC Berkeley that involves human subjects? If so, one of the first steps you will need to take is getting IRB approval.

Python Web Scraping

October 24, 2024, 2:00pm
In this workshop, we cover how to scrape data from the web using Python. Web scraping involves downloading a webpage's source code and sifting through the material to extract desired data.
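
As a rough illustration of the "sifting" step described above, here is a minimal sketch that uses only Python's standard library. The page source, the `LinkExtractor` class, and the `extract_links` helper are hypothetical stand-ins for illustration and are not taken from the workshop materials; a real scraper would first download the page source rather than use an inline string.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag in a page's source."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Hypothetical page source standing in for a downloaded webpage.
SAMPLE_HTML = """
<html><body>
  <h1>Workshops</h1>
  <a href="/events/python-web-scraping">Python Web Scraping</a>
  <a href="/events/python-web-apis">Python Web APIs</a>
</body></html>
"""

links = extract_links(SAMPLE_HTML)
print(links)
```

The same pattern extends to other tags or attributes by changing the checks in `handle_starttag`.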

Python Web APIs

October 22, 2024, 2:00pm
In this workshop, we cover how to extract data from the web with APIs using Python. APIs are often official services offered by companies and other entities, which allow you to directly query their servers in order to retrieve their data. Platforms like The New York Times, Twitter and Reddit offer APIs to retrieve data.
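
A minimal sketch of what querying an API typically involves, assuming a JSON-returning endpoint: build a URL with query parameters, then parse the response. The endpoint `https://api.example.com/v1/articles`, the parameter names, and the response shape are all hypothetical and do not correspond to any of the platforms named above; no network request is made here.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, params):
    """Append URL-encoded query parameters to a base endpoint."""
    return f"{base_url}?{urlencode(params)}"


def parse_titles(response_text):
    """Pull the 'title' field out of each item in a JSON response body."""
    payload = json.loads(response_text)
    return [item["title"] for item in payload["results"]]


# Hypothetical endpoint and parameters, for illustration only.
url = build_query_url(
    "https://api.example.com/v1/articles",
    {"q": "data science", "page": 1},
)

# Hypothetical response body standing in for what a server might return.
SAMPLE_RESPONSE = '{"results": [{"title": "First article"}, {"title": "Second article"}]}'

titles = parse_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

Real APIs usually also require an access key and enforce rate limits, so their documentation should be checked before sending requests.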

Leveraging Large Language Models for Analyzing Judicial Disparities in China

October 8, 2024
by Nanqin Ying. This study analyzes over 50 million judicial decisions from China’s Supreme People’s Court to examine disparities in legal representation and their impact on sentencing across provinces. Focusing on 290,000 drug-related cases, it employs large language models to differentiate between private attorneys and public defenders and assess their sentencing outcomes. The methodology combines advanced text processing with statistical analysis, using clustering to categorize cases by province and representation, and regression models to isolate the effect of legal representation from factors like drug quantity and regional policies. Findings reveal significant regional disparities in legal access driven by economic conditions, highlighting the need for reforms in China’s legal aid system to ensure equitable representation for marginalized groups and promote transparent judicial data for systemic improvements.

Understanding Adolescent Ethnic-Racial Identity: A Latent Profile Approach

September 24, 2024
by Elaine Luo. As youth navigate an increasingly ethnoracially diverse society like the United States, their ethnic-racial identity (ERI) plays a crucial role in shaping various aspects of their development, including academic and psychosocial outcomes. In this post, I share insights from our recent study on adolescent ERI and youth adjustment. Using a person-centered approach, we identified four distinct ERI profiles: Strongly Diffused, Moderately Diffused, Balanced, and Achieved. Our findings revealed differences in educational motivation, school belonging, and expectations for discrimination across these profiles, highlighting the complexity of ERI development. Implications for caregivers, educators, and communities are also discussed.

Python Fundamentals: Parts 1-3

September 16, 2024, 2:00pm
This three-part interactive workshop series is a complete introduction to Python programming for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.