Machine Learning

Need help with Machine Learning?

Visit Drop-in Hours or Schedule a Consultation: <link to an embedded google calendar OB widget or google form widget> 

Below are the consultants we have available, listed with their Machine Learning and other areas of expertise.

Python Machine Learning Fundamentals: Parts 1-2

November 19, 2024, 1:00pm
This workshop introduces students to scikit-learn, the popular machine learning library in Python, as well as the auto-ML library built on top of scikit-learn, TPOT. The focus will be on scikit-learn syntax and available tools to apply machine learning algorithms to datasets. No theory instruction will be provided.

Leveraging Large Language Models for Analyzing Judicial Disparities in China

October 8, 2024
by Nanqin Ying. This study analyzes over 50 million judicial decisions from China’s Supreme People’s Court to examine disparities in legal representation and their impact on sentencing across provinces. Focusing on 290,000 drug-related cases, it employs large language models to differentiate between private attorneys and public defenders and assess their sentencing outcomes. The methodology combines advanced text processing with statistical analysis, using clustering to categorize cases by province and representation, and regression models to isolate the effect of legal representation from factors like drug quantity and regional policies. Findings reveal significant regional disparities in legal access driven by economic conditions, highlighting the need for reforms in China’s legal aid system to ensure equitable representation for marginalized groups and promote transparent judicial data for systemic improvements.

R Machine Learning with tidymodels: Parts 1-2

October 14, 2024, 1:00pm
Machine learning often evokes images of Skynet, self-driving cars, and computerized homes. However, these ideas are less science fiction than tangible phenomena predicated on description, classification, prediction, and pattern recognition in data. During this two-part workshop, we will discuss basic features of supervised machine learning algorithms, including k-nearest neighbors, linear regression, decision trees, random forests, boosting, and ensembling, using the tidymodels framework. To social scientists, such methods might be critical for investigating evolutionary relationships, global health patterns, voter turnout in local elections, or individual psychological diagnoses.

Python Machine Learning Fundamentals: Parts 1-2

September 30, 2024, 9:00am
This workshop introduces students to scikit-learn, the popular machine learning library in Python, as well as the auto-ML library built on top of scikit-learn, TPOT. The focus will be on scikit-learn syntax and available tools to apply machine learning algorithms to datasets. No theory instruction will be provided.

Stephanie Andrews

Availability: By appointment only

Consulting Areas: Python, SQL, HTML / CSS, JavaScript, APIs, Databases & SQL, Data Manipulation and Cleaning, Data Science, Data Sources, Data Visualization, Digital Humanities, Machine Learning, Natural Language Processing, Software Tools, Text Analysis, Web Scraping, Bash or Command Line, Excel, Git or GitHub, Tableau

Consultant
Info & Data Science MIDS

Stephanie Andrews is currently studying data science in the MIDS program, having previously majored in Social Welfare as an undergraduate at Cal. After graduating, she worked as an advocate for survivors of gender-based violence, as a public policy analyst focusing on anti-trafficking initiatives, and as a software engineer for progressive and social impact organizations. She is now conducting research with the Human Rights Center's Investigations Lab, using OSINT and data science methods to investigate human rights violations.

Kurt Soncco Sinchi

Consultant
Civil Engineering

A first-generation student, Kurt is looking to strengthen his grasp of core data science concepts and apply them to socially impactful projects, and to leverage information from previous cases for better insights into society. He focuses on infrastructure and its impact during natural disasters.

Amanda Glazer

Instructor
Statistics

Amanda is a PhD candidate in the statistics department at Berkeley. Her research focuses on causal inference with applications in education, political science and sports. Previously she earned her Bachelor’s degree in mathematics and statistics, with a secondary in computer science, from Harvard.

Emily Grabowski

Senior Data Science Fellow, Senior Instructor, Senior Consultant
Linguistics

I am a Ph.D. student in Linguistics. My research interests include understanding how our speech production and speech perception systems constrain linguistic variation, especially as it applies to the larynx. I am also interested in integrating theoretical representations of language with speech. I approach this using a broad variety of tools/methodologies, including theoretical work, experiments, and modeling. Current projects include developing a computational tool to expedite the analysis of pitch and an online perception experiment on the relationship between pitch and perceived...

Chirag Manghani

Consultant
School of Information

Chirag is a second-year graduate student at the I-School. Proficient in Python, Java, R, and SQL, he works across software application development, machine learning, and data science. His keen interest lies in data analysis and statistical methods, driving him to bridge theory and practice. Chirag's dedication to excellence, adaptable mindset, and innate curiosity define him as a dynamic problem solver in the ever-evolving tech landscape.