Python

Searching for Other Solar Systems

November 21, 2023
by Emma Turtelboom. Over the last three decades, we have discovered more than 5,000 exoplanets, which are planets outside of our Solar System. With these observations, we can try to answer many questions we have about the universe. For example, how unique is the Solar System? How do planets form? Is there life elsewhere in the Milky Way? We can query the NASA Exoplanet Archive to compare multi-planet systems to the Solar System and see how similar (or dissimilar!) they are.
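As a rough illustration of what such a query could look like, here is a minimal sketch that builds a request URL for the archive's public TAP service and computes one simple comparison metric, the ratio of orbital periods between neighbouring planets. The table name `pscomppars`, the column names, the function names, and the choice of metric are assumptions made for this example, not necessarily what the post uses; check them against the archive's documentation before relying on them.

```python
from urllib.parse import urlencode

# Public TAP endpoint of the NASA Exoplanet Archive (synchronous queries).
TAP_SYNC_URL = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"


def build_multiplanet_query_url(min_planets=3):
    """Return a URL requesting planets in systems with at least `min_planets` planets.

    Assumption: the `pscomppars` table and these column names are taken from
    the archive's documentation and should be verified there.
    """
    adql = (
        "select pl_name, hostname, sy_pnum, pl_orbper "
        "from pscomppars "
        f"where sy_pnum >= {int(min_planets)} "
        "order by hostname, pl_orbper"
    )
    return TAP_SYNC_URL + "?" + urlencode({"query": adql, "format": "csv"})


def adjacent_period_ratios(periods_days):
    """Ratios of orbital periods between neighbouring planets, inner to outer."""
    ordered = sorted(periods_days)
    return [outer / inner for inner, outer in zip(ordered, ordered[1:])]


# Solar System reference: approximate orbital periods in days (Mercury to Mars).
SOLAR_SYSTEM_INNER_PERIODS = [88.0, 224.7, 365.25, 687.0]

if __name__ == "__main__":
    print(build_multiplanet_query_url(min_planets=4))
    print(adjacent_period_ratios(SOLAR_SYSTEM_INNER_PERIODS))
```

The same `adjacent_period_ratios` helper can be applied to the `pl_orbper` values of each host star returned by the query, which gives one possible way to put an exoplanet system side by side with the Solar System.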

Hate Speech

The hate speech measurement project began in early 2017 at UC Berkeley’s D-Lab. Our research project applies data science techniques such as machine learning to track changes in hate speech over time and across social media platforms. After three years, we have now published our groundbreaking method that measures hate speech with precision while mitigating the influence of human bias. Read the manuscript here.

María Martín López

Data Science Fellow
Psychology

María Martín López is a PhD student in the Cognition area within the Department of Psychology. Her research relates to cognitive computational and quantitative models of individual differences in behaviors, thoughts, and emotions. She is particularly interested in how we can create and leverage novel algorithms to understand, measure, and predict processes relating to externalizing psychopathology (e.g. impulsivity, aggression, substance use). She answers these questions using a range of computational and quantitative models including AI, NLP, SEM, time series analysis, multi-level...

Twitter Text Analysis: A Friendly Introduction, Part 2

March 7, 2023
by Mingyu Yuan. This blog post is the second part of “Twitter Text Analysis”. The goal is to use language models such as BERT to build a classifier on tweets. Word embedding, training and test splitting, model implementation, and model evaluation are introduced in this post.
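Two of the steps listed above, training and test splitting and model evaluation, can be sketched with the standard library alone. This is only an illustration: the function names are hypothetical, the post itself works with BERT and may use a library helper for splitting, and accuracy is just one of several possible evaluation metrics.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot score an empty evaluation set")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)
```

The classifier is fitted on the training portion only, and `accuracy` is then computed on predictions for the held-out test portion so that the score reflects tweets the model has not seen.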

Can Machine Learning Models Predict Reality TV Winners? The Case of Survivor

March 14, 2023
by Kelly Quinn. Reality television shows are notorious for tipping the scales to favor certain players they want to see win, but could producers also be spoiling the results in the process? Drawing on data about Survivor, I attempt to predict the likelihood of a contestant making it far into the game based on editing and production decisions, as well as demographic information. This post describes the model used to classify player outcomes and other potential ways to leverage data about reality TV shows for prediction.

Acquiring Genomic Data from NCBI

April 4, 2023
by Monica Donegan. Genomic data is essential for studying evolutionary biology, human health, and epidemiology. Public agencies, such as the National Center for Biotechnology Information (NCBI), offer excellent resources and access to vast quantities of genomic data. This blog post introduces a brief workflow to download genomic data from public databases.
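One possible way to script such a download is through NCBI's E-utilities `efetch` endpoint, which returns a record in FASTA format for a given accession. The sketch below is an assumption about how this could be done, not the workflow from the post (which may rely on other NCBI tools); the function names are hypothetical, and the `nuccore` database is simply the default chosen for this example.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fasta_url(accession, db="nuccore"):
    """Return an efetch URL that requests `accession` as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header line to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; drop the leading ">" from the header.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are wrapped, so collect them and join later.
            records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}
```

The URL from `build_fasta_url` can be fetched with any HTTP client and the response body passed to `parse_fasta`. For anything beyond a handful of records, NCBI's usage guidelines on request rates and API keys apply.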

Mapping Time-Series Satellite Images with Google Earth Engine API

July 17, 2023
by Meiqing Li. Remote sensing imagery has the potential to reveal land use patterns and human activities at a planetary scale. For example, nighttime light intensity extracted from satellite images can shed light on spatial patterns of human activities and settlements, especially in places where traditional data are scarce. This blog post introduces Google Earth Engine (GEE) as a general-purpose tool to extract time-series remote sensing data from the GEE data catalog. I walk through using GEE to obtain data, filter by time and geographic region, and visualize it on static and interactive maps.

D-Lab & Graduate Division create inclusive data science summer program

August 9, 2023
by Vanessa Navarro Rodriguez. UC Berkeley's Social Sciences D-Lab and Graduate Division created the Data Science for Social Justice Program to address underrepresentation in data science. The program teaches diverse students critical data analysis and its applications in addressing societal injustices. The 8-week free summer course for admitted University of California students focuses on Python programming, Natural Language Processing, and value-informed data practices. It aims to empower students from underrepresented backgrounds and to bridge STEM with social justice. This blog post elaborates on the program's creation and features one of the DSSJ students, Robin López, and his reasons for participating.

My Summer Exploring Data Science for Social Justice: Learnings, Tensions & Recommendations

September 5, 2023
by Genevieve Smith. This summer I joined the D-Lab-hosted Data Science for Social Justice workshop at UC Berkeley, diving into Python – including TF-IDF, sentiment analysis, word embeddings, and more – with a lens towards leveraging data science for social justice. My team explored a Reddit channel on abortion and used computational analysis to answer key questions related to abortion access before versus after Roe v. Wade was overturned. Computational social science is incredibly powerful, but I continue to grapple with tensions, particularly as they relate to employing machine learning and large language models in international research, and I end with key recommendations for CSS practitioners.
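For readers unfamiliar with TF-IDF, one of the techniques mentioned above, here is a minimal sketch of the textbook weighting: a term's frequency within a document multiplied by the log of the inverse fraction of documents that contain it. The function name is hypothetical, and this is not necessarily the variant used in the workshop; libraries such as scikit-learn apply smoothing and normalization by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document using unsmoothed TF-IDF.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document gets a weight of zero, which is why TF-IDF tends to surface the words that distinguish one post or comment from the rest of the corpus.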

James Hall

Consultant
Department of Statistics

James Hall is a graduate student in the Statistics MA program at the University of California, Berkeley. He is a husband and father to three awesome kids. Originally from Baltimore, MD, James earned his bachelor’s degree in Mathematics at the United States Military Academy at West Point, NY in 2011, and served as a U.S. Army officer. He’s served as a leader at multiple levels within large organizations with a professional focus on visualizing and communicating complex analysis to decision makers. James’ experience and coursework give him expertise in navigating different statistical methods,...