D-Lab & Graduate Division create inclusive data science summer program

August 9, 2023

Robin López has always been an avid learner, so it’s no surprise that he is now a doctoral candidate in UC Berkeley’s Environmental Science, Policy, and Management Program. When Robin was five years old, riding in the car with his parents across the Richmond bridge, he asked them how it was possible to drive across the water. His mom didn’t know the answer and told him he needed to ask an engineer. She still recounts that story when she looks back at her son’s trajectory.

Born and raised in Richmond, California, Robin had a circuitous path to academia. He attended community college and transferred to a 4-year program. After interning at the Lawrence Berkeley National Laboratory, he decided to pursue a PhD in Environmental Science, Policy, and Management (ESPM), where he focuses on ecological integrity and how people interact with their environments. He currently serves as a city council member in Albany. Still, López often finds he’s one of the only Latinos in data science spaces, making him feel like an outsider.

The lack of representation of people of color, women, people with disabilities, and people who identify as queer or queer-gendered in the field of data science can make members of these communities, like López, feel like they don’t belong. Berkeley’s Social Sciences D-Lab created the Data Science for Social Justice Program with the guidance and generous support of the Graduate Division to address this issue. 

The goal is to diversify data science by teaching students from diverse backgrounds to use data science tools critically, with the aim of addressing injustices found in their communities.

The 8-week summer program invites a selected cohort of students from various campuses within the University of California to participate for free. Additionally, students at UC Berkeley receive a stipend of $3,000. The program enables them to delve into Python programming, explore Natural Language Processing, and critically analyze the societal implications of their work.

According to Dr. Claudia von Vacano, the Executive Director of the Social Sciences D-Lab and a program founder, “...data is not an impartial entity but rather a socially constructed phenomenon. It inherently carries biases, and there is a risk of further introducing bias.” She emphasized the need for critical analysis and a deliberate approach to constructing and using data. Dr. von Vacano also highlighted the importance of considering the consequences of research, such as who its beneficiaries are, and of contributing back to historically underserved communities.

“Value-informed practices for data science”

In the Data Science for Social Justice Workshop, students are encouraged to consider their positionality and how their research impacts the communities they are studying.

“The most important part of this, to me, is that 30 students are going to come out of this, with what I hope are value-informed practices for data science,” said Dr. Pratik Sachdeva, Senior Instructor in the social justice program and Senior Data Scientist at the Social Sciences D-Lab.

STEM disciplines sometimes don’t give students the foundation to use their expertise to address social justice issues. This program fills that gap through an intensive, experiential-learning-based curriculum focused on applying natural language processing techniques in Python to social media data. At the same time, students take part in critical readings and discussion sessions on issues of fairness, accountability, and transparency in machine learning and data science.

The program is already having a positive impact on the students through their interactions with students from other departments. They are beginning to see themselves as data scientists, and others are beginning to see them that way, too.

Consider López: sometimes he holds his 10-month-old son while attending class virtually. His son appears mesmerized by the code moving on the screen. For López, reflecting on these moments of learning with his son makes him appreciate the opportunity to learn data science skills.

“The value of this workshop is that it’s giving us the tools to be able to transfer knowledge to other peers and in my case, to future generations almost immediately in real time,” said López. “I owe it to my [son and daughter] to take advantage of these opportunities, like this workshop. Because if I don’t, then what purpose am I serving myself and them to be at UC Berkeley?”

The program would not have been possible without the financial support of the Graduate Division under the leadership of Lisa García Bedolla, Berkeley's Vice Provost for Graduate Studies, Dean of the Graduate Division, and a Professor in the School of Education, and Denzil Streete, Assistant Vice Provost for Graduate Studies & Chief of Staff at the University of California, Berkeley. Dr. Streete also connected the program to OpenAI, the creators of ChatGPT, and the students of the program are part of an effort to democratize the development of that and related tools. Kara Ganter, Director of Digital Education at the Graduate Division, was the program manager and designer. Senior Data Scientist Dr. Pratik Sachdeva was the lead instructor. Other instructors include Renata Barreto, Emily Grabowski, and Dr. Tom van Nuenen, who provided the blueprint for the program.