Data Sources

Addison Pickrell

IUSE Undergraduate Advisory Board
Mathematics
Sociology

Addison is an aspiring mathematician and social scientist (Class of '27). He loves collecting books he'll never read, is an open-source and open-access advocate, and an aspiring community organizer and systems disrupter. Ask him about community-based participatory action research (CBPAR), critical pedagogy, applied mathematics, and social science.

Exploratory Data Analysis in Social Science Research

November 14, 2023
by Kamya Yadav. Causal inference has become the dominant endeavor for many political scientists, often at the expense of good research questions and theory building. Returning to descriptive inference – the process of describing the world as it exists – can help formulate research questions worth asking and theory that is grounded in reality. Exploratory data analysis is one method of conducting descriptive inference. It can help social science researchers find empirical patterns and puzzles that motivate their research questions, test correlations between variables, and engage with the existing literature on a topic. In this blog post, I walk through results from exploratory data analysis I conducted for my dissertation project on the political ambition of women.

Mapping Census Data with tidycensus

November 6, 2023
by Alex Ramiller. The U.S. Census Bureau provides a rich source of publicly available data for a wide variety of research applications. However, the traditional process of downloading these data from the census website is slow, cumbersome, and inefficient. The R package “tidycensus” provides researchers with a tool to overcome these challenges, enabling a streamlined process for quickly downloading numerous datasets directly from the census API (Application Programming Interface). This blog post provides a basic workflow for the use of the tidycensus package, from installing the package and identifying variables to efficiently downloading and mapping census data.

Americanist Linguistics: on Ethics and Intent

October 17, 2023
by Anna Björklund. In this post, Anna Björklund investigates the origin of the linguistic study of indigenous American languages, its inextricable ties to settler-colonialism, and how linguistics can move forward as a field.

FSRDC 2023 Annual Meeting and Research Conference

October 2, 2023
by Renee Starowicz. Renee Starowicz, Co-Executive Director of the Berkeley Federal Statistical Research Data Center, shares takeaways from the 2023 Annual Federal Statistical Research Data Center Business Meeting and Annual Conference. She gives a brief overview of the Berkeley FSRDC, then describes the priorities for collaboration across national directors to improve transparency and outreach to diverse researchers. She also points out the other key topics of conversation at this year’s meeting.

Rethinking Data Science Pedagogy with Embedded Ethical Considerations

Vandana Janeja
Maria Sanchez
2022

This paper presents a tool to meet the need for developing ethical critical thinking in the data science curriculum for undergraduate students. New data science methods impact societies and communities, directly or indirectly, when dealing with open and other real-world datasets. In particular, data scientists need to develop ethical critical thinking while analyzing the data. Throughout the entire lifecycle of the data in the knowledge discovery process, there are many opportunities for ethical decision making that a data scientist can evaluate to do no harm. To...

Acquiring Genomic Data from NCBI

April 4, 2023
by Monica Donegan. Genomic data is essential for studying evolutionary biology, human health, and epidemiology. Public agencies, such as the National Center for Biotechnology Information (NCBI), offer excellent resources and access to vast quantities of genomic data. This blog post introduces a brief workflow to download genomic data from public databases.

Why We Need Digital Hermeneutics

July 13, 2023
by Tom van Nuenen. Tom van Nuenen discusses the sixth iteration of his course, Digital Hermeneutics, at Berkeley. The class teaches the practices of data science and text analysis in the context of hermeneutics, the study of interpretation. In the course, students analyze texts from Reddit communities, focusing on how these communities make sense of the world. This task combines close and distant readings of texts, as students employ computational tools to find broader patterns and themes. The article reflects on the rise of AI language models like ChatGPT, and how these machines interpret human interpretations. The popularity and profitability of language models present an issue for the future of open research, due to the monetization of social media data.

My Summer Exploring Data Science for Social Justice: Learnings, Tensions & Recommendations

September 5, 2023
by Genevieve Smith. This summer I joined the D-Lab-hosted Data Science for Social Justice workshop at UC Berkeley, diving into Python – including TF-IDF, sentiment analysis, word embeddings, and more – with a lens towards leveraging data science for social justice. My team explored a Reddit channel on abortion and used computational analysis to answer key questions about abortion access before versus after Roe v. Wade was overturned. Computational social science is incredibly powerful, but I continue to grapple with tensions, particularly as they relate to employing machine learning and large language models in international research, and I end with key recommendations for CSS practitioners.

Artificial Intelligence (AI) Systems, the Poor, and Consent: A Feminist Anti-Colonial Lens to Digitalized Surveillance

September 18, 2023
by Alejandro Nuñez. Today’s digital age has created a sea of endless datafication where our everyday interactions, actions, and conversations are turned into data. Advances in automated artificial intelligence (AI) systems, and in the infrastructure on which they are created and trained, have catapulted us into an era of consistent monitoring and surveillance.