Why I Don’t Call Myself a Data Scientist: A Researcher's Journey
Here is the truth: I’ve never really felt comfortable calling myself a data scientist. I could classify myself as one: I hold an undergraduate degree in computer science, and most of my professional projects have involved data analysis in education policy. Still, I have never felt compelled to identify as one. I have always felt uneasy with many of the flashy but inhumane data science applications, such as using algorithms to decide whether students should graduate high school, or the growing use of such algorithms to surveil marginalized and mixed-status families. Do I want to be just another scientist who simply extracts data from people? How can I use these tools and systems to create real change?
Why the Title Doesn’t Fit
My aversion to being labeled as someone in STEM or even gaining technical skills came from not seeing myself in that space. As a first-generation, queer, Latinx son of immigrants, I have always felt out of place in spaces that claim to define what science is. I've been made to feel “at risk” of not graduating with a CS degree or “not enough” to secure a return offer at my tech internship.
To combat this isolation, I majored in political science and Latino studies in addition to computer science, seeking spaces where my identity was valued and where I thought I could make a meaningful impact. However, I realized that the worlds of computational and social science did not align. I was mocked as too scientific in my Mexican American Political Thought class and as too concerned about human narratives in my Contemporary Issues in Computer Science course.
Data science was presented to me as a way to predict the future, and I was eager to help adapt its use cases to find discrepancies in education or explore how it could be used to better support marginalized communities. However, I faced the same challenge: my perspective was too closely linked to my identity, and once again, I felt uncertain about where my skills stood and what they could be used for.
Is this what counts as “Data Science”?
The first time I had a visceral reaction to a K-12 education data science paper was when I read one that combined data on police killings with data on where students live in Los Angeles to understand the impact of violence on student well-being and academic success. I remember asking my class, “What's the point of this paper?” If you ask any K-12 teacher, parent, or administrator, they will tell you the same thing the paper found: violence has traumatic effects on students' well-being, causing them to miss more school and struggle academically.
Yet what made me mad was that, as a former high school teacher based in LA, I had to help my students navigate the real-life impact of violence in their neighborhoods and on their families. To me, the paper screamed, “Hey, look at this shiny new way of using big data.” Did I want to run studies that strip the humanity from individual stories, or write papers just to showcase my new model? I was torn because I had the skill set and computational mindset to be a data scientist, but did I really want to become one?
Criticality between Numbers & Narratives
This constant cycle of questioning and quantification in research prompted me to reflect on my own values regarding knowledge and epistemologies. It prompted me to consider my positionality in research and to question the limitations of data science. The tension surfaced in early conversations with faculty mentors, who told me that my questions did not align with my methods or that the questions I was asking in data science couldn’t be critical. I was lost, debating whether to focus only on qualitative work.
Luckily, while venting to a great mentor about my research identity, I was advised to look into QuantCrit, a “framework that centers the inherent racial bias and false neutrality of quantitative methodologies and tools [in quantitative research] that are often framed as objective and value-free” (Museus, 2023). This introduction validated my concerns; I finally felt I was not alone in my unease about this over-reliance on data and quantification.
However, I wondered how these critiques could be applied to data science, a discipline that prioritizes efficiency and prediction over theoretical or ethical concerns. Thankfully, through one of my fellowships, I enrolled in DATA C204, "Human Context and Ethics of Data," a class focused on understanding the social implications and concerns surrounding quantification in research and society. It finally made me see data as a political object, and see how those politics are amplified through models and refined through data science. Despite the theoretical importance of the course, we lacked a straightforward guide on how to make our data science studies critical or how to address the embedded politics of datasets. Desperate, and after an extensive online search for critical data or computational scholars, I found the Institute in Critical Quantitative, Computational, & Mixed Methodologies, whose mission is to develop critical data science scholars for a diverse world. I had finally found my scholarly community.
Where can my research fit?
This journey has helped me understand what makes me and my work unique in this field, leading to the development of my research at the intersection of computation and inequality within education policy. Reflecting on my research interests, I realize I have always been engaged in this type of work. I’ve built clustering models to identify California school districts where marginalized students succeed in math. I’ve used NLP to analyze the California Math Framework debate and determine how equity gets framed (or repurposed) in public comments. Currently, I’m trying to scrape financial aid websites to measure the administrative burdens that quietly keep low-income students from accessing aid. In each of these projects, yes, I am a data scientist conducting computational social science work, but I recognize that I’m not striving for pure objectivity in my work, nor do I claim that it will be bias-free. Instead, as a data scientist, I know to ask: Who gets represented? Who gets silenced in the data or model? How was that data made and used in the first place? And how can we use computation to uncover not just patterns in data, but the inequities hidden within it? In the end, I do my research as a critical scholar who uses data science, a mission I feel compelled to fulfill.
Closing Reflection & Call to Action
In the end, perhaps I’ll never truly feel comfortable calling myself a “data scientist,” and maybe that’s the point of my journey. My experience as a teacher, my habit of holding different disciplines in mind, and my first-generation background, which made me question my worth, have all shaped my philosophy of never viewing numbers and data as neutral. I see my research journey as an invitation for everyone to imagine new ways of doing data work: ways that treat numbers as tools for good, that honor lived experience as much as statistical significance, and that acknowledge the power and politics embedded in every dataset we create and use.
I call on any researcher working with traditional notions of data to try to reshape the label and the discipline itself, to allow our work as data scientists to be messy, situated, and deeply human. And in the end, you may not want to call yourself a data scientist either.
References
Museus, S. D. (2023). An Evolving QuantCrit: The Quantitative Research Complex and a Theory of Racialized Quantitative Systems. In Higher Education: Handbook of Theory and Research (pp. 631–664). Springer, Cham. https://doi.org/10.1007/978-3-031-06696-2_5
