Blog Posts

Digitizing Inclusion: FinTech’s Promise and Pitfalls in the Global South

April 22, 2025
by Victoria Hollingshead. FinTech promises to revolutionize financial inclusion by harnessing data science to reach populations historically excluded from formal financial systems. By analyzing digital footprints, mobile payments, and behavioral data, startups and financial institutions have the potential to improve customer-lender interactions and revolutionize screening and monitoring techniques. But while FinTech shows promise for expanding financial access, it also raises critical questions: how do we implement financial inclusion without reproducing the structures of the past? And, more pointedly, can financiers be the arbiters of financial inclusion if their intrinsic role is to stratify and exclude?

Sharing Just Enough: The Magic Behind Gaining Privacy while Preserving Utility

April 15, 2025
by Sohail Khan. Netflix knows what you like, but does it need to know your politics too? We often face a frustrating choice: share our data and be tracked, or protect our privacy and lose personalization. But what if there was a third option? This article begins by introducing the concept of the privacy-utility trade-off, then explores the methods behind strategic data distortion, a technique that lets you subtly tweak your data to block sensitive inferences (like political views) while still maintaining useful recommendations. Finally, it looks ahead and advocates for a future where users, not platforms, shape the rules, reclaiming control of their own privacy.
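The flavor of strategic distortion can be sketched in a few lines of Python. Everything below is hypothetical and illustrative only, not the article's actual method: noise is added just to the items assumed most predictive of a sensitive trait, leaving the rest of the profile intact so recommendations stay useful.

```python
import numpy as np

rng = np.random.default_rng(7)
ratings = rng.uniform(1, 5, 20)        # a user's ratings for 20 items (synthetic)
sensitivity = rng.uniform(0, 1, 20)    # assumed per-item link to a sensitive trait

# Distort only the four most revealing items
risky = np.zeros(20, dtype=bool)
risky[np.argsort(sensitivity)[-4:]] = True

distorted = ratings.copy()
distorted[risky] = (distorted[risky] + rng.normal(0, 1.0, risky.sum())).clip(1, 5)

# Most ratings are untouched, so recommendation quality is largely preserved
unchanged = (distorted == ratings).mean()
```

The design choice here is targeting: rather than noising everything (which degrades utility uniformly), distortion is concentrated where the privacy risk is assumed to live.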

The Creation of Bad Students: AI Detection for Non-Native English Speakers

January 21, 2025
by Valeria Ramírez Castañeda. This blog explores how AI detection tools in academia perpetuate surveillance and punishment, disproportionately penalizing non-native English speakers (NNES). It critiques the rigid, culturally biased notions of originality and intellectual property, highlighting how NNES rely on AI to navigate the dominance of English in academic settings. Current educational practices often label AI use as dishonest, ignoring its potential to reduce global inequities. The post argues for a shift from punitive measures toward integrating AI as a tool for inclusivity, fostering diverse perspectives. By embracing AI, academia can prioritize collaboration and creativity over control and discipline.

Fritz_X_DargesBlue42… Who Are You?

January 14, 2025
by Jonathan Pérez. Reflecting on the complexities of the human experience is paramount to conducting research. Jonathan Pérez, through his exploration of a conspiracy subreddit, reflects on his experience trying to find the human behind the datum. Jonathan critiques the harmful effects of dehumanizing rhetoric and examines the researcher’s responsibility to navigate its ethical implications. In doing so, he establishes three guiding rules to support researchers seeking to humanize their analysis: 1) a researcher must always find the story behind the data; 2) a researcher must protect themselves; 3) a researcher must still humanize participants (even those who perpetuate harmful narratives).

What are Time Series Made of?

December 10, 2024
by Bruno Smaniotto. Trend-cycle decompositions are statistical tools that help us understand the different components of Time Series – Trend, Cycle, Seasonal, and Error. In this blog post, we will provide an introduction to these methods, focusing on the intuition behind the definition of the different components, providing real-life examples and discussing applications.
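The additive model behind these decompositions can be sketched in a few lines of Python. The series below is synthetic (the post itself works with real-life examples), and the cycle is folded into the trend, as is common in simple implementations: trend via a centered moving average, seasonal component via cycle-position averages, and error as whatever remains.

```python
import numpy as np

# Synthetic monthly series: linear trend + yearly seasonality + noise
rng = np.random.default_rng(0)
n, period = 120, 12
t = np.arange(n)
series = 0.5 * t + 10 * np.sin(2 * np.pi * t / period) + rng.normal(0, 1, n)

# Trend: centered moving average over one full seasonal period
kernel = np.ones(period) / period
trend = np.convolve(series, kernel, mode="same")

# Seasonal: average detrended value at each position in the cycle
detrended = series - trend
seasonal = np.array([detrended[i::period].mean() for i in range(period)])
seasonal_full = np.tile(seasonal, n // period)

# Error: the remainder the trend and seasonal components don't explain
error = series - trend - seasonal_full
```

By construction the three components add back up to the original series, which is the defining property of an additive decomposition.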

Language Models in Mental Health Conversations – How Empathetic Are They Really?

December 3, 2024
by Sohail Khan. Language models are becoming integral to daily life as trusted sources of advice. While their utility has expanded from simple tasks like text summarization to more complex interactions, the empathetic quality of their responses is crucial. This article explores methods to assess the emotional appropriateness of these models, using metrics such as BLEU, ROUGE, and Sentence Transformers. By analyzing models like LLaMA in mental health dialogues, we find that while such models fare poorly on traditional word-based metrics, LLaMA's performance in capturing empathy through semantic similarity is promising. The article also advocates for continuous monitoring to ensure these models support their users' mental well-being effectively.
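To see why word-based metrics can under-rate empathetic paraphrases, here is a toy unigram F-score in the spirit of ROUGE-1. This is a simplified stand-in, not the official ROUGE implementation or the article's actual evaluation, and the two responses are invented examples:

```python
def rouge1_f(candidate: str, reference: str) -> float:
    """Toy unigram-overlap F-score in the spirit of ROUGE-1."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = sum(min(cand.count(w), ref.count(w)) for w in set(cand))
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

reference = "i am sorry you are going through this it sounds really hard"
empathetic = "that sounds incredibly difficult and i am here for you"

# A paraphrase with similar sentiment but different wording scores low
score = rouge1_f(empathetic, reference)
```

Semantic-similarity metrics (e.g., Sentence Transformer embeddings compared with cosine similarity) would rate these two responses as close, which is why the article finds them better suited to measuring empathy.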

GitHub is Not Just for Coding: The Powerful Task Management Tool in Your Back Pocket

November 26, 2024
by Elena Stacy. This article introduces the use of GitHub as a task management tool for researchers in any field – even if your project doesn’t involve coding. GitHub is a free tool that many researchers already use in some capacity, and it can be easily adapted to task management, enabling transparent project collaboration and documentation. We walk through the advantages of using GitHub for this purpose, and provide a comprehensive tutorial on how to get up and running with GitHub as a task management tool for your own projects.

A Recipe for Reliable Discoveries: Ensuring Stability Throughout Your Data Work

November 19, 2024
by Jaewon Saw. Imagine perfecting a favorite recipe, then sharing it with others, only to find their results differ because of small changes in tools or ingredients. How do you ensure the dish still reflects your original vision? This challenge captures the principle of stability in data science: achieving acceptable consistency in outcomes relative to reasonable perturbations of conditions and methods. In this blog post, I reflect on my research journey and share why grounding data work in stability is essential for reproducibility, adaptability, and trust in the final results.
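A minimal stability check along these lines can be sketched in Python. The data and the choice of perturbation (bootstrap resampling of rows) are hypothetical, illustrative stand-ins, not the post's own analysis: re-run a simple fit under many perturbations and ask how much the conclusion moves.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + rng.normal(0, 1.0, 200)  # synthetic data, true slope = 2

slopes = []
for _ in range(500):
    idx = rng.integers(0, len(x), len(x))     # perturbation: resample rows
    slope, _ = np.polyfit(x[idx], y[idx], 1)  # re-run the analysis
    slopes.append(slope)

spread = np.std(slopes)
# A small spread relative to the estimate suggests the finding is stable
# under this family of perturbations.
```

The point is not this particular perturbation but the habit: a result you would share should survive reasonable changes to the "tools and ingredients" that produced it.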

Python Data Processing Basics for Acoustic Analysis

November 12, 2024
by Amber Galvano. Interested in learning how to merge data and metadata from multiple sources into a consolidated dataset? Dealing with annotated audio and want to automate your workflow? Tried Praat scripting but want something more streamlined? This blog post will walk through some key domain-specific Python-based tools you will need in order to take your audio data, annotations, and speaker metadata and come away with a tabular dataset containing acoustic measures, ready to visualize and submit to statistical analysis. This tutorial uses acoustic phonetics data, but can be adapted to a range of projects involving repeated measures data and/or work with audio files.
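The merging step at the heart of this workflow can be sketched with pandas. The frames below are hypothetical stand-ins for parsed annotation output and a speaker metadata spreadsheet; all names and values are illustrative, not from the tutorial:

```python
import pandas as pd

# Acoustic measures extracted from annotated audio (one row per token)
measures = pd.DataFrame({
    "file": ["s01_rec1.wav", "s01_rec1.wav", "s02_rec1.wav"],
    "speaker": ["s01", "s01", "s02"],
    "vowel": ["i", "a", "i"],
    "f1_hz": [310.0, 750.0, 295.0],
    "duration_ms": [84.0, 112.0, 90.0],
})

# Speaker-level metadata from a separate source
speakers = pd.DataFrame({
    "speaker": ["s01", "s02"],
    "language": ["Spanish", "English"],
    "age": [24, 31],
})

# One tabular dataset, ready for visualization or statistical analysis;
# validate= guards against accidental duplicate speaker rows
dataset = measures.merge(speakers, on="speaker", how="left", validate="many_to_one")
```

Because the result keeps one row per measured token with speaker attributes attached, it slots directly into repeated-measures models and plotting libraries.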

Exploring Rental Affordability in the San Francisco Bay Area Neighborhoods with R

November 5, 2024
by Taesoo Song. Many American cities continue to face severe rental burdens. However, we rarely examine rental affordability through the lens of quantitative data. In this blog post, I demonstrate how to download and visualize rental affordability data for the San Francisco Bay Area using R packages like `tidycensus` and `sf`. This exercise shows that mapping census data can be a straightforward and powerful way to understand the spatial patterns of housing dynamics and can offer valuable insights for research, policy, and advocacy.