Seeing Behavior in Everyday Data

December 10, 2025

I chose data science as my second major in undergrad for fairly pragmatic reasons. It seemed practical, people around me said it would be valuable, and honestly, I wasn’t entirely sure what direction I wanted to go professionally. I fit most of my data science courses into my third and fourth years alongside my economics requirements. It was challenging, but turned out to be transformative. At the time, I didn’t see a clear connection between data science and the behavioral questions I cared about. Now, in the middle of a PhD in consumer behavior, that interdisciplinary training has become central to how I see the world.

How Data Science Shapes the Questions I Ask

Many of the projects I’ve found most exciting begin with something simple: noticing a pattern while exploring a dataset. Most consumer environments today are data-rich: consumer reviews, ratings, product descriptions, short videos, and other forms of user- and business-generated content. These environments reflect the actual context consumers operate in, and they leave traces about how people naturally express preferences, make judgments and decisions, or communicate experiences with others in the real world.

Sometimes these hints show up in unexpected ways. For example, while casually exploring a Yelp review dataset, I noticed an unexpected spike in Thursday ratings. I still don’t have a clear explanation for this pattern, but it raises interesting questions: Do weekly routines shape how we evaluate experiences? Does anticipating the weekend influence how people evaluate experiences in the moment? This single exploratory observation led to a line of inquiry I wouldn’t have thought to pursue otherwise.
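The kind of exploratory pass that surfaces a pattern like this can be very simple. A minimal sketch, using made-up review data and illustrative column names rather than the actual Yelp schema:

```python
import pandas as pd

# Hypothetical review data; columns are illustrative, not the Yelp schema.
reviews = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-04",
        "2024-01-04", "2024-01-05", "2024-01-11",
    ]),
    "stars": [4, 3, 5, 5, 4, 5],
})

# Average rating and review count by day of week; a spike on one
# weekday would stand out as an outlier in this summary.
by_weekday = (
    reviews
    .assign(weekday=reviews["date"].dt.day_name())
    .groupby("weekday")["stars"]
    .agg(["mean", "count"])
)
print(by_weekday)
```

Nothing here establishes why a weekday effect would exist; the point is only that a two-line groupby is often enough to notice one.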

Archival data can also help confirm whether experimental insights appear in naturalistic settings. In one project, an archival analysis of 11 million Amazon ratings offered a powerful way to see whether behaviors observed in experimental studies generalize at scale. Consistent with our experimental findings, categories in which consumers tend to keep products longer showed sales patterns more closely aligned with average star ratings, suggesting that people lean on ratings more when they expect to own a product for a long time. Although correlational, this complementary evidence illustrates how experimental findings can play out in the wild, strengthening their external validity.
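The core of that archival comparison can be sketched in a few lines: within each category, how strongly do sales track average star ratings? The data and column names below are invented for illustration, not the actual Amazon dataset or analysis pipeline.

```python
import pandas as pd

# Toy product-level data; values and columns are hypothetical.
products = pd.DataFrame({
    "category":  ["mattress", "mattress", "mattress",
                  "snack", "snack", "snack"],
    "avg_stars": [3.0, 4.0, 4.8, 3.0, 4.0, 4.8],
    "sales":     [100, 400, 900, 300, 350, 320],
})

# Pearson correlation between ratings and sales within each category.
# The prediction: long-ownership categories (e.g., mattresses) show a
# tighter rating-sales link than short-lived ones (e.g., snacks).
corr_by_cat = {
    cat: g["avg_stars"].corr(g["sales"])
    for cat, g in products.groupby("category")
}
print(corr_by_cat)
```

A real analysis would of course need controls for price, category size, and other confounds; this only shows the shape of the comparison.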

Archival Data and Experimental Design: Complementary Approaches

Large datasets are great for spotting real-world patterns, but they also come with natural limitations. Noise, missing variables, and endogeneity make it difficult to draw strong causal conclusions. But many behavioral researchers are not using these datasets to establish causality or build predictive models. Instead, exploratory patterns can help spark ideas to be tested in controlled experiments. Simple exploratory analysis can reveal intriguing patterns: How do ratings shift over time? How do products differ across categories? How does language vary across reviews? How do summary statistics differ across user groups? These patterns often point to questions we don’t yet fully understand but that merit deeper investigation.

Experiments then play a complementary role. Whereas archival data reveal what happens in complex, naturalistic environments, experiments allow researchers to isolate mechanisms, rule out alternative explanations, and test causal predictions with precision. A pattern first spotted in a messy dataset can become the basis of a tightly controlled study that probes why the pattern exists.

A more transparent way to combine these approaches is to separate exploration from confirmation. Researchers can identify preliminary patterns in a small exploratory subset, preregister their hypotheses, and then test those hypotheses in the full confirmatory sample and follow-up experiments. This multi-step process leverages the strengths of archival data while preserving the rigor in hypothesis testing.
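The exploration/confirmation split described above can be implemented mechanically. A minimal sketch of the assumed workflow (not any specific paper's procedure): carve off a small random subset for hypothesis generation, preregister, and only then touch the held-out confirmatory sample.

```python
import pandas as pd

# Placeholder dataset standing in for a large archival file.
data = pd.DataFrame({"rating": range(100)})

# Fixed seed so the split is reproducible and documentable.
explore = data.sample(frac=0.2, random_state=42)  # explore freely here
confirm = data.drop(explore.index)                # open only after preregistering

print(len(explore), len(confirm))
```

Recording the seed and split fraction alongside the preregistration makes it verifiable that the confirmatory sample was untouched during exploration.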

Following Curiosity Across Methods

Part of what makes behavioral science personally exciting for me is the increasingly diverse set of methods available for handling behavioral data. Tools from linguistics, computer science, and statistics are expanding what behavioral researchers can observe and ask. For example, colexification networks reveal how languages encode relationships between concepts across different linguistic communities; recent advances in multimodal modeling make it possible to analyze text, images, and audio together to better understand how they jointly shape consumer behavior; cross-cultural studies help assess the generalizability of findings and identify possible cultural differences; and social-network approaches help researchers trace how information or behaviors spread across communities.

Together, these approaches allow researchers to understand behavioral phenomena across very different types of data and contexts. For me, this expanding methodological landscape is what makes the field feel more open-minded and creative. These emerging tools also help connect researchers with different methodological backgrounds, allowing them to investigate similar substantive questions using diverse data and forms of evidence.

Theory-Driven Research in a Data-Rich Environment

Behavioral science is, at its core, a theory-driven discipline. Strong theory helps clarify the psychological mechanisms we aim to understand, and diverse empirical approaches, whether experimental, archival, linguistic, or computational, offer complementary ways to test and support those theoretical ideas. In that sense, methodological diversity is not separate from theory-building but strengthens it. Berkeley offers opportunities to engage with this kind of integration. The behavioral group at Haas grounds my thinking in rigorous theoretical foundations, while the broader Berkeley community provides opportunities to explore new tools and perspectives. D-Lab workshops, fellowships, and peers have broadened both the methods available to me and the cross-disciplinary discussions I engage in, and courses and certificate programs at the I-School also offer a space to think about behavioral questions through data-driven approaches. Together, these experiences have shaped my belief that the most interesting questions about human behavior emerge at the intersection of rigorous theory and diverse empirical methods. I’m excited to continue building research that bridges these approaches.