A Recipe for Reliable Discoveries: Ensuring Stability Throughout Your Data Work
Let’s say you’ve crafted a favorite recipe over time. You’ve noted down how much seasoning to add, the timing for each step, and the best order to layer the flavors. It always turns out perfectly when you make it. Now, imagine sharing this recipe with others, so they can recreate it just as well. But here’s the challenge: how do you ensure it works for them, even if they use slightly different tools or ingredients? If a friend makes small (and seemingly reasonable) adjustments, like choosing a different type of salt, a shorter simmer time, or a different flame intensity, will the dish still capture the essence of what you intended?
This question gets to the heart of stability: acceptable consistency in outcomes relative to reasonable perturbations of conditions and methods. It’s one thing to develop a method that works well in your specific setup, but it’s another to ensure it holds up across different setups and slight variations. Stability in data science is about making sure results remain consistent and reproducible – necessary conditions for effective quality control – just like a well-crafted recipe that can be followed by any cook, anywhere, with reliable results. This focus on stability became especially important to me when I started working with Distributed Acoustic Sensing (DAS), where I found reproducibility and adaptability to be both challenging and deeply meaningful.
Exploring DAS: Building Reproducible Results with a Novel Technology
I was introduced to Distributed Acoustic Sensing (DAS) when I began my PhD a few years ago. It stood out to me as a relatively new but powerful technology: DAS can take an ordinary fiber optic cable – a long, flexible glass wire used for telecommunications, such as internet signals – and turn it into a series of densely spaced vibration sensors. DAS techniques allow us to “listen” to vibrations along the entire length of the cable, detecting things like footsteps, vehicle movement, and earthquake activity, with very high sensitivity (Lindsey and Martin, 2021). Even more exciting was the possibility of using DAS with existing telecommunications cables – the so-called “dark fiber networks” – that are already installed but not actively used. The ability to tap into this preexisting infrastructure sounded like an engineer’s dream.
I’ve been fortunate to explore DAS applications in various contexts, such as monitoring traffic events on an instrumented roadway, listening to the ocean soundscape aboard a research vessel in Monterey Bay, and assessing the structural health of a wind turbine tower on a shake table. However, I realized there wasn’t yet a standardized framework for working with DAS data. Much of the literature focused on proof-of-concept studies, but there was little guidance on methodologies that could generalize beyond specific setups. I had to figure out my own processes, grappling with questions like:
- How can I ensure that my choices during data cleaning and preprocessing don’t unintentionally impose bias or strip away essential information contained in the raw data? (A small sanity check along these lines is sketched below.)
- When evaluating data collected from an experiment of my own design, how can I assess how well it applies to real-world conditions?
- What empirical evidence should I consider if I’m making different data choices for different experiments?
(Top) Monitoring traffic activities on an instrumented roadway; (Middle) listening for whale calls in Monterey Bay, California (source: Jeremy Snyder); (Bottom) testing a wind turbine tower under different loading conditions.
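To make the first of these questions concrete, here is a minimal sketch of the kind of check I have in mind: before committing to a cleaning step, measure how much energy it removes from a frequency band I care about, so the judgment call is recorded rather than silent. Everything here – the synthetic data, the sampling rate, the 1 Hz high-pass, the band of interest – is a hypothetical placeholder, not my actual pipeline.

```python
import numpy as np
from scipy.signal import butter, detrend, sosfiltfilt, welch

rng = np.random.default_rng(1)
fs = 500.0                                    # sampling rate in Hz (assumed)
t = np.arange(0, 120, 1 / fs)                 # two minutes on one channel
raw = (0.02 * t                               # slow instrumental drift
       + np.sin(2 * np.pi * 12 * t)           # a 12 Hz signal of interest
       + rng.normal(scale=0.5, size=t.size))  # broadband noise

# The cleaning step under scrutiny: remove the linear trend, then high-pass at 1 Hz
sos = butter(4, 1.0, btype="highpass", fs=fs, output="sos")
clean = sosfiltfilt(sos, detrend(raw))

def band_power(x, fs, band):
    """Approximate power in a frequency band from Welch's PSD estimate."""
    f, pxx = welch(x, fs=fs, nperseg=4096)
    mask = (f >= band[0]) & (f <= band[1])
    return pxx[mask].sum() * (f[1] - f[0])

band = (5.0, 50.0)                            # band of interest (assumed)
retained = band_power(clean, fs, band) / band_power(raw, fs, band)
print(f"Energy retained in {band} Hz after cleaning: {retained:.1%}")
```

If the retained fraction came back noticeably below one, that would be a cue to either revisit the cleaning step or explicitly document what is being discarded and why.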
I wanted to ensure that the methods I developed would work not only for my own projects but also for others who might approach similar problems. Just as importantly, I wanted to trust my own decisions throughout the DAS data life cycle – spanning experimental design, sensor deployment, data collection, processing, analysis, visualization, and interpretation – while remaining able to critically assess the limitations of my work and revise it as I continued to gather new data and insights. A DAS data pipeline not grounded in stability could encourage incorrect interpretations or mischaracterizations of the signals, leading to an inaccurate understanding of the capabilities and limitations of DAS technology. Ultimately, the lack of stability would undermine confidence in the results, making it difficult for others to build upon the existing work and to leverage the technology to its full potential.
Stability for “Veridical Data Science”
It was around this time that I took Bin Yu’s class (STAT 215A, Applied Statistics and Machine Learning), which discussed the Predictability, Computability, and Stability (PCS) framework (Yu and Barter, 2024; Yu, 2020). I was especially drawn to the stability principle, which addressed exactly what I felt uncertain about in my own work. Reading about and discussing stability in the context of “veridical data science” felt almost cathartic because it offered structure and language for what I aimed to achieve in my research: how to “assess how human judgment calls impact data results through data and model/algorithm perturbations”, and the importance of addressing “whether another researcher making alternative, appropriate decisions would obtain similar conclusions.” Several passages stayed with me:
- “The ultimate goal of the data science lifecycle (DSLC) is to generate knowledge that is useful for future actions.”
- “Stability relative to question or problem formulation implies that the domain conclusions are qualitatively consistent across these different translations.”
- “The validity of an analysis relies on implicit stability assumptions that allow data to be treated as an informative representation of some natural phenomena. When these assumptions do not hold, conclusions rarely generalize to new settings unless empirically proved by future data.”
- “To answer a domain question (...) collect data based on prior knowledge and available resources. When these data are used to guide future decisions, researchers implicitly assume that the data are relevant to a future time. In other words, they assume that conditions affecting data collection are stable, at least relative to some aspects of the data.”
In Pursuit of Stability in My Work
To integrate stability into my own DAS research, I began to examine each step of my data pipeline more systematically. Drawing on recommendations in Yu and Barter (2024), I kept close track of my workflow and maintained documentation to clearly outline how I conducted my data projects from start to finish. This practice not only guided my own efforts but also helped ensure transparency for my collaborators and for others who might build on my work. Stability was a way to ensure that small variations or judgment calls in data handling would not lead to disproportionately large or misleading changes in the final results. This meant asking critical questions about each step of the process (a minimal sketch of one such perturbation check follows the list below), including:
- Cable deployment conditions: How does the cable’s deployment – how it’s placed and secured in a specific location – affect its sensitivity to signals and its susceptibility to noise? When the sensing cable is directly attached to the structure of interest (like a road surface or a turbine tower) rather than loosely contained in a conduit or trench, signal characteristics can vary significantly.
- Data acquisition settings: Should I approach the data collection differently depending on the source and nature of the signals? For example, active sources, like controlled tests without significant external noise sources, would require different acquisition settings compared to passive sources, such as ambient traffic noise or ocean soundscapes.
- Defining targeted signals: How does predefining the signal characteristics (e.g., frequency band of interest) shape my data visualization and interpretation later on? How do I determine what is noise and what isn’t?
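As one example of what such a check might look like in practice, the sketch below perturbs a single judgment call – the frequency band used to isolate signals of interest – and asks whether a simple downstream summary (here, the number of threshold-crossing “events” on one channel) stays qualitatively the same. The synthetic data, band choices, and threshold rule are hypothetical placeholders rather than my actual pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

rng = np.random.default_rng(0)
fs = 500.0                                   # sampling rate in Hz (assumed)
t = np.arange(0, 60, 1 / fs)                 # one minute on one channel
x = rng.normal(scale=0.5, size=t.size)       # broadband background noise
for t0 in (10, 25, 40):                      # three synthetic "events"
    burst = np.exp(-0.5 * ((t - t0) / 0.5) ** 2)
    x += 3.0 * burst * np.sin(2 * np.pi * 20 * t)

def count_events(signal, band, fs, threshold=4.0):
    """Band-pass filter, smooth the envelope, and count excursions above
    `threshold` times the median envelope level."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    y = sosfiltfilt(sos, signal)
    env = np.convolve(np.abs(y), np.ones(int(fs)) / fs, mode="same")
    above = env > threshold * np.median(env)
    return int(np.sum(np.diff(above.astype(int)) == 1))  # rising edges

# Bands another analyst might reasonably have chosen instead
for band in [(5, 40), (10, 50), (15, 60)]:
    print(band, "->", count_events(x, band, fs), "events")
```

If the event counts (or whatever summary the analysis ultimately reports) swing wildly across these equally defensible bands, that instability is worth reporting alongside the result; if they agree, the conclusion rests on firmer ground.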
In the end, achieving complete stability and reproducibility across every aspect of DAS research, or any complex data project, is rarely possible. Just as it’s unrealistic to expect every cook to have the exact same ingredients or tools for a recipe, it’s equally challenging to account for everything in a data experiment. There will always be limitations in data quality, unexpected environmental factors, or inherent assumptions that may affect the generalizability of the results. At some point, one must acknowledge these constraints, accept the imperfections, and move forward.
Yet there are practical ways to strengthen reliability – like maintaining thorough documentation, sharing code and models openly, and recording decisions made throughout the data life cycle (Yu, 2020). And just as important is a commitment to a careful, intentional, and reflective approach that I strive to bring to my own work now and in the future. As with a beloved recipe, I may never have the “ideal kitchen” or a complete understanding of what I’m working with, but by focusing on what matters (as best I can judge with the limited knowledge I have), I aim to build research that I can confidently say is as reliable and trustworthy as I could make it.
And that reminds me, I really need to figure out why my kimchi never tastes the same as my mom’s… it must be the salt, right?
References
- Lindsey, Nathaniel J., and Eileen R. Martin. "Fiber-optic seismology." Annual Review of Earth and Planetary Sciences 49, no. 1 (2021): 309–336.
- Yu, Bin, and Rebecca L. Barter. Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making. MIT Press, 2024.
- Yu, Bin. "Veridical data science." In Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 4–5. 2020.