Research Project

Tom van Nuenen, Ph.D.

Data/Research Scientist, Senior Consultant, and Senior Instructor

D-Lab

Social Sciences

Digital Humanities

I work as a Lecturer, Data Scientist, and Senior Consultant at UC Berkeley's D-Lab. I lead the curriculum design for D-Lab’s data science workshop portfolio, as well as the Digital Humanities Summer Program at Berkeley.

Former research roles include a Research Associate position in the ‘Discovering and Attesting Digital Discrimination’ project at King’s College London (2019-2022) and a researcher-in-residence role for the UK’s National Research Centre on Privacy, Harm Reduction, and Adversarial Influence Online (2022). My research uses Natural Language Processing methods to...

Institutional Review Board (IRB) Fundamentals

February 18, 2025, 9:00am

Are you starting a research project at UC Berkeley that involves human subjects? If so, one of the first steps you will need to take is getting IRB approval.

What are Time Series Made of?

December 10, 2024

Bruno Smaniotto

Trend-cycle decompositions are statistical tools that help us understand the components of a time series: trend, cycle, seasonal, and error. In this blog post, we introduce these methods, focusing on the intuition behind the definition of each component, providing real-life examples, and discussing applications.

Language Models in Mental Health Conversations – How Empathetic Are They Really?

December 3, 2024

Sohail Khan

Language models are becoming integral to daily life as trusted sources of advice. While their utility has expanded from simple tasks like text summarization to more complex interactions, the empathetic quality of their responses is crucial. This article explores methods to assess the emotional appropriateness of these models, using metrics such as BLEU and ROUGE alongside semantic similarity computed with Sentence Transformers. By analyzing models like LLaMA in mental health dialogues, we learn that while they score poorly on traditional word-based metrics, LLaMA's performance in capturing empathy through semantic similarity is promising. In addition, we advocate for continuous monitoring to ensure these models support their users' mental well-being effectively.

GitHub is Not Just for Coding: The Powerful Task Management Tool in Your Back Pocket

November 26, 2024

Elena Stacy

This article introduces the use of GitHub as a task management tool for researchers in any field – even if your project doesn’t involve coding. GitHub is a free tool that many researchers already use in some capacity, and it can be easily adapted to task management to enable transparent project collaboration and documentation. We walk through the advantages of using GitHub for this purpose and provide a comprehensive tutorial on how to get up and running with GitHub as a task management tool for your own projects.

A Recipe for Reliable Discoveries: Ensuring Stability Throughout Your Data Work

November 19, 2024

Jaewon Saw

Imagine perfecting a favorite recipe, then sharing it with others, only to find their results differ because of small changes in tools or ingredients. How do you ensure the dish still reflects your original vision? This challenge captures the principle of stability in data science: achieving acceptable consistency in outcomes relative to reasonable perturbations of conditions and methods. In this blog post, I reflect on my research journey and share why grounding data work in stability is essential for reproducibility, adaptability, and trust in the final results.

Python Data Processing Basics for Acoustic Analysis

November 12, 2024

Amber Galvano

Interested in learning how to merge data and metadata from multiple sources into a consolidated dataset? Dealing with annotated audio and want to automate your workflow? Tried Praat scripting but want something more streamlined? This blog post walks through key domain-specific Python-based tools for turning your audio data, annotations, and speaker metadata into a tabular dataset of acoustic measures, ready to visualize and submit to statistical analysis. This tutorial uses acoustic phonetics data, but it can be adapted to a range of projects involving repeated measures data and/or work with audio files.

Exploring Rental Affordability in the San Francisco Bay Area Neighborhoods with R

November 5, 2024

Taesoo Song

Many American cities continue to face severe rental burdens. However, we rarely examine rental affordability through the lens of quantitative data. In this blog post, I demonstrate how to download and visualize rental affordability data for the San Francisco Bay Area using R packages like `tidycensus` and `sf`. This exercise shows that mapping census data can be a straightforward and powerful way to understand the spatial patterns of housing dynamics and can offer valuable insights for research, policy, and advocacy.

Human-Centered Design for Migrant Rights

October 29, 2024

Victoria Hollingshead

In honor of the 2024 International Day of Care and Support, Victoria Hollingshead shares her recent work with the Center for Migrant Advocacy’s Direct Assistance Program and their innovative approach to supporting Overseas Filipino Workers (OFWs) using generative AI. OFWs, especially female domestic workers in the Gulf Cooperation Council (GCC), are vulnerable to exploitation from foreign employers and recruitment agencies while having limited access to legal support. Using a design thinking framework, Victoria and CMA’s Direct Assistance team co-designed a proof of concept to enhance legal and contract literacy among OFWs in the Kingdom of Saudi Arabia, a top destination country. This project shows promise in leveraging emerging technologies to empower OFWs, enhancing the Philippines' reputation as a migrant champion and supporting the nation's broader push for digital transformation.

Concepts and Measurements in Social Network Analysis

October 22, 2024

Christian Caballero

We live in an interconnected world, more so now than ever. Social Network Analysis (SNA) provides a toolkit to study the influence of this interconnectivity. This blog post introduces some key theoretical concepts behind SNA, as well as a family of metrics for measuring influence in a network, known as centrality. These concepts and measurements help form the basis for a theoretically informed study of social relationships in an era where the availability of relational data has dramatically increased thanks to technological advances.
