Data Science

SQL Database Fundamentals for Data Analysis

December 9, 2024, 10:00am
This workshop introduces the fundamentals of SQL, with a focus on using SQLite (one of the most widely deployed database engines in the world) for data science tasks. We'll explore how SQL can be used to query and manipulate relational databases. This hands-on workshop includes exercises based on real-world datasets.
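As a rough preview of what querying a relational database looks like, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `measurements`, its columns, and the values are made up for illustration and are not taken from the workshop materials.

```python
import sqlite3

# Hypothetical in-memory database; the table and values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (grp TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 3700.0), ("a", 3800.0), ("b", 5000.0)],
)

# Query: average value per group, sorted by group name.
rows = conn.execute(
    "SELECT grp, AVG(value) FROM measurements GROUP BY grp ORDER BY grp"
).fetchall()
conn.close()

print(rows)  # [('a', 3750.0), ('b', 5000.0)]
```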

Language Models in Mental Health Conversations – How Empathetic Are They Really?

December 3, 2024
by Sohail Khan. Language models are becoming integral to daily life as trusted sources of advice. As their use has expanded from simple tasks like text summarization to more complex interactions, the empathetic quality of their responses has become crucial. This article explores methods to assess the emotional appropriateness of these models, using word-overlap metrics such as BLEU and ROUGE alongside semantic similarity computed with Sentence Transformers. By analyzing models like LLaMA in mental health dialogues, we learn that while they score poorly on traditional word-based metrics, LLaMA's performance in capturing empathy through semantic similarity is promising. We also advocate for continuous monitoring to ensure these models support their users' mental well-being effectively.
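To see why word-based metrics can undersell an empathetic reply, consider the sketch below. It computes clipped unigram precision, a simplified stand-in for the word-overlap idea behind BLEU (not BLEU itself, and not the article's evaluation code). The example sentences are invented for illustration.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Share of candidate words that also appear in the reference (clipped counts)."""
    cand_words = candidate.lower().split()
    if not cand_words:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand_words)
    overlap = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    return overlap / len(cand_words)


# Invented sentences: the paraphrase is supportive but shares no words
# with the reference, so a word-overlap score treats it as a total miss.
reference = "i am sorry you are going through this"
paraphrase = "that sounds really hard and painful"

print(unigram_precision(reference, reference))   # 1.0
print(unigram_precision(paraphrase, reference))  # 0.0
```

An embedding-based similarity score compares meaning rather than exact words, which is why it can rate such a paraphrase more favorably.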

GitHub is Not Just for Coding: The Powerful Task Management Tool in Your Back Pocket

November 26, 2024
by Elena Stacy. This article introduces the use of GitHub as a task management tool for researchers in any field – even if your project doesn’t involve coding. GitHub is a free tool that many researchers already use in some capacity, and it can be easily adapted to task management to enable transparent project collaboration and documentation. We walk through the advantages of using GitHub for this purpose and provide a comprehensive tutorial on how to get up and running with GitHub as a task management tool for your own projects.

A Recipe for Reliable Discoveries: Ensuring Stability Throughout Your Data Work

November 19, 2024
by Jaewon Saw. Imagine perfecting a favorite recipe, then sharing it with others, only to find their results differ because of small changes in tools or ingredients. How do you ensure the dish still reflects your original vision? This challenge captures the principle of stability in data science: achieving acceptable consistency in outcomes relative to reasonable perturbations of conditions and methods. In this blog post, I reflect on my research journey and share why grounding data work in stability is essential for reproducibility, adaptability, and trust in the final results.

Python Data Processing Basics for Acoustic Analysis

November 12, 2024
by Amber Galvano. Interested in learning how to merge data and metadata from multiple sources into a consolidated dataset? Dealing with annotated audio and want to automate your workflow? Tried Praat scripting but want something more streamlined? This blog post will walk through some key domain-specific Python-based tools you will need in order to take your audio data, annotations, and speaker metadata and come away with a tabular dataset containing acoustic measures, ready to visualize and submit to statistical analysis. This tutorial uses acoustic phonetics data, but can be adapted to a range of projects involving repeated measures data and/or work with audio files.

LLMs for Exploratory Research

December 10, 2024, 1:00pm
In a fast-evolving artificial intelligence landscape, LLMs such as GPT have become a common buzzword. In the research community, their advantages and pitfalls are hotly debated. In this workshop, we will explore different chatbots powered by LLMs, beyond just ChatGPT. Our main goal will be to understand how LLMs can be used by researchers to conduct early-stage (or exploratory) research. Throughout the workshop, we will discuss best practices for prompt engineering and heuristics to evaluate the suitability of an LLM's output for our research purposes. Though the workshop primarily focuses on early-stage research, we will briefly discuss the use cases of LLMs in later stages of research, such as data analysis and writing.

Command Line Fundamentals

December 10, 2024, 10:00am
In this workshop, we provide a basic introduction to interacting with your computer via the terminal. We focus on Bash (Bourne-Again Shell) and Zsh (Z Shell), which are among the most commonly used Unix/Linux shells.

Python Fundamentals: Parts 1-3

December 9, 2024, 2:00pm
This three-part interactive workshop series is a complete introduction to programming in Python for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.

Exploring Rental Affordability in the San Francisco Bay Area Neighborhoods with R

November 5, 2024
by Taesoo Song. Many American cities continue to face severe rental burdens. However, we rarely examine rental affordability through the lens of quantitative data. In this blog post, I demonstrate how to download and visualize rental affordability data for the San Francisco Bay Area using R packages like `tidycensus` and `sf`. This exercise shows that mapping census data can be a straightforward and powerful way to understand the spatial patterns of housing dynamics and can offer valuable insights for research, policy, and advocacy.

R Fundamentals: Parts 1-4

December 9, 2024, 9:00am
This workshop is a four-part introductory series that will teach you R from scratch with clear introductions, concise examples, and support documents. You will learn how to download and install the open-source RStudio software, understand data and basic manipulations, import and subset data, explore and visualize data, and understand the basics of automation in the form of loops and functions. After completing this workshop, you will have the foundational understanding needed to create, organize, and use workflows for your own research.