Data Science

Qualtrics Fundamentals

February 6, 2025, 5:00pm
Qualtrics is a powerful online tool available to Berkeley community members that can be used for a range of data collection activities. Primarily, Qualtrics is designed to make web surveys easy to write, test, and implement, but the software can be used for data entry, training, quality control, evaluation, market research, pre/post-event feedback, and other uses with some creativity.

Python Fundamentals: Parts 1-3

January 13, 2025, 9:00am
This three-part interactive workshop series is your complete introduction to programming Python for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.

R Fundamentals: Parts 1-4

February 3, 2025, 2:00pm
This workshop is a four-part introductory series that will teach you R from scratch with clear introductions, concise examples, and support documents. You will learn how to download and install the open-sourced R Studio software, understand data and basic manipulations, import and subset data, explore and visualize data, and understand the basics of automation in the form of loops and functions. After completion of this workshop you will have a foundational understanding to create, organize, and utilize workflows for your personal research.

R Fundamentals: Parts 1-4

January 13, 2025, 1:00pm
This workshop is a four-part introductory series that will teach you R from scratch with clear introductions, concise examples, and support documents. You will learn how to download and install the open-sourced R Studio software, understand data and basic manipulations, import and subset data, explore and visualize data, and understand the basics of automation in the form of loops and functions. After completion of this workshop you will have a foundational understanding to create, organize, and utilize workflows for your personal research.

Python Fundamentals: Parts 4-6

January 27, 2025, 2:00pm
This three-part interactive workshop series teaches you intermediate programming Python for people with previous programming experience equivalent to our Python Fundamentals workshop. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.

What are Time Series Made of?

December 10, 2024
by Bruno Smaniotto. Trend-cycle decompositions are statistical tools that help us understand the different components of Time Series – Trend, Cycle, Seasonal, and Error. In this blog post, we will provide an introduction to these methods, focusing on the intuition behind the definition of the different components, providing real-life examples and discussing applications.

Cloud SQL Databases for Social Media Data

December 10, 2024, 10:00am
This is a hands-on workshop on analyzing Social Media Data using Cloud Databases, specifically Google Cloud Platform's BigQuery. In this session, you'll learn how to leverage existing Reddit and other publicly available datasets in the cloud, import additional data, and perform meaningful analyses relevant to social science research.

SQL Database Fundamentals for Data Analysis

December 9, 2024, 10:00am
This workshop introduces the fundamentals of SQL, with a focus on using SQLite (the most ubiquitous database on the planet) for data science tasks. We'll explore how SQL can be used to query and manipulate relational databases. This hands-on workshop includes exercises based on real-world datasets.

Language Models in Mental Health Conversations – How Empathetic Are They Really?

December 3, 2024
by Sohail Khan. Language models are becoming integral to daily life as trusted sources of advice. While their utility has expanded from simple tasks like text summarization to more complex interactions, the empathetic quality of their responses is crucial. This article explores methods to assess the emotional appropriateness of these models, using metrics such as BLEU, ROUGE, and Sentence Transformers. By analyzing models like LLaMA in mental health dialogues, we learn that while they suffer through traditional word-based metrics, LLaMA's performance in capturing empathy through semantic similarity is promising. In addition, we must advocate for continuous monitoring to ensure these models support their users' mental well-being effectively.

GitHub is Not Just for Coding: The Powerful Task Management Tool in Your Back Pocket

November 26, 2024
by Elena Stacy. This article introduces the use of GitHub as a task management tool for researchers in any field – even if your project doesn’t involve coding. GitHub is a free tool that many researchers already use in some capacity, and can be easily adapted specifically to task management to enable transparent project collaboration and documentation. We walk through the advantages of using GitHub for this purpose, and provide a comprehensive tutorial on how to get up and running with GitHub as a task management tool for your own projects.