Blog post

FSRDC 2023 Annual Meeting and Research Conference

October 2, 2023
by Renee Starowicz. Renee Starowicz, Co-Executive Director of the Berkeley Federal Statistical Research Data Center, summarizes the takeaways from the 2023 Annual Federal Statistical Research Data Center Business Meeting and Annual Conference. She gives a brief overview of the Berkeley FSRDC, then describes the priorities for collaboration among national directors to improve outreach to diverse researchers and increase transparency. She also highlights the other key topics of conversation at this year’s meeting.

Black History Data

February 28, 2023
by Patty Frontiera, Ph.D. D-Lab is excited to announce the publication of two articles and associated datasets from the Louisiana Slave Conspiracies Project (LSC), a collaboration among many of our D-Lab staff and student researchers under the direction of Professor Bryan Wagner as Principal Investigator (PI). The LSC project is dedicated to preserving, digitizing, transcribing, translating, and analyzing historical manuscripts concerning two slave conspiracies organized at the Pointe Coupée Post in the Spanish territory of Louisiana in 1791 and 1795. Our research outputs include (1) complete bibliographic and demographic information as well as (2) geospatial place data, both extracted from trial records related to these two conspiracies.

Testing for Measurement Invariance using Lavaan (in R)

February 7, 2023
by Enrique Valencia López. Testing for measurement invariance has increasingly become a prerequisite for examining whether survey items that measure an underlying concept have the same meaning across different cultural and linguistic groups. While there are several ways to examine measurement invariance, the most common approach is a method known as Multigroup Confirmatory Factor Analysis (MGCFA). In this blog post, I discuss how to conduct an MGCFA using lavaan in R and the different levels needed to establish measurement invariance.
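The post itself works in lavaan (R), but the decision logic behind the invariance levels can be sketched language-agnostically. Below is a minimal Python illustration, using the common rule of thumb that CFI should not drop by more than about 0.01 when moving to a more constrained model (Cheung & Rensvold, 2002); the fit values here are made up for illustration, not taken from the post.

```python
# Hypothetical CFI values for the three nested invariance models;
# in practice these would come from lavaan's fitMeasures() output.
fits = {
    "configural": 0.972,  # same factor structure across groups
    "metric": 0.969,      # factor loadings constrained equal
    "scalar": 0.955,      # loadings and intercepts constrained equal
}

def invariance_level(fits, delta_cfi_cutoff=0.01):
    """Return the highest invariance level supported under the
    delta-CFI heuristic: stop at the first step where CFI drops
    by more than the cutoff."""
    order = ["configural", "metric", "scalar"]
    achieved = order[0]
    for prev, curr in zip(order, order[1:]):
        if fits[prev] - fits[curr] > delta_cfi_cutoff:
            break
        achieved = curr
    return achieved

print(invariance_level(fits))  # metric: the scalar step drops CFI by 0.014
```

With these illustrative numbers, metric invariance holds but scalar invariance does not, so group means could not be compared directly.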

Using Artificial Intelligence to Help Write Code

February 28, 2023
by Daniel Tan. ChatGPT is a natural language processing model with applications in a wide variety of research settings. It is a chatbot-style tool created by OpenAI using a deep learning model that allows it to generate human-like responses to questions and prompts spanning a multitude of topics. Because it has been trained on a large body of text, including programming code, ChatGPT is a particularly useful tool for programming. This post explores ways to use ChatGPT to help write code in Stata, a statistical software package widely used in academic and policy research.

Twitter Text Analysis: A Friendly Introduction, Part 2

March 7, 2023
by Mingyu Yuan. This blog post is the second part of “Twitter Text Analysis”. The goal is to use language models such as BERT to build a classifier for tweets. Word embedding, train/test splitting, model implementation, and model evaluation are introduced in this post.
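The workflow the post describes (split the data, fit a model, evaluate it) can be sketched in a few lines of standard-library Python. This toy example substitutes simple word counts for the BERT embeddings the post uses, so it shows only the shape of the pipeline, not the actual model; the tweets and labels are invented.

```python
import random
from collections import Counter

# Toy labeled tweets (text, label); 1 = positive, 0 = negative.
tweets = [
    ("love this so much", 1), ("what a great day", 1),
    ("this is awful", 0), ("worst service ever", 0),
    ("really happy today", 1), ("so bad and sad", 0),
]

# Train/test split.
random.seed(0)
random.shuffle(tweets)
split = int(0.67 * len(tweets))
train, test = tweets[:split], tweets[split:]

# "Model implementation": count which words appear under each label
# (a bag-of-words stand-in for BERT embeddings + classifier head).
word_counts = {0: Counter(), 1: Counter()}
for text, label in train:
    word_counts[label].update(text.split())

def predict(text):
    # Score each label by how often its training words appear.
    scores = {lbl: sum(c[w] for w in text.split())
              for lbl, c in word_counts.items()}
    return max(scores, key=scores.get)

# Model evaluation: accuracy on the held-out split.
accuracy = sum(predict(t) == y for t, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

In the real pipeline, each tweet would first be encoded into a dense vector by BERT, and a classifier would be trained on those vectors instead of raw word counts.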

Can Machine Learning Models Predict Reality TV Winners? The Case of Survivor

March 14, 2023
by Kelly Quinn. Reality television shows are notorious for tipping the scales in favor of certain players producers want to see win, but could producers also be spoiling the results in the process? Drawing on data about Survivor, I attempt to predict the likelihood of a contestant making it far into the game based on editing and production decisions, as well as demographic information. This post describes the model used to classify player outcomes and other potential ways to leverage data about reality TV shows for prediction.
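As a rough illustration of this kind of classifier (not the post's actual model or data), here is a plain logistic regression trained by gradient descent on two invented features per contestant: a scaled confessional count and a binary flag for a prominent episode-one edit.

```python
import math

# Hypothetical features: [confessional count (scaled 0-1),
# prominent episode-1 edit (0/1)]; label 1 = made a deep run.
data = [
    ([0.9, 1], 1), ([0.8, 1], 1), ([0.7, 0], 1),
    ([0.2, 0], 0), ([0.1, 0], 0), ([0.3, 1], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Logistic regression via stochastic gradient descent.
w, b = [0.0, 0.0], 0.0
lr = 0.5
for _ in range(2000):
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        err = p - y  # gradient of log-loss w.r.t. the logit
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err

def predict_far(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5

print(predict_far([0.85, 1]))  # heavy edit presence -> predicted deep run
```

The post's real model would use many more features (demographics, production decisions) and proper cross-validation; this sketch only shows the mechanics of turning editing signals into an outcome prediction.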

A Brief Introduction to Cloud Native Approaches for Big Data Analysis

March 20, 2023
by Millie Chapman. Satellites, smartphones, and other monitoring technologies are creating vast amounts of data about our earth every day. These data hold promise to provide global insights on everything from biodiversity patterns to human activity at increasingly fine spatial and temporal resolution. But leveraging this information often requires us to work with data that is too big to fit in our computer's working memory (RAM), or even to download to our computer's hard drive. In this post, I walk through tools, terms, and examples to get started with cloud-native workflows, which allow us to remotely access and query large data from online resources or web services, all while skipping the download step!
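Two ideas underpin most cloud-native workflows: ask the server for only the bytes you need, and stream what you do fetch instead of holding it all in RAM. Here is a small standard-library sketch of both (the URL is a placeholder and the request is never sent; the tiny in-memory CSV stands in for a large remote dataset):

```python
import csv
import io
import urllib.request

# Cloud-native services let clients request byte ranges. This request
# object asks for only the first kilobyte of a file -- the same HTTP
# Range mechanism that lets cloud-optimized formats skip full downloads.
req = urllib.request.Request(
    "https://example.com/big-dataset.csv",  # placeholder URL
    headers={"Range": "bytes=0-1023"},
)

# Streaming rows keeps memory flat: csv.DictReader yields one record at
# a time rather than loading the whole file.
sample = io.StringIO("site,ndvi\nA,0.61\nB,0.18\nC,0.74\n")
high_ndvi = [row["site"] for row in csv.DictReader(sample)
             if float(row["ndvi"]) > 0.5]
print(high_ndvi)  # ['A', 'C']
```

Real cloud-native tooling (e.g. for cloud-optimized GeoTIFFs or Parquet) wraps these same ideas behind higher-level query interfaces.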

Acquiring Genomic Data from NCBI

April 4, 2023
by Monica Donegan. Genomic data is essential for studying evolutionary biology, human health, and epidemiology. Public agencies such as the National Center for Biotechnology Information (NCBI) offer excellent resources and access to vast quantities of genomic data. This blog post introduces a brief workflow for downloading genomic data from public databases.
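One common route to NCBI data is its E-utilities HTTP interface. The sketch below only builds an efetch URL for a FASTA record (it does not perform the network call, and the post's own workflow may use different tools such as NCBI Datasets or Biopython); NC_045512.2 is the SARS-CoV-2 reference genome accession.

```python
from urllib.parse import urlencode

# Base endpoint for NCBI E-utilities efetch.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an efetch URL for a sequence record; pass the result to
    urllib.request.urlopen (not done here) to download it."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{BASE}?{urlencode(params)}"

print(efetch_url("NC_045512.2"))
```

For bulk or scripted access, NCBI asks users to register an API key and respect its rate limits.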

From paper to vector: converting maps into GIS shapefiles

April 11, 2023
by Madeleine Parker. GIS is incredibly powerful: you can transform, overlay, and analyze data with a few clicks. But sometimes the challenge is getting your data into a form you can use with GIS. Have you ever found only a PDF or even a paper map of what you needed? Or googled your topic with “shapefile” after it, to no avail? The process of transforming a PDF, paper, or even hand-drawn map with boundaries into a shapefile for analysis is straightforward but involves a few steps. I walk through the stages of digitization, georeferencing, and drawing, taking an image all the way to a vector shapefile ready for visualization and spatial analysis.
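At the heart of the georeferencing step is an affine transform from pixel coordinates in the scanned image to real-world coordinates. With a north-up scan (no rotation), two ground control points are enough to solve for pixel size and origin, the same terms stored in an ESRI world file. The control points below are invented for illustration; GIS software solves this for you from the points you click.

```python
# Ground control points: (pixel_col, pixel_row) -> (longitude, latitude).
gcp1 = ((100, 200), (-122.30, 37.90))
gcp2 = ((900, 1000), (-122.10, 37.70))

(c1, r1), (x1, y1) = gcp1
(c2, r2), (x2, y2) = gcp2

# Degrees per pixel; y is negative because row numbers grow downward
# while latitude grows upward.
px_width = (x2 - x1) / (c2 - c1)
px_height = (y2 - y1) / (r2 - r1)

def pixel_to_geo(col, row):
    """Map an image pixel to geographic coordinates."""
    return (x1 + (col - c1) * px_width, y1 + (row - r1) * px_height)

print(pixel_to_geo(100, 200))  # recovers the first control point
```

A full georeference usually uses more control points and least-squares fitting (and can include rotation), but this captures the core mapping that lines your scanned map up with the world.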

Why We Need Digital Hermeneutics

July 13, 2023
by Tom van Nuenen. Tom van Nuenen discusses the sixth iteration of his course Digital Hermeneutics at Berkeley. The class teaches the practices of data science and text analysis in the context of hermeneutics, the study of interpretation. In the course, students analyze texts from Reddit communities, focusing on how these communities make sense of the world. This task combines both close and distant readings of texts, as students employ computational tools to find broader patterns and themes. The article reflects on the rise of AI language models like ChatGPT, and how these machines interpret human interpretations. The popularity and profitability of language models present an issue for the future of open research, due to the monetization of social media data.