Data Sources

Mapping Census Data with tidycensus

November 6, 2023
by Alex Ramiller. The U.S. Census Bureau provides a rich source of publicly available data for a wide variety of research applications. However, the traditional process of downloading these data from the census website is slow, cumbersome, and inefficient. The R package “tidycensus” helps researchers overcome these challenges by streamlining the process of downloading numerous datasets directly from the Census API (Application Programming Interface). This blog post provides a basic workflow for using the tidycensus package, from installing the package and identifying variables to efficiently downloading and mapping census data.
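In R, tidycensus builds and sends Census API requests for you. The Python sketch below only constructs such a request URL, so you can see what one looks like. It assumes the American Community Survey (ACS) 5-year endpoint pattern. The year, the variable ID `B19013_001E` (median household income), and the state FIPS code `06` (California) are illustrative choices. The helper name `acs5_query_url` is hypothetical and is not part of tidycensus.

```python
from urllib.parse import urlencode

# Base URL of the Census Data API (the web service that tidycensus queries)
CENSUS_API_BASE = "https://api.census.gov/data"


def acs5_query_url(year, variables, geography, state_fips, api_key=None):
    """Build (but do not send) a Census Data API URL for ACS 5-year estimates.

    Hypothetical helper for illustration:
    year       -- ACS release year, e.g. 2021 (assumed to be available)
    variables  -- variable IDs to return, e.g. ["NAME", "B19013_001E"]
    geography  -- geography level to return for every unit, e.g. "county"
    state_fips -- two-digit state FIPS code that restricts the query
    api_key    -- optional Census API key
    """
    params = {
        "get": ",".join(variables),     # comma-separated list of variables
        "for": f"{geography}:*",        # all units at this geography level...
        "in": f"state:{state_fips}",    # ...within the given state
    }
    if api_key:
        params["key"] = api_key
    # Keep commas, colons, and asterisks unescaped so the URL stays readable
    query = urlencode(params, safe=",:*")
    return f"{CENSUS_API_BASE}/{year}/acs/acs5?{query}"


print(acs5_query_url(2021, ["NAME", "B19013_001E"], "county", "06"))
```

Opening the printed URL in a browser (or fetching it from code) returns the table as JSON. tidycensus does this for you and additionally reshapes the result into a tidy data frame, and it can optionally attach geometries for mapping.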

Hate Speech

The hate speech measurement project began in early 2017 at UC Berkeley’s D-Lab. Our research project applies data science techniques such as machine learning to track changes in hate speech over time and across social media platforms. After three years of work, we have published our groundbreaking method, which measures hate speech precisely while mitigating the influence of human bias.

Americanist Linguistics: on Ethics and Intent

October 17, 2023
by Anna Björklund. In this post, Anna Björklund investigates the origins of the linguistic study of Indigenous American languages, its inextricable ties to settler colonialism, and how linguistics can move forward as a field.

FSRDC 2023 Annual Meeting and Research Conference

October 2, 2023
by Renee Starowicz. Renee Starowicz, Co-Executive Director of the Berkeley Federal Statistical Research Data Center, summarizes the takeaways from the 2023 Annual Federal Statistical Research Data Center Business Meeting and Annual Conference. After a brief overview of the Berkeley FSRDC, she describes the priorities for collaboration among the national directors to improve transparency and outreach to diverse researchers. She also highlights the other key topics discussed at this year’s meeting.

Rethinking Data Science Pedagogy with Embedded Ethical Considerations

Vandana Janeja
Maria Sanchez
2022

The focus of this paper is to present a tool that meets the need for developing ethical critical thinking in the undergraduate data science curriculum. New data science methods affect societies and communities, directly or indirectly, when applied to open and other real-world datasets. Data science in particular needs to develop ethical critical thinking while the data are being analyzed. Throughout the entire lifecycle of the data in the knowledge discovery process, there are many opportunities for ethical decision making that a data scientist can evaluate in order to do no harm. To...

Excel Data Analysis: Introduction

October 16, 2023, 1:00pm
This is a three-hour introductory workshop that will provide an overview of Excel, with no prior experience assumed. Attendees will learn how to use functions for handling data and making calculations, how to build charts and pivot tables, and more.

Excel Data Analysis: Charts, Pivot Tables, and VLOOKUP

October 18, 2023, 1:00pm
This three-hour workshop will cover charts in more detail, review pivot tables, and introduce the widely used VLOOKUP function. We recommend first taking the introductory workshop Excel Data Analysis: Introduction.

Python Web Scraping

November 2, 2023, 2:00pm
In this workshop, we cover how to scrape data from the web using Python. Web scraping involves downloading a webpage's source code and sifting through the material to extract desired data.
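The "sifting" step of web scraping can be sketched with Python's standard-library `html.parser`. This minimal example collects every link and its visible text from a page's source. The sample page and its URLs are hypothetical, and the workshop itself may use different tools, such as requests or Beautiful Soup.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href and visible text of every <a> tag in a page's source."""

    def __init__(self):
        super().__init__()
        self.links = []     # list of (href, text) pairs found so far
        self._href = None   # href of the <a> tag we are currently inside
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None if the tag has no href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # includes text inside nested tags like <b>

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_source):
    parser = LinkExtractor()
    parser.feed(html_source)
    parser.close()
    return parser.links


# Hypothetical page source. In practice it would be downloaded first, e.g. with
# urllib.request.urlopen(url).read().decode("utf-8")
sample_html = """
<html><body>
  <h1>Workshops</h1>
  <a href="/workshops/python-web-scraping">Python Web Scraping</a>
  <p>Some text <a href="/workshops/python-web-apis">Python <b>Web</b> APIs</a></p>
</body></html>
"""

print(extract_links(sample_html))
```

Before scraping a real site, check its terms of use and `robots.txt`.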

Python Web APIs

October 26, 2023, 2:00pm
In this workshop, we cover how to extract data from the web with APIs using Python. APIs are often official services offered by companies and other entities, which allow you to query their servers directly to retrieve their data. Platforms like The New York Times, Twitter, and Reddit offer APIs to retrieve data.
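A typical API call is an HTTP GET request whose URL carries the query parameters and often an API key. The JSON response is then decoded into Python objects. The sketch below uses only the standard library. The endpoint `api.example.com`, the parameter names, and the key are hypothetical, because each real API documents its own URLs, parameters, and authentication rules.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint, for illustration only
BASE_URL = "https://api.example.com/v1/articles"


def build_request(query, page=1, api_key="YOUR_KEY"):
    """Prepare (but do not send) a GET request to the hypothetical API."""
    params = urlencode({"q": query, "page": page, "api_key": api_key})
    return Request(f"{BASE_URL}?{params}", headers={"Accept": "application/json"})


def fetch_json(request):
    """Send the request and decode the JSON body (requires network access)."""
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_request("data science")
print(req.full_url)      # spaces in the query are encoded as '+'
print(req.get_method())  # no request body, so this is a GET
```

Keeping request construction separate from sending makes the URL easy to inspect before you spend any of an API's rate limit.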

Acquiring Genomic Data from NCBI

April 4, 2023
by Monica Donegan. Genomic data are essential for studying evolutionary biology, human health, and epidemiology. Public agencies, such as the National Center for Biotechnology Information (NCBI), offer excellent resources and access to vast quantities of genomic data. This blog post introduces a brief workflow for downloading genomic data from public databases.
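Public databases such as NCBI commonly deliver nucleotide and protein sequences as plain-text FASTA files, in which a `>` header line is followed by one or more sequence lines. The standard-library sketch below reads such a file once it has been downloaded. It is an illustration under that assumption, not the workflow from the post. The sample records are made up, and the helper `parse_fasta` is hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record IDs to sequences.

    The ID is the first whitespace-delimited token after '>'; sequence lines
    are concatenated. Lines before the first header are ignored.
    """
    records = {}
    current_id = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if current_id is not None:    # finish the previous record
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    if current_id is not None:            # finish the last record
        records[current_id] = "".join(chunks)
    return records


# Made-up records for illustration; not real sequences from NCBI
sample = """>seqA example record one
ACGTACGTAC
GTTA
>seqB example record two
TTGGCCAA
"""

print(parse_fasta(sample))
```

For real projects, Biopython's `SeqIO` module handles FASTA and many other sequence formats more robustly.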