Data Science

Lauren Chambers

Consulting Drop-In Hours: Wed 1pm-3pm

Consulting Areas: Python, R, HTML / CSS, APIs, Data Manipulation and Cleaning, Data Science, Data Visualization, Python Programming, R Programming, Software Tools, Web Scraping, Regression Analysis, Software Output Interpretation, Bash or Command Line, Git or GitHub, OCR, RStudio

Quick-tip: the fastest way to speak to a consultant is to first ...

Chirag Manghani

Consulting Drop-In Hours: Fri 2pm-4pm

Consulting Areas: Python, R, SQL, Stata, SAS, LaTeX, HTML / CSS, JavaScript, C++, APIs, Cloud & HPC Computing, Cybersecurity & Data Security, Databases & SQL, Data Manipulation and Cleaning, Data Science, Data Sources, Data Visualization, Deep Learning, Machine Learning, Natural Language Processing, Python Programming, R Programming, Software Tools, Text Analysis, Web Scraping, Regression Analysis, Software Output Interpretation, Bash or Command Line, Excel, Git or GitHub, Qualtrics, RStudio, RStudio...

FSRDC 2023 Annual Meeting and Research Conference

October 2, 2023
From our ashen sky to Foggy Bottom, I traveled alongside the other directors, administrators, Census employees, and members of our partnering agencies to the National Mall in late September for the 2023 Federal Statistical Research Data Center (FSRDC) Business Meeting and Annual Conference. Both events were held on September 21st and 22nd at the Federal Reserve in Washington, DC. The hosting Executive Director, Norman Morin, warned all of...

Python Web Scraping

November 2, 2023, 2:00pm
In this workshop, we cover how to scrape data from the web using Python. Web scraping involves downloading a webpage's source code and sifting through the material to extract desired data.
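As a rough illustration of the "download, then sift" idea described above, here is a minimal sketch using only Python's standard library. It is not taken from the workshop materials: the page source, the `workshop` CSS class, the link paths, and the names `WorkshopLinkParser` and `extract_workshops` are all hypothetical. The HTML is hard-coded so the example runs offline; in practice the string would come from downloading a real page.

```python
from html.parser import HTMLParser

# Hypothetical page source. In a real scraper this string would be the
# downloaded HTML of a page (for example via urllib.request.urlopen).
PAGE_SOURCE = """
<html><body>
  <h1>Upcoming workshops</h1>
  <ul>
    <li><a class="workshop" href="/workshops/python-web-scraping">Python Web Scraping</a></li>
    <li><a class="workshop" href="/workshops/r-intro">Introduction to R</a></li>
    <li><a href="/about">About</a></li>
  </ul>
</body></html>
"""


class WorkshopLinkParser(HTMLParser):
    """Collect (title, href) pairs for <a class="workshop"> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None  # set while inside a matching <a> tag
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("class") == "workshop":
            self._current_href = attributes.get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching link.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            title = "".join(self._text_parts).strip()
            self.links.append((title, self._current_href))
            self._current_href = None


def extract_workshops(source):
    """Sift through the page source and return the matching links."""
    parser = WorkshopLinkParser()
    parser.feed(source)
    parser.close()
    return parser.links


workshops = extract_workshops(PAGE_SOURCE)
for title, href in workshops:
    print(f"{title}: {href}")
```

The parser ignores the "About" link because it lacks the assumed `workshop` class, which shows the sifting step: most of a page's source is discarded and only the targeted elements are kept. Third-party libraries such as Requests and Beautiful Soup are commonly used for the same job.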

Testing for Measurement Invariance using Lavaan (in R)

February 7, 2023
by Enrique Valencia López. Measurement invariance has increasingly become a prerequisite for examining whether survey items that measure an underlying concept have the same meaning across different cultural and linguistic groups. While there are different ways to examine measurement invariance, the most common approach is a method known as Multigroup Confirmatory Factor Analysis (MGCFA). In this blog post, I discuss how to conduct an MGCFA using lavaan in R and the different levels needed to establish measurement invariance.

Twitter Text Analysis: A Friendly Introduction, Part 2

March 7, 2023
by Mingyu Yuan. This blog post is the second part of “Twitter Text Analysis”. The goal is to use language models such as BERT to build a classifier on tweets. Word embedding, training and test splitting, model implementation, and model evaluation are introduced in this post.

Can Machine Learning Models Predict Reality TV Winners? The Case of Survivor

March 14, 2023
by Kelly Quinn. Reality television shows are notorious for tipping the scales to favor certain players they want to see win, but could producers also be spoiling the results in the process? Drawing on data about Survivor, I attempt to predict the likelihood of a contestant making it far into the game based on editing and production decisions, as well as demographic information. This post describes the model used to classify player outcomes and other potential ways to leverage data about reality TV shows for prediction.

Acquiring Genomic Data from NCBI

April 4, 2023
by Monica Donegan. Genomic data is essential for studying evolutionary biology, human health, and epidemiology. Public agencies, such as the National Center for Biotechnology Information (NCBI), offer excellent resources and access to vast quantities of genomic data. This blog post introduces a brief workflow to download genomic data from public databases.

A Brief Introduction to Cloud Native Approaches for Big Data Analysis

March 20, 2023
by Millie Chapman. Satellites, smartphones, and other monitoring technologies are creating vast amounts of data about our Earth every day. These data hold promise to provide global insights on everything from biodiversity patterns to human activity at increasingly fine spatial and temporal resolution. But leveraging this information often requires us to work with data that are too big to fit in our computer's "working memory" (RAM) or even to download to our computer's hard drive. In this post, I walk through tools, terms, and examples to get started with cloud native workflows. These workflows allow us to remotely access and query large data from online resources or web services, all while skipping the download step!

From paper to vector: converting maps into GIS shapefiles

April 11, 2023
by Madeleine Parker. GIS is incredibly powerful: you can transform, overlay, and analyze data with a few clicks. But sometimes the challenge is getting your data into a form you can use in GIS. Have you ever found a PDF or even a paper map of what you needed? Or googled your topic with “shapefile” after it to no avail? The process of transforming a PDF, paper, or even hand-drawn map with boundaries into a shapefile for analysis is straightforward but involves a few steps. I walk through the stages of digitization, georeferencing, and drawing, from an image to a vector shapefile ready to be used for visualization and spatial analysis.