Blog post

What is IOKN2K?

January 1, 2020

IOKN2K! is D-Lab’s meme and motto, and has been for a while now. You might have seen it on a t-shirt or two (we hope…)

Many have asked what IOKN2K! means, and it’s time to officially answer that question. IOKN2K! is an acronym that stands for “It’s OK Not To Know!” And like every other meme, it means more than its acronym.

IOKN2K! means that D-Lab is a place to feel comfortable asking questions; D-Lab is a place to learn the skills that you don’t have yet; D-Lab is the place to come when you think you were supposed to have known...

Introducing “A Three-Step Guide to Training Computational Social Science Ph.D. Students for Academic and Non-Academic Careers”

December 6, 2022


As D-Lab alumni, we are excited to introduce our pre-print “A...

Disaggregating Race and Ethnicity Categories in Census Data

November 1, 2022

The collection of race and ethnicity data by the United States Census Bureau has a long, complex, and problematic history. The Census claims that their racial categories generally reflect a social definition of race recognized in America, adhering to guidelines set by the U.S. Office of Management and Budget. In 1900, the Census recognized five racial categories: White, Black, Chinese, Japanese, and American Indian. Today, the Census collects more...

Getting Started with Surveys

October 18, 2022

Surveys can be an extremely useful tool for gathering information from individuals and groups. They are used across disciplines and industries to help researchers learn more about populations and gain actionable insights. I’ve done extensive survey research in nonprofit and industry settings, using these data to improve programs, make changes to technology, design communications plans, make content acquisition decisions, and much more. In this blog post, I’ll focus on one of the most important parts of survey research — planning....

Reduce, Reuse, Recycle: Practical strategies for working with large datasets

October 12, 2022

When the size of your dataset starts to approach the size of your computer’s available memory, even the simplest data wrangling tasks can become frustrating. Suddenly, reading in a .csv or calculating a simple average becomes time-consuming or impossible. For students and researchers, additional computing resources can be costly or simply unavailable. Here are some principles and strategies for reducing the overhead of your dataset while keeping the momentum going. The code mainly focuses on reading csv files - a very common data format - into Python...
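As a rough illustration of the idea, here is a minimal sketch (not the post's own code) of computing a simple average without loading the whole file into memory. It uses only Python's built-in csv module and reads one row at a time. The function name `streaming_mean`, the column name `price`, and the sample data are hypothetical.

```python
import csv
import io


def streaming_mean(lines, column):
    """Average one numeric column while holding only one row in memory at a time."""
    reader = csv.DictReader(lines)
    total = 0.0
    count = 0
    for row in reader:
        value = row[column]
        if not value:  # skip missing values rather than failing on them
            continue
        total += float(value)
        count += 1
    return total / count if count else float("nan")


# Hypothetical stand-in for a large file; in practice you would pass
# an open file handle, e.g. open("your_file.csv", newline="").
sample = io.StringIO("id,price\n1,2.0\n2,4.0\n3,\n4,6.0\n")
print(streaming_mean(sample, "price"))
```

Because the reader yields rows lazily, memory use stays roughly constant no matter how many rows the file has. The trade-off is that each statistic needs its own pass (or its own running total) over the data.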

What is MLOps? An Introduction to the World of Machine Learning Operations

May 10, 2022
More than ever, AI and machine learning (ML) are integral parts of our lives and are tightly coupled with the majority of the products we use on a daily basis. We use AI/ML in almost everything we can think of, from advertising to social media and just going about our daily lives! With these tools and models so prevalent, it is essential that ML systems follow the same path that IT systems and software took in the early 2000s, when they became a disciplined practice in terms of development, maintainability, and reliability. The field focused on developing such practices is currently loosely defined under many different titles (e.g., machine learning engineering, applied data science), but is most commonly known as MLOps, or Machine Learning Operations.

Scrollytelling through a look at food prices around the world

May 2, 2022

You have gathered the needed data to support your research, check. You have made some hypotheses about what you hope to conclude, check. You have spent time cleaning the data and organizing it in a manner that permits further exploration, check. You have sliced and diced the data with your favorite data exploration software packages or techniques and created some data visualizations that you feel confident about, quadruple check! You are now armed with insights that you hope to showcase to the world. What’s next? In this article, I would like to share some tips for creating a...

A brief primer on Hidden Markov Models

April 25, 2022

For many data science problems, there is a need to estimate unknown information from a sequence of observed events. You may want to know, for instance, whether a person is angry or happy, given a sequence of brain scans taken while playing a video game. Or you may be digitizing an ancient text, but, due to water damage, can’t tell what one word in the sequence says. Or in my case (I’m a wildlife biologist), you may want to infer whether an animal is sleeping or eating at any given moment using a sequence of animal GPS locations.
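To make the sleeping-or-eating example concrete, here is a minimal sketch of one common way to decode hidden states, the Viterbi algorithm. It is not necessarily the approach the post goes on to describe. The two states, the "short"/"long" movement observations (imagined as derived from consecutive GPS locations), and every probability below are made-up assumptions for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely sequence of hidden states for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the probability of reaching s
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Trace back from the most probable final state
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical model: all numbers are invented for illustration only.
states = ["sleeping", "eating"]
start_p = {"sleeping": 0.5, "eating": 0.5}
trans_p = {
    "sleeping": {"sleeping": 0.8, "eating": 0.2},
    "eating": {"sleeping": 0.3, "eating": 0.7},
}
emit_p = {
    "sleeping": {"short": 0.9, "long": 0.1},
    "eating": {"short": 0.3, "long": 0.7},
}

steps = ["short", "short", "long", "long"]
print(viterbi(steps, states, start_p, trans_p, emit_p))
```

For long sequences, multiplying many small probabilities can underflow, so implementations often work with log-probabilities instead.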

Now, there are...

Working with spatial networks

April 25, 2022

When working with spatial networks, both ArcGIS and Python packages such as NetworkX and iGraph are very useful tools. In the past, I have used both tools to help me better understand spatial network topology and network flow. In this blog post, I hope to share with you some cool features that these tools have...

Excel Fundamentals: Lookups with INDEX-MATCH-MATCH

April 18, 2022

Last week marked the D-Lab’s inaugural “Excel Fundamentals” workshop, and to celebrate I am sharing one of my favorite Excel functions: INDEX-MATCH-MATCH. By combining the INDEX and MATCH functions, we can create a faster and more flexible lookup than the typical approach with VLOOKUP.

First, let’s explore the INDEX function and its three arguments: INDEX(where, down, across). It returns the value of a single cell within a block of data. It knows which cell we are...
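For readers who think more easily in code, here is a rough Python analogy of how the two functions combine. It is a sketch, not an Excel formula: `index` mirrors INDEX(where, down, across) with 1-based positions, and `match` mirrors MATCH in exact-match mode. The fruit labels, quarter labels, and sales numbers are hypothetical.

```python
def match(lookup_value, values):
    """Like Excel's MATCH with exact matching: 1-based position of lookup_value."""
    return values.index(lookup_value) + 1


def index(where, down, across):
    """Like Excel's INDEX: the cell `down` rows and `across` columns into the block (1-based)."""
    return where[down - 1][across - 1]


# Hypothetical data: row labels, column labels, and the block of values they describe
row_labels = ["Apples", "Pears", "Plums"]
col_labels = ["Q1", "Q2", "Q3"]
sales = [
    [10, 12, 9],
    [7, 8, 11],
    [4, 6, 5],
]

# INDEX-MATCH-MATCH: one match finds the row, the other finds the column
value = index(sales, match("Pears", row_labels), match("Q3", col_labels))
print(value)
```

The lookup works in either direction because the row and column positions are found independently, which is where the extra flexibility over a single-direction lookup comes from.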