Data Science

Python Data Wrangling and Manipulation with Pandas

May 31, 2022, 1:00pm
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It enables doing practical, real world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.
See event details for participation information.

What is MLOps? An Introduction to the World of Machine Learning Operations

May 10, 2022
More than ever, AI and machine learning (ML) are integral parts of our lives and are tightly coupled with the majority of the products we use on a daily basis. We use AI/ML in almost everything we can think of, from advertising to social media and just going about our daily lives! With the prevalent use of these tools and models, it is essential that, as IT systems and software became a disciplined practice in terms of development, maintainability, and reliability in the early 2000s, ML systems follow a similar trend. The field focused on developing such practices is currently loosely defined under many different titles (e.g., machine learning engineering, applied data science), but is most commonly known as MLOps, or Machine Learning Operations.

R Fundamentals: Parts 1-4

May 2, 2022, 10:00am
This workshop is a four-part introductory series that will teach you R from scratch with clear introductions, concise examples, and support documents. You will learn how to download and install the open-sourced R Studio software, understand data and basic manipulations, import and subset data, explore and visualize data, and understand the basics of automation in the form of loops and functions. After completion of this workshop you will have a foundational understanding to create, organize, and utilize workflows for your personal research.

Infosession: D-Lab Data Science Fellowship

May 2, 2022, 2:00pm
The D-Lab is seeking applications for the 2022-2023 cohort of Data Science Fellows. This infosession will give you an in-depth look at the D-Lab Data Science Fellowship and an opportunity for you to ask questions about the program that may be helpful to your application process to become a Fellow!

Scrollytelling through a look at food prices around the world

May 2, 2022

You have gathered the needed data to support your research, check. You have made some hypotheses about what you hope to conclude, check. You have spent time cleaning the data and organizing it in a manner that permits further exploration, check. You have sliced and diced the data with your favorite data exploration software packages or techniques and created some data visualizations that you feel confident about, quadruple check! You are now armed with insights that you hope to showcase to the world, what’s next? In this article, I would like to share some tips for creating a...

A brief primer on Hidden Markov Models

April 25, 2022

For many data science problems, there is a need to estimate unknown information from a sequence of observed events. You may want to know, for instance, whether a person is angry or happy, given a sequence of brain scans taken while playing a video game. Or you may be digitizing an ancient text, but, due to water damage, can’t tell what one word in the sequence says. Or in my case (I’m a wildlife biologist), you may want to infer whether an animal is sleeping or eating at any given moment using a sequence of animal GPS locations.

Now, there are...

Python Introduction to Machine Learning: Parts 1-2

April 25, 2022, 2:00pm
This workshop introduces students to scikit-learn, the popular machine learning library in Python, as well as the auto-ML library built on top of scikit-learn, TPOT. The focus will be on scikit-learn syntax and available tools to apply machine learning algorithms to datasets. No theory instruction will be provided.

Excel Fundamentals: Lookups with INDEX-MATCH-MATCH

April 18, 2022

Last week marked the D-Lab’s inaugural “Excel Fundamentals” workshop, and to celebrate I am sharing one of my favorite Excel functions: INDEX-MATCH-MATCH. By combining the INDEX and MATCH functions, we can create a faster and more flexible lookup than the typical approach with VLOOKUP.

First, let’s explore the INDEX function and its three arguments: INDEX(where, down, across). It returns the value of a single cell within a block of data. It knows which cell we are...

Python Fundamentals: Parts 1-4

May 2, 2022, 9:00am
This four-part, interactive workshop series is your complete introduction to programming Python for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.

dbplyr: do we still need to learn SQL to create and manage databases?

April 11, 2022

How to deal with datasets that are larger than our computer’s memory? Do we still need to learn Structured Query Language (SQL) to create and manage a database?

As an incipient data analyst, one of my first major challenges was to build and manage a spatial database using PostGIS, an open-source software that adds a geographic to PostgreSQL relational databases. I was given several text files in a hard drive that weighed approximately 10 GB each! My first reaction was to double click on the first text file that I saw… but this was clearly...