Data Science

Bobo Kwok

UTech
Data Science

I am an undergraduate student studying Data Science with an emphasis in Applied Mathematics & Modeling. I enjoy storytelling through data visuals and learning new visualization tools.

What is MLOps? An Introduction to the World of Machine Learning Operations

May 10, 2022

More than ever, AI and machine learning (ML) are integral parts of our lives and are tightly coupled with the majority of the products we use on a daily basis. We encounter AI/ML in almost everything we can think of, from advertising to social media to simply going about our daily lives! Given how prevalent these tools and models are, it is essential that ML systems follow the path IT systems and software took in the early 2000s, when development, maintainability, and reliability became a disciplined practice. The field focused on developing such practices is currently loosely defined under many different titles (e.g., machine learning engineering, applied data science), but is most commonly known as MLOps, or Machine Learning Operations.

Scrollytelling through a look at food prices around the world

May 2, 2022

You have gathered the needed data to support your research, check. You have made some hypotheses about what you hope to conclude, check. You have spent time cleaning the data and organizing it in a manner that permits further exploration, check. You have sliced and diced the data with your favorite data exploration software packages or techniques and created some data visualizations that you feel confident about, quadruple check! You are now armed with insights that you hope to showcase to the world. What’s next? In this article, I would like to share some tips for creating a...

A brief primer on Hidden Markov Models

April 25, 2022

For many data science problems, there is a need to estimate unknown information from a sequence of observed events. You may want to know, for instance, whether a person is angry or happy, given a sequence of brain scans taken while playing a video game. Or you may be digitizing an ancient text, but, due to water damage, can’t tell what one word in the sequence says. Or in my case (I’m a wildlife biologist), you may want to infer whether an animal is sleeping or eating at any given moment using a sequence of animal GPS locations.

Now, there are...

Excel Fundamentals: Lookups with INDEX-MATCH-MATCH

April 18, 2022

Last week marked the D-Lab’s inaugural “Excel Fundamentals” workshop, and to celebrate I am sharing one of my favorite Excel functions: INDEX-MATCH-MATCH. By combining the INDEX and MATCH functions, we can create a faster and more flexible lookup than the typical approach with VLOOKUP.

First, let’s explore the INDEX function and its three arguments: INDEX(where, down, across). It returns the value of a single cell within a block of data. It knows which cell we are...

dbplyr: do we still need to learn SQL to create and manage databases?

April 11, 2022

How do we deal with datasets that are larger than our computer’s memory? Do we still need to learn Structured Query Language (SQL) to create and manage a database?

As a novice data analyst, one of my first major challenges was to build and manage a spatial database using PostGIS, an open-source extension that adds support for geographic objects to PostgreSQL relational databases. I was given several text files on a hard drive that weighed approximately 10 GB each! My first reaction was to double-click on the first text file that I saw… but this was clearly...

What can state government do…to attract a data scientist like YOU?

March 29, 2022

By Kellie Hogue

What’s your next move? When I was in grad school, one of my professors told me that regardless of the job I am currently in, I should always be planning the next step in my career.

At the time, it made sense: academic appointments in my discipline were few and far between, and I wouldn’t get one without some major strategic networking and planning. It was simply a case of too much supply, not enough demand....

Predicting Madness: This March Madness, you can be your friend group’s resident Bracketologist.

March 7, 2022

On Selection Sunday, a twelve-member NCAA committee kicks off March Madness by picking America’s best college basketball teams. Each year, millions of people build their brackets based on records, school allegiances, favorite colors, and weirdest mascots. The national college basketball event, which pits the top 64 Division I teams in the country against one another in a knockout-style tournament, is one of the largest betting events in sports. Over the course of 68 games, more than $8.5 billion is estimated to be wagered across 40 million bets, both legally and illegally (Odds Shark, 2021). ...

Twitter data extraction with Selenium

March 1, 2022

Introduction

With online communities and social networks serving as important sites for computational social science research, Twitter has quickly become a popular data source for researchers (Frey et al., 2020; Kusen et al., 2017; Rao et al., 2010; Ru et al., 2021). This blog post will demonstrate one way to extract Twitter data without using the Twitter API. This is especially useful for researchers who are new to exploring the use of Twitter data in their research, looking to develop a baseline corpus for a research question they are newly...