Data Science

Getting Started with the NYT API

March 1, 2022

Introduction

The web is chock full of valuable troves of data that can spawn an infinite number of social science research projects. However, not all data is easily accessible! While some data can be easily downloaded, access to other sources of data is dictated by what is known as an API. Short for application programming interface, an API is a set of defined protocols governing the terms of access to software and servers from programs created...

Enumeration of Informal Work

March 1, 2022

The first time that I mapped out poverty statistics at a municipal scale, I was completely mind blown (figure 1). Looking at the spatial inequities from a bird’s-eye view drove my desire to find more granular data on social indicators to better understand intra-urban socioeconomic inequities. Spatial data techniques help us find patterns and anomalies across data that improve our understanding of people’s lives in cities, raising new questions about urban infrastructure in terms of public goods provision, land use, and access. However, finding granular socioeconomic...

Ian Castro

D-Lab Alumni
School of Information

Ian is a graduate student in the Master of Information Management and Systems program at the School of Information with a focus on applied data science. He earned his B.A. in Media Studies and B.S. in Microbial Biology from UC Berkeley, and his research interests and work experience are in STEM education. He focuses on building courses and academic programs to make data and computing accessible to historically marginalized students and those without prior exposure to the field.

PoliPy: A Python Library for Scraping and Analyzing Privacy Policies

February 8, 2022

In light of recent scandals involving the misuse and improper handling of personal data by large corporations, advocacy groups and regulators alike have given increased attention to the issue of consumer privacy [e.g., 1, 2, 3, 4, 5]. National and local governments have been enacting privacy legislation that requires companies to minimize the amount of data they collect, deters the collection of sensitive data, limits the purposes for which the data are used, and critically, gives users more transparency into data collection and use.

As part...

Where the Streets Have No Name: Spatial Data in Informal Settlements

February 1, 2022

In our era, with Google Maps on every smartphone, it may feel like spatial data is easy to come by. However, this is not the case for many communities in the world. In particular, for informal settlements, developed “outside state control over urban design, planning, and construction,” accurate maps can be hard to come by. You may open up Google Maps to find a few streets with no names, or sometimes, nothing at all. Informal settlements are...

Is your Random Sample Really Random?

January 20, 2022

Research is one of the most common places people run into random numbers. We often hear the term “random sample,” or a “randomized” assignment to treatment or control. Or, sometimes, we randomly select a certain number of rows or columns from data to perform an analysis on a representative snapshot of the data. Additionally, for many of us from a natural science or engineering background, random numbers are often used in simulations or optimization models. Given the wide variety of uses for random numbers in data science, I thought it would be interesting to take an...

Working with spatial networks using NetworkX

December 7, 2021

I have always been interested in working with spatial networks. My first introduction to spatial network modeling was in Prof. John Radke’s Geographic Information Systems class when I learned about building and analyzing spatial networks using the Network Analyst extension in ArcMap. This extension provides powerful tools to solve common network problems, such as finding the best route across a city, finding the closest...

Resisting our Data Doppelgangers: A Proposal for Unpacking the Dangers of Data-Driven Fertility Advertising With Data Science Tools

December 7, 2021

Introduction

When Janet Vertasi, a Princeton sociology professor who studies technology, learned of her pregnancy, she decided to conduct a personal experiment. She hid her pregnancy from the internet for nine months. This meant only sharing her pregnancy with close friends and family, using her own personal server while making purchases on Amazon, and even opting to use cash for many of her transactions. During this time Amazon mistook her for a “suspicious customer” (Vertasi 2014, Gray 2014). Recall another incident of how Target found out about a...

A Beginner’s Guide to the Bootstrap

November 22, 2021

What is the bootstrap method?

If you take a quantitative methods course here at Berkeley, chances are that you will learn how to perform a bootstrap. As an introductory data science instructor, it’s one of my favorite topics to teach, not just because it’s a powerful and useful tool, but also because it’s incredibly intuitive. In short, the bootstrap -- which works by resampling with replacement -- allows us to generate a distribution of sample statistics given only a single sample, and so to estimate sampling error. The name of this method...
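
To make the idea of resampling with replacement concrete, here is a minimal sketch in Python. It is not taken from the post itself: the function name `bootstrap_means`, the made-up sample values, and the choice of the mean as the statistic are all assumptions for illustration, and it uses only the standard library.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Return the mean of each of n_resamples resamples drawn with replacement.

    Hypothetical helper for illustration; each resample has the same size
    as the original sample.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(sample)
    return [
        statistics.mean(rng.choices(sample, k=n))  # choices() samples with replacement
        for _ in range(n_resamples)
    ]


# Made-up observations standing in for a single real-world sample.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]

boot = bootstrap_means(sample)

# The spread of the bootstrap distribution estimates the standard error of the mean.
standard_error = statistics.stdev(boot)
print(f"sample mean: {statistics.mean(sample):.2f}")
print(f"bootstrap standard error: {standard_error:.2f}")
```

The same pattern should work for other statistics (a median, say) by swapping out `statistics.mean`; only the resampling step is specific to the bootstrap.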

Stumbling Upon Data Sonification When I Fused My Passion for Music with Coding

November 16, 2021

Like many graduate students from the MIDS program who are also full-time working professionals, I return to campus to seek knowledge and satisfy my intellectual curiosity in information and data science. It has become a part of a lifelong learning pursuit that enables me to constantly apply what I learn back into the real world. Along the way, I never forget that it is also important to have fun with science by combining new knowledge with my own passions in arts and music in whatever ways possible. For nearly a decade, I have been helping clients in...