Python

Python Fundamentals: Parts 1-3

October 24, 2023, 2:00pm
This three-part interactive workshop series is your complete introduction to programming in Python for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic programming principles and data manipulation to a real-world social science application.

Python Web APIs

October 26, 2023, 2:00pm
In this workshop, we cover how to extract data from the web with APIs using Python. APIs are often official services offered by companies and other entities that allow you to query their servers directly to retrieve their data. Platforms like The New York Times, Twitter, and Reddit offer APIs for retrieving data.

Python Machine Learning Fundamentals: Parts 1-2

November 7, 2023, 9:00am
This workshop introduces students to scikit-learn, the popular machine learning library in Python, as well as TPOT, an AutoML library built on top of scikit-learn. The focus will be on scikit-learn syntax and the tools available for applying machine learning algorithms to datasets. No theory instruction will be provided.

Python Web Scraping

November 2, 2023, 2:00pm
In this workshop, we cover how to scrape data from the web using Python. Web scraping involves downloading a webpage's source code and sifting through the material to extract desired data.

Nimita Gaggar

Consultant
Public Health

Passionate and driven Public Health graduate student at UC Berkeley with a strong background in program management and a relentless pursuit of excellence. I have 5+ years of experience in program management and operations in the healthcare industry. My academic journey at UC Berkeley has equipped me with a multifaceted skill set, blending strategic thinking, data-driven decision-making, and effective communication. I thrive in fast-paced, dynamic environments and have a proven ability to lead cross-functional teams toward project success.

Twitter Text Analysis: A Friendly Introduction, Part 2

March 7, 2023
by Mingyu Yuan. This blog post is the second part of “Twitter Text Analysis”. The goal is to use language models such as BERT to build a classifier for tweets. Word embeddings, train-test splitting, model implementation, and model evaluation are introduced in this post.

Can Machine Learning Models Predict Reality TV Winners? The Case of Survivor

March 14, 2023
by Kelly Quinn. Reality television shows are notorious for tipping the scales to favor certain players they want to see win, but could producers also be spoiling the results in the process? Drawing on data about Survivor, I attempt to predict the likelihood of a contestant making it far into the game based on editing and production decisions, as well as demographic information. This post describes the model used to classify player outcomes and other potential ways to leverage data about reality TV shows for prediction.

Acquiring Genomic Data from NCBI

April 4, 2023
by Monica Donegan. Genomic data is essential for studying evolutionary biology, human health, and epidemiology. Public agencies, such as the National Center for Biotechnology Information (NCBI), offer excellent resources and access to vast quantities of genomic data. This blog post introduces a brief workflow for downloading genomic data from public databases.

Mapping Time-Series Satellite Images with Google Earth Engine API

July 17, 2023
by Meiqing Li. Remote sensing imagery has the potential to reveal land use patterns and human activities at a planetary scale. For example, nighttime light intensity extracted from satellite imagery can shed light on spatial patterns of human activities and settlements, especially in places where traditional data are scarce. This blog post introduces Google Earth Engine (GEE) as a general-purpose tool for extracting time-series remote sensing data from the GEE Data Catalog. I walk through using GEE to obtain data, filter it by time and geographic region, and visualize it on static and interactive maps.

D-Lab & Graduate Division create inclusive data science summer program

August 9, 2023
by Vanessa Navarro Rodriguez. UC Berkeley's Social Sciences D-Lab and Graduate Division created the Data Science for Social Justice (DSSJ) Program to address underrepresentation in data science. The program teaches diverse students critical data analysis and its applications in addressing societal injustices. The free 8-week summer course for admitted University of California students focuses on Python programming, Natural Language Processing, and value-informed data practices. It aims to empower students from underrepresented backgrounds and to bridge STEM with social justice. This blog post elaborates on the program's creation and features one of the DSSJ students, Robin López, and his reasons for participating.