Data Sources

Python Web Scraping

November 2, 2023, 2:00pm
In this workshop, we cover how to scrape data from the web using Python. Web scraping involves downloading a webpage's source code and sifting through the material to extract desired data.
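As a rough illustration of the "sift through the source code" step described above, here is a minimal sketch using only Python's standard library. It is not taken from the workshop materials: the `SAMPLE_HTML` string, its URLs, and the `LinkExtractor` and `extract_links` names are all hypothetical. The download step is left out so the example runs offline.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a downloaded page. A real script would first
# fetch this text from the web; that step is omitted here on purpose.
SAMPLE_HTML = """
<html><body>
  <h1>Upcoming Workshops</h1>
  <ul>
    <li><a href="/workshops/python-web-scraping">Python Web Scraping</a></li>
    <li><a href="/workshops/r-fundamentals">R Fundamentals</a></li>
  </ul>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs from every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None  # href of the <a> tag we are inside, if any
        self._text_parts = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def extract_links(html):
    """Parse an HTML string and return a list of (href, text) tuples."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


for href, text in extract_links(SAMPLE_HTML):
    print(href, text)
```

In practice, many scraping tutorials use third-party parsers instead of `html.parser`. The standard-library version is used here only to keep the sketch self-contained.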

Elijah Mercer

Consulting Drop-In Hours: Thu 3pm-4pm; Fri 4pm-5pm

Consulting Areas: Python, R, Data Sources, Mixed Methods, Qualitative methods, Surveys, Sampling & Interviews, Excel, Qualtrics

Quick-tip: the fastest way to speak to a consultant is to first submit a request and then ...

Chirag Manghani

Consulting Drop-In Hours: Fri 2pm-4pm

Consulting Areas: Python, R, SQL, Stata, SAS, LaTeX, HTML / CSS, Javascript, C++, APIs, Cloud & HPC Computing, Cybersecurity & Data Security, Databases & SQL, Data Manipulation and Cleaning, Data Science, Data Sources, Data Visualization, Deep Learning, Machine Learning, Natural Language Processing, Python Programming, R Programming, Software Tools, Text Analysis, Web Scraping, Regression Analysis, Software Output Interpretation, Bash or Command Line, Excel, Git or Github, Qualtrics, RStudio, RStudio...

Acquiring Genomic Data from NCBI

April 4, 2023
by Monica Donegan. Genomic data is essential for studying evolutionary biology, human health, and epidemiology. Public agencies, such as the National Center for Biotechnology Information (NCBI), offer excellent resources and access to vast quantities of genomic data. This blog post introduces a brief workflow for downloading genomic data from public databases.

Why We Need Digital Hermeneutics

July 13, 2023
by Tom van Nuenen. Tom van Nuenen discusses the sixth iteration of his course, Digital Hermeneutics, at Berkeley. The class teaches the practices of data science and text analysis in the context of hermeneutics, the study of interpretation. In the course, students analyze texts from Reddit communities, focusing on how these communities make sense of the world. This task combines close and distant readings of texts, as students employ computational tools to find broader patterns and themes. The article reflects on the rise of AI language models like ChatGPT, and how these machines interpret human interpretations. The popularity and profitability of language models present an issue for the future of open research, due to the monetization of social media data.

My Summer Exploring Data Science for Social Justice: Learnings, Tensions & Recommendations

September 5, 2023
by Genevieve Smith. This summer I joined the D-Lab-hosted Data Science for Social Justice workshop at UC Berkeley, diving into Python – including TF-IDF, sentiment analysis, word embeddings, and more – with a lens towards leveraging data science for social justice. My team explored a Reddit channel on abortion and used computational analysis to answer key questions about abortion access before versus after Roe v. Wade was overturned. Computational social science is incredibly powerful, but I continue to grapple with tensions, particularly around employing machine learning and large language models in international research, and I end with key recommendations for CSS practitioners.

Artificial Intelligence (AI) Systems, the Poor, and Consent: A Feminist Anti-Colonial Lens to Digitalized Surveillance

September 18, 2023
by Alejandro Nuñez. Today’s digital age has created a sea of endless datafication where our everyday interactions, actions, and conversations are turned into data. The advancement of automated artificial intelligence (AI) systems, and of the infrastructure in which they are created and trained, has catapulted us into an era of constant monitoring and surveillance.

Suraj Nair

Data Science Fellow
School of Information

I am a PhD student at the School of Information. My research interests lie at the intersection of development economics and machine learning, with a focus on the use of large-scale digital data and new computational tools to study pressing issues in global development.

Alex Ramiller

Data Science Fellow
City and Regional Planning

I am a PhD candidate in City and Regional Planning. My research focuses on the use of large administrative datasets to study residential mobility, neighborhood change, and housing access. I received a Master's in Geography from the University of Washington and a Bachelor's in Economics and Geography from Macalester College. I have also consulted on analytical projects for several organizations, including the San Francisco Federal Reserve Bank, PolicyLink, and the City of Seattle.

Jailynne Estevez

Info & Data Science MIDS

Jailynne Estevez is a Data Analyst and a prospective Master of Information and Data Science candidate at UC Berkeley. With a bachelor's in Public Policy, she brings a diverse skill set to her pursuits, demonstrating aptitude in data analysis and programming.