Data Science

Python Web Scraping

November 2, 2023, 2:00pm
In this workshop, we cover how to scrape data from the web using Python. Web scraping involves downloading a webpage's source code and sifting through the material to extract desired data.
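As a rough illustration of the "sift through the source" step described above, the sketch below parses a small hard-coded HTML string with Python's standard-library `html.parser`. The page content, the `title` class, and the `TitleExtractor` and `extract_titles` names are assumptions made up for this example and are not taken from the workshop materials. On a real page, the source string would first be downloaded.

```python
from html.parser import HTMLParser

# Hypothetical page source for illustration. In practice this string would
# be downloaded first, e.g. urllib.request.urlopen(url).read().decode().
PAGE_SOURCE = """
<html><body>
  <h2 class="title">Python Web Scraping</h2>
  <p>Not a title.</p>
  <h2 class="title">Python Web APIs</h2>
</body></html>
"""


class TitleExtractor(HTMLParser):
    """Collect the text inside every <h2 class="title"> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "h2" and dict(attrs).get("class") == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        # Keep only non-empty text found while inside a matching <h2>.
        if self.in_title and data.strip():
            self.titles.append(data.strip())


def extract_titles(source):
    """Return the list of title strings found in an HTML source string."""
    parser = TitleExtractor()
    parser.feed(source)
    return parser.titles


print(extract_titles(PAGE_SOURCE))
```

Third-party parsers are often used for this in practice. The standard library is used here only to keep the sketch self-contained.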

Python Text Analysis: Word Embeddings

October 25, 2023, 2:00pm
How can we use neural networks to create meaningful representations of words? The bag-of-words model is limited in its ability to characterize text, because it does not utilize word context.
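To make the limitation concrete, here is a minimal sketch, assuming a very simple tokenizer (lowercasing and splitting on whitespace). The `bag_of_words` helper and the two example sentences are made up for illustration. Because a bag-of-words representation only counts tokens, two sentences with opposite meanings can end up with identical representations.

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens, ignoring their order."""
    return Counter(text.lower().split())


sentence_a = "the dog bit the man"
sentence_b = "the man bit the dog"

# Same words, same counts: the two bags are indistinguishable,
# even though the sentences describe different events.
print(bag_of_words(sentence_a))
print(bag_of_words(sentence_a) == bag_of_words(sentence_b))
```

Word embeddings aim to address this kind of gap by learning representations from the contexts in which words appear.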

Python Fundamentals: Parts 1-3

October 24, 2023, 2:00pm
This three-part interactive workshop series is a complete introduction to programming in Python for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.

Python Web APIs

October 26, 2023, 2:00pm
In this workshop, we cover how to extract data from the web with APIs using Python. APIs are often official services offered by companies and other entities, which allow you to directly query their servers in order to retrieve their data. Platforms like The New York Times, Twitter and Reddit offer APIs to retrieve data.

Python Machine Learning Fundamentals: Parts 1-2

November 7, 2023, 9:00am
This workshop introduces students to scikit-learn, the popular machine learning library in Python, as well as the auto-ML library built on top of scikit-learn, TPOT. The focus will be on scikit-learn syntax and available tools to apply machine learning algorithms to datasets. No theory instruction will be provided.

Qualtrics Fundamentals

October 5, 2023, 2:00pm
Qualtrics is a powerful online tool available to Berkeley community members that can be used for a range of data collection activities. Primarily, Qualtrics is designed to make web surveys easy to write, test, and implement, but the software can be used for data entry, training, quality control, evaluation, market research, pre/post-event feedback, and other uses with some creativity.

Lauren Chambers

Consulting Drop-In Hours: Wed 1pm-3pm

Consulting Areas: Python, R, HTML / CSS, APIs, Data Manipulation and Cleaning, Data Science, Data Visualization, Python Programming, R Programming, Software Tools, Web Scraping, Regression Analysis, Software Output Interpretation, Bash or Command Line, Git or Github, OCR, RStudio

Chirag Manghani

Consulting Drop-In Hours: Fri 2pm-4pm

Consulting Areas: Python, R, SQL, Stata, SAS, LaTeX, HTML / CSS, Javascript, C++, APIs, Cloud & HPC Computing, Cybersecurity & Data Security, Databases & SQL, Data Manipulation and Cleaning, Data Science, Data Sources, Data Visualization, Deep Learning, Machine Learning, Natural Language Processing, Python Programming, R Programming, Software Tools, Text Analysis, Web Scraping, Regression Analysis, Software Output Interpretation, Bash or Command Line, Excel, Git or Github, Qualtrics, RStudio, RStudio...

Testing for Measurement Invariance using Lavaan (in R)

February 7, 2023
by Enrique Valencia López. Measurement invariance has increasingly become a prerequisite for examining whether survey items that measure an underlying concept have the same meaning across different cultural and linguistic groups. While there are different ways to examine measurement invariance, the most common approach is a method known as Multigroup Confirmatory Factor Analysis (MGCFA). In this blog post, I discuss how to conduct an MGCFA using lavaan in R and the different levels needed to establish measurement invariance.

Twitter Text Analysis: A Friendly Introduction, Part 2

March 7, 2023
by Mingyu Yuan. This blog post is the second part of “Twitter Text Analysis”. The goal is to use language models such as BERT to build a classifier on tweets. Word embedding, training and test splitting, model implementation, and model evaluation are introduced in this post.