Text Analysis

Python Text Analysis: Word Embeddings

April 6, 2022, 3:00pm
How can we use neural networks to create meaningful representations of words? The bag-of-words model is limited in its ability to characterize text because it does not use word context.
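A minimal sketch of that limitation, assuming a simple lowercase whitespace tokenizer. The `bag_of_words` helper and the two example sentences are illustrative assumptions, not workshop material. Two sentences with opposite meanings produce identical bag-of-words counts because word order is discarded:

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens, discarding their order."""
    return Counter(text.lower().split())


# Hypothetical example sentences: same words, different meaning.
sentence_a = "the dog bit the man"
sentence_b = "the man bit the dog"

bow_a = bag_of_words(sentence_a)
bow_b = bag_of_words(sentence_b)

# The two representations are indistinguishable once order is dropped.
same_counts = bow_a == bow_b
print(same_counts)
```

Word embeddings try to address this gap by learning each word's representation from the words that appear around it.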

Emily Grabowski

Senior Data Science Fellow, Senior Instructor, Senior Consultant
Linguistics

I am a Ph.D. student in Linguistics. My research interests include understanding how our speech production and speech perception systems constrain linguistic variation, especially as it applies to the larynx. I am also interested in integrating theoretical representations of language with speech. I approach this using a broad variety of tools/methodologies, including theoretical work, experiments, and modeling. Current projects include developing a computational tool to expedite the analysis of pitch and an online perception experiment on the relationship between pitch and perceived...

Spencer Le

Data Peer Consultant, UTech
Computer Science
Data Science

I am a senior majoring in Computer Science and minoring in Data Science. I love crunching big data and analyzing it to help solve real-life issues. In my free time, I like jamming out to music, drawing, studying history, and posting on my foodstagram. If you have any questions regarding Computer Science or Data Science, please stop by!

Twitter data extraction with Selenium

March 1, 2022

Introduction

With online communities and social networks serving as important sites for computational social science research, Twitter has quickly become a popular data source for researchers (Frey et al., 2020; Kusen et al., 2017; Rao et al., 2010; Ru et al., 2021). This blog post will demonstrate one way to extract Twitter data without using the Twitter API. This is especially useful for researchers who are new to exploring the use of Twitter data in their research, looking to develop a baseline corpus for a research question they are newly...

PoliPy: A Python Library for Scraping and Analyzing Privacy Policies

February 8, 2022

In light of recent scandals involving the misuse and improper handling of personal data by large corporations, advocacy groups and regulators alike have given increased attention to the issue of consumer privacy [e.g., 1, 2, 3, 4, 5]. National and local governments have been enacting privacy legislation that requires companies to minimize the amount of data they collect, deters the collection of sensitive data, limits the purposes for which the data are used, and, critically, gives users more transparency into data collection and use.

As part...

Python Text Analysis Fundamentals: Parts 1-3

February 15, 2022, 2:00pm
This three-part workshop series will prepare participants to move forward with research that uses text analysis, with a special focus on humanities and social science applications.

Jennifer Kaplan

Consultant
French

Jennifer is a first-year graduate student in the Romance Languages & Literatures program here at Berkeley. She has experience conducting ethnographic fieldwork and is passionate about qualitative research methods.

Python Text Analysis Fundamentals: Parts 1-3

November 8, 2021, 12:00pm
This three-part workshop series will prepare participants to move forward with research that uses text analysis, with a special focus on humanities and social science applications.

Text Analysis for Public Health

October 5, 2021
October 5th, 2021 - another day in the global pandemic. Average Joes are busy tweeting about it, politicians give interviews on the latest plans, and newspapers publish article after article on vaccination levels, case counts, and the booster shot. That’s a ton of information. So much, in fact, that it would be pretty nice to have some computer-assisted help to sort through it. Enter stage right: text analysis. Just what is it, and in the midst of COVID-19, how can it be used to advance public health? Text analysis is a family of analytic techniques used to identify patterns and meaning in unstructured text, that is, text that a computer can’t readily understand. In other words, most qualitative data. And there is a lot of that sort of data floating around. We’re talking tweets, Reddit posts, and emails, but also electronic health records (EHRs), books, and even academic research. You’ll probably agree that in that list alone, there’s a lot of valuable data!

Python Text Analysis Fundamentals: Parts 1-3

September 21, 2021, 10:00am
This three-part workshop series will prepare participants to move forward with research that uses text analysis, with a special focus on humanities and social science applications.