Python Text Analysis Fundamentals: Parts 1-3

September 21, 2021, 10:00am to September 28, 2021, 1:00pm

Location: Remote via Zoom. Link will be sent on the morning of the event.

This workshop is a 3-part series running from 10am-1pm each day:

  • Part 1: Tuesday, September 21
  • Part 2: Thursday, September 23
  • Part 3: Tuesday, September 28

Start Time: D-Lab workshops start 10 minutes after the scheduled start time (“Berkeley Time”). We will admit all participants from the waiting room at that time.


This three-part workshop series will prepare participants to move forward with research that uses text analysis, with a special focus on humanities and social science applications.

  • Part 1: Basic Tools and Techniques

  • Part 2: Unsupervised Approaches

  • Part 3: Supervised Methods

Part 1: This hands-on workshop walks through the common “preprocessing recipe” that underlies a variety of other applications, along with some basic natural language processing techniques. These include: a) removal of stopwords, numbers, and punctuation, b) tokenization, c) calculation of word frequencies and proportions, and d) part-of-speech tagging.
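
As a rough illustration of the Part 1 preprocessing recipe, here is a minimal sketch using only the Python standard library. The stopword list, the sample sentence, and the function names are assumptions made for this example and are not taken from the workshop materials. Part-of-speech tagging is left out because it needs a trained tagger.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
# Real projects usually rely on a fuller list from an NLP library.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}


def preprocess(text):
    """Lowercase, strip numbers and punctuation, tokenize, drop stopwords."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    tokens = text.split()                 # simple whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]


def word_proportions(tokens):
    """Return each word's share of the total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


sample = "It was the best of times, it was the worst of times in 1859."
tokens = preprocess(sample)
counts = Counter(tokens)
proportions = word_proportions(tokens)

print(tokens)
print(counts.most_common(1))
print(proportions)
```

Splitting on whitespace is the simplest possible tokenizer. Library tokenizers handle contractions, hyphenation, and similar cases more carefully.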

Part 2: This hands-on workshop builds on Part 1 by introducing the basics of Python's scikit-learn package to implement unsupervised text analysis methods. This workshop will cover a) vectorization and document-term matrices, b) weighting (tf-idf), and c) uncovering patterns using topic modeling.

Part 3: In this workshop we will cover the most common computational text analysis (CTA) task: supervised classification. Using the Python library scikit-learn, we will implement Logistic Regression and Random Forest methods to perform sentiment analysis. Optional: introduction to word vector representations with Word2Vec.
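
For a sense of what supervised sentiment classification looks like in scikit-learn, here is a minimal sketch that trains both model types named in Part 3. The labeled reviews, the label names, and the tf-idf features are assumptions invented for this example and are not the workshop's dataset or exact setup.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples; real training sets are much larger.
train_texts = [
    "I loved this film, it was wonderful",
    "An excellent and moving story",
    "Great acting and a great script",
    "I hated this film, it was terrible",
    "A boring and awful story",
    "Bad acting and a bad script",
]
train_labels = ["positive"] * 3 + ["negative"] * 3

new_texts = ["a wonderful script", "terrible and boring"]

# Each pipeline turns text into tf-idf features, then fits a classifier.
models = {
    "logistic_regression": make_pipeline(
        TfidfVectorizer(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": make_pipeline(
        TfidfVectorizer(),
        RandomForestClassifier(n_estimators=100, random_state=0),
    ),
}

predictions = {}
for name, model in models.items():
    model.fit(train_texts, train_labels)
    predictions[name] = list(model.predict(new_texts))
    print(name, predictions[name])
```

Wrapping the vectorizer and the classifier in one pipeline means new text goes through the same feature extraction as the training data. In practice, the models would be scored on held-out data rather than on a couple of hand-written sentences.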

Prerequisites: D-Lab’s Python Fundamentals introductory series or equivalent knowledge.

Workshop Materials:

Software Requirements: Installation Instructions for Python Anaconda

Is Python not working on your laptop?
Attend the workshop anyway. We can provide you with a cloud-based solution until you resolve the problems with your local installation.

Feedback: After completing the workshop, please give us feedback using this form.

Questions? Email: