When & Where
Mon, April 12, 2021 - 9:00 AM to 12:00 PM
Wed, April 14, 2021 - 9:00 AM to 12:00 PM
Fri, April 16, 2021 - 9:00 AM to 12:00 PM
Remote (via Zoom)


This workshop is part of a three-part series that prepares participants to carry out text analysis research, with a special focus on humanities and social science applications.

  • Part 1: Basic Tools and Techniques

  • Part 2: Unsupervised Approaches

  • Part 3: Supervised Methods

Part 1: This hands-on workshop walks through the common “preprocessing recipe” that serves as the foundation for many other applications, and introduces some basic natural language processing techniques. These include: a) removing stopwords, numbers, and punctuation; b) tokenization; c) calculating word frequencies and proportions; and d) part-of-speech tagging.
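The first three preprocessing steps can be sketched in plain Python. This is a minimal illustration, not the workshop's own code. The stopword list, the sample sentence, and the function names `preprocess` and `word_proportions` are hypothetical. The regular expression keeps only ASCII letters, so accented characters are dropped and hyphenated words are split. Part-of-speech tagging is not shown because it needs a trained tagger from a library such as NLTK or spaCy.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only;
# real projects usually start from a curated list (e.g., NLTK's or spaCy's).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}

def preprocess(text: str) -> list[str]:
    """Lowercase, strip numbers and punctuation, tokenize, drop stopwords."""
    text = text.lower()
    # Replace every character that is not an ASCII letter or whitespace
    # (digits, punctuation, symbols) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Simple whitespace tokenization.
    tokens = text.split()
    # Stopword removal happens after tokenization, token by token.
    return [tok for tok in tokens if tok not in STOPWORDS]

def word_proportions(tokens: list[str]) -> dict[str, float]:
    """Relative frequency of each token: its count divided by the total count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

doc = ("In 2021, the workshop was remote. The workshop covered "
       "tokenization, and tokenization is basic!")
tokens = preprocess(doc)
print(tokens)  # ['workshop', 'remote', 'workshop', 'covered', 'tokenization', 'tokenization', 'basic']
print(Counter(tokens).most_common(2))
print(word_proportions(tokens))
```

Raw counts (`Counter`) and proportions carry the same information for one document. Proportions make documents of different lengths comparable.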

Part 2: This hands-on workshop builds on Part 1 by introducing the basics of Python's scikit-learn package for implementing unsupervised text analysis methods. It covers a) vectorization and document-term matrices, b) term weighting with tf-idf, and c) uncovering patterns with topic modeling.
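A dependency-free sketch can make document-term matrices and tf-idf weighting concrete by computing roughly what scikit-learn's `CountVectorizer` and `TfidfVectorizer` produce. The sketch assumes the documents are already tokenized. It uses the smoothed idf, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each row, which is how scikit-learn's documentation describes its defaults. Tokenization and vocabulary handling differ from the library, so the numbers are not guaranteed to match scikit-learn exactly. The toy documents and the function names `build_dtm` and `tfidf` are hypothetical.

```python
import math
from collections import Counter

def build_dtm(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    # Sorted vocabulary gives a stable column order.
    vocab = sorted({tok for doc in docs for tok in doc})
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix

def tfidf(matrix: list[list[int]]) -> list[list[float]]:
    """Weight raw counts by smoothed idf, then L2-normalize each row."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    # Document frequency: how many documents contain each term at least once.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    # Smoothed idf: rarer terms get larger weights; the +1s avoid division by zero.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weighted = []
    for row in matrix:
        vec = [count * idf[j] for j, count in enumerate(row)]
        norm = math.sqrt(sum(v * v for v in vec))
        # Leave all-zero rows unchanged instead of dividing by zero.
        weighted.append([v / norm for v in vec] if norm > 0 else vec)
    return weighted

docs = [
    ["topic", "model", "text"],
    ["text", "text", "vector"],
    ["topic", "vector"],
]
vocab, dtm = build_dtm(docs)
print(vocab)  # ['model', 'text', 'topic', 'vector']
print(dtm)    # [[1, 1, 1, 0], [0, 2, 0, 1], [0, 0, 1, 1]]
weights = tfidf(dtm)
print([[round(v, 3) for v in row] for row in weights])
```

In the workshop itself, scikit-learn would handle these steps directly on raw strings, for example with `TfidfVectorizer().fit_transform(texts)`, which returns a sparse matrix rather than nested lists. Topic modeling (for example scikit-learn's `LatentDirichletAllocation` fit on a count matrix) is not sketched here.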

Part 3: In this workshop we will cover the most common computational text analysis (CTA) task: supervised classification. Using the Python library scikit-learn, we will implement logistic regression and random forest classifiers to perform sentiment analysis. Optional: an introduction to word vector representations with Word2Vec.
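The logistic regression side of supervised sentiment classification can be illustrated without any dependencies. This is a simplified sketch under stated assumptions. It trains an unregularized binary logistic regression by batch gradient descent on binary bag-of-words features. The four labeled "reviews", the learning rate, the epoch count, and all function names are hypothetical. In practice scikit-learn's `LogisticRegression` differs: it applies L2 regularization by default and uses dedicated solvers. Random forests and Word2Vec are not covered here.

```python
import math

def featurize(tokens: list[str], vocab: list[str]) -> list[float]:
    """Binary bag-of-words: 1.0 if a vocabulary term occurs in the document."""
    present = set(tokens)
    return [1.0 if term in present else 0.0 for term in vocab]

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x: list[float], w: list[float], b: float) -> float:
    """Estimated probability that x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (p - y).
            err = predict_proba(xi, w, b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: 1 = positive sentiment, 0 = negative sentiment.
train = [
    (["great", "fun", "workshop"], 1),
    (["loved", "great", "examples"], 1),
    (["boring", "slow", "workshop"], 0),
    (["confusing", "boring", "examples"], 0),
]
vocab = sorted({tok for tokens, _ in train for tok in tokens})
X = [featurize(tokens, vocab) for tokens, _ in train]
y = [label for _, label in train]
w, b = train_logistic_regression(X, y)

for new_doc in (["great", "fun"], ["boring", "slow"]):
    print(new_doc, round(predict_proba(featurize(new_doc, vocab), w, b), 3))
```

Words that appear in both classes ("workshop", "examples") receive weights near zero, while class-specific words push the prediction toward their class. With scikit-learn, a comparable model could be assembled as `make_pipeline(CountVectorizer(binary=True), LogisticRegression())` and trained with `.fit(texts, labels)` on raw strings.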

Prior knowledge: D-Lab’s Python Fundamentals or equivalent knowledge. 

NOTE: D-Lab workshops normally start 10 minutes after the scheduled start time (“Berkeley Time”). We recommend logging on at the scheduled start time to join the waiting room, where the hosts will send you further information.

Training Keywords: 
Computational Text Analysis, Natural Language Processing
Primary Tool: 
Python
Training Learner Level: 
Intermediate to Advanced Competency
D-Lab Facilitator: 
Evan Muzzall
Format Detail: 
Remote, hands-on, interactive
Participant Technology Requirement: 
Laptop, Internet connection, Zoom account