Python Text Analysis Fundamentals: Parts 1-3

November 8, 2021, 12:00pm to November 12, 2021, 3:00pm

Location: Remote via Zoom. Link will be sent on the morning of the event.

This workshop is a 3-part series running from 12pm-3pm each day:

  • Part 1: Monday, November 8
  • Part 2: Wednesday, November 10
  • Part 3: Friday, November 12

Start Time: D-Lab workshops start 10 minutes after the scheduled start time (“Berkeley Time”). We will admit all participants from the waiting room at that time.

Description

This three-part workshop series will prepare participants to move forward with research that uses text analysis, with a special focus on humanities and social science applications.

  • Part 1: Basic Tools and Techniques
  • Part 2: Unsupervised Approaches
  • Part 3: Supervised Methods

Part 1: This hands-on workshop walks through the common “preprocessing recipe” that serves as the foundation for many other applications, along with some basic natural language processing techniques. These include: a) removal of stopwords, numbers, and punctuation, b) tokenization, c) calculation of word frequencies and proportions, and d) part-of-speech tagging.
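As a rough illustration of steps a) through c), here is a minimal sketch that uses only the Python standard library. The tiny stopword list, the `preprocess` and `word_proportions` helper names, and the sample sentence are assumptions made for this example and are not taken from the workshop materials. Part-of-speech tagging is left out because it requires a third-party NLP library.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}


def preprocess(text):
    """Lowercase, strip numbers and punctuation, tokenize, drop stopwords."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    tokens = text.split()                 # simple whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]


def word_proportions(tokens):
    """Return each word's share of all tokens in the document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


sample = "It was the best of times, it was the worst of times in 1859."
tokens = preprocess(sample)
counts = Counter(tokens)
proportions = word_proportions(tokens)

print(tokens)
print(counts.most_common(3))
print(proportions)
```

Whitespace tokenization is the simplest possible choice here; a dedicated tokenizer handles contractions, hyphens, and similar cases more carefully.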

Part 2: This hands-on workshop builds on Part 1 by introducing the basics of Python's scikit-learn package to implement unsupervised text analysis methods. This workshop will cover a) vectorization and document-term matrices, b) weighting (tf-idf), and c) uncovering patterns using topic modeling.
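To make items a) and b) concrete, the sketch below builds a document-term matrix and tf-idf weights by hand with the standard library. It is not the workshop's code: the workshop uses scikit-learn, and the three toy documents and all variable names here are assumptions for illustration. The smoothed idf formula and L2 row normalization were chosen to mirror what is, to the best of my understanding, the default behavior of scikit-learn's `TfidfVectorizer`. Topic modeling is omitted.

```python
import math
from collections import Counter

# Toy corpus (hypothetical documents for illustration only).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

tokenized = [doc.split() for doc in docs]
vocab = sorted({token for doc in tokenized for token in doc})

# Document-term matrix: one row per document, one column per vocabulary term,
# each cell holding the raw count of that term in that document.
dtm = []
for doc in tokenized:
    counts = Counter(doc)
    dtm.append([counts[term] for term in vocab])

n_docs = len(docs)

# Document frequency: in how many documents does each term appear?
doc_freq = [sum(1 for row in dtm if row[j] > 0) for j in range(len(vocab))]

# Smoothed inverse document frequency: rarer terms receive larger weights.
idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]


def l2_normalize(row):
    """Scale a row to unit Euclidean length (rows of zeros are left alone)."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm else row


# tf-idf: term count times idf, then normalize each document row.
tfidf = [l2_normalize([tf * w for tf, w in zip(row, idf)]) for row in dtm]

for term, weight in zip(vocab, tfidf[0]):
    if weight > 0:
        print(f"{term}: {weight:.3f}")
```

In practice a vectorizer class does all of this in one call and returns a sparse matrix, which matters once the vocabulary grows to thousands of terms.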

Part 3: In this workshop we will cover the most common computational text analysis (CTA) task: supervised classification. Using the Python library scikit-learn, we will implement Logistic Regression and Random Forest methods to perform sentiment analysis. An optional segment introduces word vector representations with Word2Vec.
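The general shape of such a classifier can be sketched as follows, assuming scikit-learn is installed. The eight labeled sentences, the label names, and the choice of tf-idf features inside a pipeline are all assumptions made for this example; the workshop's own data and settings may differ. With so little training data the predictions are not meaningful, so the sketch only shows how the pieces fit together.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hand-written training set (hypothetical; far too small for real use).
train_texts = [
    "a wonderful film with a moving story",
    "great acting and a brilliant script",
    "i loved every minute of it",
    "an excellent and enjoyable movie",
    "a boring film with a dull story",
    "terrible acting and a weak script",
    "i hated every minute of it",
    "an awful waste of time",
]
train_labels = ["positive"] * 4 + ["negative"] * 4

# Each pipeline turns raw text into tf-idf features, then fits a classifier.
models = {
    "logistic_regression": make_pipeline(
        TfidfVectorizer(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": make_pipeline(
        TfidfVectorizer(), RandomForestClassifier(n_estimators=100, random_state=0)
    ),
}

new_texts = ["a brilliant and moving film", "dull and a waste of time"]

predictions = {}
for name, model in models.items():
    model.fit(train_texts, train_labels)
    predictions[name] = list(model.predict(new_texts))
    print(name, predictions[name])
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary learned from the training texts tied to the model, so new texts are transformed the same way at prediction time.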

Prerequisites: D-Lab’s Python Fundamentals introductory series or equivalent knowledge.

Workshop Materials: https://github.com/dlab-berkeley/python-text-analysis-fundamentals

Software Requirements: Installation Instructions for Python Anaconda

Is Python not working on your laptop?
Attend the workshop anyway. We can provide you with a cloud-based solution until you resolve the problems with your local installation.

Feedback: After completing the workshop, please give us your feedback using this form.

Questions? Email: dlab-frontdesk@berkeley.edu