Location: Remote via Zoom. Link will be sent on the morning of the event.
This workshop is a 3-part series running from 10am-1pm each day:
- Part 1: Tuesday, September 21
- Part 2: Thursday, September 23
- Part 3: Tuesday, September 28
Start Time: D-Lab workshops start 10 minutes after the scheduled start time (“Berkeley Time”). We will admit all participants from the waiting room at that time.
Description
This three-part workshop series will prepare participants to move forward with research that uses text analysis, with a special focus on humanities and social science applications.
- Part 1: Basic Tools and Techniques
- Part 2: Unsupervised Approaches
- Part 3: Supervised Methods
Part 1: This hands-on workshop walks through the common “preprocessing recipe” that serves as the foundation for a variety of other applications, along with some basic natural language processing techniques. The steps covered are: a) removal of stopwords, numbers, and punctuation, b) tokenization, c) calculation of word frequencies / proportions, and d) part-of-speech tagging.
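As a rough illustration of steps a) through c), here is a minimal sketch using only the Python standard library. The stopword list, the sample sentence, and the function names `preprocess` and `word_proportions` are assumptions made for this example and are not taken from the workshop materials. Part-of-speech tagging is left out because it needs a trained tagger from a library such as NLTK or spaCy.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}

def preprocess(text):
    """Lowercase, strip numbers and punctuation, tokenize, drop stopwords."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)       # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)   # remove punctuation
    tokens = text.split()                  # simple whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]

def word_proportions(tokens):
    """Return each word's share of the total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

sample = "It was the best of times, it was the worst of times in 1859."
tokens = preprocess(sample)
proportions = word_proportions(tokens)
print(tokens)
print(proportions)
```

Whitespace splitting is the simplest possible tokenizer; library tokenizers handle contractions, hyphens, and similar cases more carefully.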
Part 2: This hands-on workshop builds on Part 1 by introducing the basics of Python's scikit-learn package to implement unsupervised text analysis methods. This workshop will cover a) vectorization and document-term matrices, b) weighting (tf-idf), and c) uncovering patterns using topic modeling.
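To make the ideas of a document-term matrix and tf-idf weighting concrete, the sketch below builds both by hand with the standard library rather than with scikit-learn. The three toy documents are invented for this example, and the weighting uses the textbook form tf × log(N / df); scikit-learn's `TfidfVectorizer` applies smoothing and normalization by default, so its numbers will differ.

```python
import math
from collections import Counter

# Toy corpus (made up for illustration).
docs = [
    "cats chase mice",
    "dogs chase cats",
    "mice eat cheese",
]

tokenized = [doc.split() for doc in docs]
vocab = sorted({term for doc in tokenized for term in doc})

# Document-term matrix: one row per document, one column per vocabulary term.
counts = [Counter(doc) for doc in tokenized]
dtm = [[c[term] for term in vocab] for c in counts]

# Document frequency: number of documents that contain each term.
df = [sum(1 for row in dtm if row[j] > 0) for j in range(len(vocab))]

# Textbook tf-idf: raw term count times log(N / df).
n_docs = len(docs)
tfidf = [
    [row[j] * math.log(n_docs / df[j]) for j in range(len(vocab))]
    for row in dtm
]

for doc, row in zip(docs, tfidf):
    print(doc, [round(value, 3) for value in row])
```

Terms that appear in only one document (such as "dogs") receive a higher weight than terms shared across documents (such as "cats"), which is the point of the weighting.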
Part 3: In this workshop we will cover the most common computational text analysis (CTA) task: supervised classification. Using the Python library scikit-learn, we will implement Logistic Regression and Random Forest methods to perform sentiment analysis. Optional: introduction to word vector representations with Word2Vec.
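For a sense of what supervised sentiment classification involves, here is a small hand-rolled logistic regression over bag-of-words counts, written with the standard library only. It is a stand-in for illustration, not scikit-learn's `LogisticRegression`, and the four labeled reviews, the learning rate, and the helper names are all assumptions made for this sketch.

```python
import math

# Toy labeled reviews (made up for illustration): 1 = positive, 0 = negative.
train = [
    ("great wonderful film", 1),
    ("loved it great acting", 1),
    ("terrible boring film", 0),
    ("hated it boring plot", 0),
]

vocab = sorted({word for text, _ in train for word in text.split()})
index = {word: i for i, word in enumerate(vocab)}

def vectorize(text):
    """Bag-of-words counts; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for word in text.split():
        if word in index:
            vec[index[word]] += 1.0
    return vec

def predict_proba(weights, bias, vec):
    """Probability of the positive class under the logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, vec))
    return 1.0 / (1.0 + math.exp(-z))

X = [vectorize(text) for text, _ in train]
y = [label for _, label in train]

# Batch gradient descent on the logistic loss, starting from zero weights.
weights, bias = [0.0] * len(vocab), 0.0
learning_rate = 0.5
for _ in range(200):
    errors = [predict_proba(weights, bias, x) - t for x, t in zip(X, y)]
    for j in range(len(vocab)):
        grad = sum(e * x[j] for e, x in zip(errors, X)) / len(X)
        weights[j] -= learning_rate * grad
    bias -= learning_rate * sum(errors) / len(X)

def classify(text):
    """Return 1 for positive sentiment, 0 for negative."""
    return 1 if predict_proba(weights, bias, vectorize(text)) >= 0.5 else 0

print(classify("great acting"), classify("boring plot"))
```

A real analysis would hold out test data and use a far larger labeled corpus; four sentences are only enough to show the mechanics.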
Prerequisites: D-Lab’s Python Fundamentals introductory series or equivalent knowledge.
Workshop Materials: https://github.com/dlab-berkeley/python-text-analysis-fundamentals
Software Requirements: Installation Instructions for Python Anaconda
Is Python not working on your laptop?
Attend the workshop anyway. We can provide you with a cloud-based solution until you resolve the problems with your local installation.
Feedback: After completing the workshop, please give us feedback using this form.
Questions? Email: dlab-frontdesk@berkeley.edu