It is often said that 80% of data analysis is spent on cleaning and preparing the data. This workshop will introduce tools (notably dplyr and tidyr) that make data wrangling and manipulation much easier. Participants will learn how to use these packages to subset and reshape data sets, do calculations across groups of data, and clean data.

This is an archive of our past training offerings. We are looking to include workshops on topics not yet covered here. Is there something not currently on the list? Send us a proposal.

This three-part series will cover the following materials:

**Part 1: Introduction**

This four-part, interactive workshop series is a complete introduction to programming in Python for people with little or no previous programming experience. By the end of the series, you will be able to apply basic principles of programming and data manipulation to a real-world social science application.

Join David Harding, Professor of Sociology and Faculty Director of D-Lab at UC Berkeley, for a discussion on how to more successfully apply for qualitative research grants from funders with positivist inclinations. Prof.

This hands-on workshop builds on part 1 by introducing the basics of Python's scikit-learn package to implement unsupervised text analysis methods. This workshop will cover a) vectorization and Document Term Matrices, b) weighting (tf-idf), and c) uncovering patterns using topic modeling.

Geospatial data are an important component of social science and humanities data visualization and analysis. The R programming language is a great platform for exploring these data and integrating them into a research project.

**Geospatial Data in R, part 2: Geoprocessing and analysis**

**Part 3 Topics:**

Geospatial data are an important component of social science and humanities data visualization and analysis. The R programming language is a great platform for exploring these data and integrating them into your research.

**Geospatial Data in R, part 1: Getting started with spatial data objects**

For this workshop, we'll provide an introduction to visualization with Python. We'll cover visualization theory and plotting with Matplotlib and Seaborn, working through examples in a Jupyter (formerly IPython) notebook. The following plot types will be covered:

- line
- bar
- scatter
- boxplot

We'll also learn about styles and customizing plots.
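As a rough sketch of the plot types and styling mentioned above, the following draws all four in one Matplotlib figure. The numbers, the variable names, and the choice of the built-in `ggplot` style are made up for illustration; Seaborn offers higher-level equivalents of the same plots.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Made-up numbers for illustration only.
years = [2015, 2016, 2017, 2018]
enrollment = [120, 150, 170, 210]
groups = ["A", "B", "C"]
scores = [[62, 70, 75, 81], [55, 68, 72, 90], [60, 65, 66, 71]]

# A style context changes the look of every plot created inside it.
with plt.style.context("ggplot"):
    fig, axes = plt.subplots(2, 2, figsize=(8, 6))

    axes[0, 0].plot(years, enrollment, marker="o")
    axes[0, 0].set_xticks(years)  # show whole years rather than fractional ticks
    axes[0, 0].set_title("line")

    # One bar per group, showing that group's mean score.
    axes[0, 1].bar(groups, [sum(s) / len(s) for s in scores])
    axes[0, 1].set_title("bar")

    axes[1, 0].scatter(scores[0], scores[1])
    axes[1, 0].set_title("scatter")

    # One box per group, summarizing the spread of its scores.
    axes[1, 1].boxplot(scores)
    axes[1, 1].set_xticklabels(groups)
    axes[1, 1].set_title("boxplot")

    fig.tight_layout()
    fig.savefig("plot_types.png")
```

In a Jupyter notebook the `matplotlib.use("Agg")` line can be dropped, since figures are displayed inline.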

**Part 2: Working With Projections & Spatial Queries**

This workshop will focus on organizing, coding, and analyzing qualitative data using ATLAS.ti, a qualitative data analysis (QDA) software program for which D-Lab provides support.

In Visualization in Excel, we will cover the fundamentals of visualization in Excel, including a checklist of considerations that should go into every visualization. We will also go through step-by-step instructions for making horizontal bar charts, slope graphs, butterfly charts, the good kind of pie charts, and icon arrays, and for graphing confidence intervals.

Students will learn about the different ways in which entities in the real world are represented as geographic data. They will be introduced to QGIS, an open-source Geographic Information System (GIS) tool for working with geographic data.

This hands-on workshop presents a broad overview of existing methods for using text as data, with a focus on applications in the social sciences and humanities.

**Part 1 Topics:**

Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It supports practical, real-world data analysis in Python.

In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.
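A minimal sketch of what such preparation steps can look like in pandas follows. The data frame, its column names, and the particular cleaning steps are hypothetical and chosen only to illustrate common problems; they are not the workshop's example data.

```python
import pandas as pd

# Hypothetical survey extract with common problems: stray whitespace,
# inconsistent capitalization, numbers stored as text, and a missing value.
raw = pd.DataFrame(
    {
        "county": [" Alameda", "alameda ", "Marin", "Marin", "Napa"],
        "income": ["52000", "61000", "n/a", "88000", "47000"],
    }
)

clean = raw.copy()
# Normalize the text labels so that equivalent values compare equal.
clean["county"] = clean["county"].str.strip().str.title()
# Convert text to numbers; unparseable entries become NaN instead of raising.
clean["income"] = pd.to_numeric(clean["income"], errors="coerce")
# Drop rows that lack the value we want to analyze.
clean = clean.dropna(subset=["income"])

# Summarize by group once the data are clean.
summary = clean.groupby("county")["income"].mean()
print(summary)
```

Working on a copy keeps `raw` untouched, which makes it easy to compare the data before and after cleaning.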

We plan to cover:

This class will cover the basics of Excel, from simple formulas (SUM, COUNTIF) to more complex Excel features like Macros and the Data Analysis ToolPak. By the end of both sections, students will be able to apply Excel skills to open-source policy data sets. These skills are transferable to any sector.

Topics Covered Will Include: