Software Tools

Tactics for Text Mining non-Roman Scripts

April 15, 2024
by Hilary Faxon, Ph.D. & Win Moe. Non-Roman scripts pose particular challenges for text mining. Here, we reflect on a project that used text mining alongside qualitative coding to understand the politicization of online content following Myanmar’s 2021 military coup.

Chirag Manghani

Consultant
School of Information

Chirag is a second-year graduate student at the I-School. Proficient in Python, Java, R, and SQL, he works across software application development, machine learning, and data science. His keen interest lies in data analysis and statistical methods, driving him to bridge theory and practice. Chirag's dedication to excellence, adaptable mindset, and innate curiosity define him as a dynamic problem solver in the ever-evolving tech landscape.

Aaron Culich

Deputy Director of D-Lab; Cyberinfrastructure Architect and Consulting Lead

Aaron Culich is a staff member at the D-Lab with expertise in Cloud Computing, High Performance Computing (HPC), Databases (SQL and NoSQL), JupyterHub and BinderHub infrastructure, and a variety of programming languages (Python, R, Java, C, C++, and more). His ongoing mission is to explore new compute possibilities, discovering useful tools and practices, and making them more accessible to researchers on campus and beyond.

Thomas Lai

Consultant
School of Information

I am a Product Engineer passionate about applying engineering, data science, machine learning, and problem-solving principles to improve device performance and solve complex challenges. With experience in statistical analysis, lab bench automation, and Python scripting, I have developed a strong technical skill set that allows me to make meaningful contributions to any project. Beyond my work, I am also passionate about exploring new topics and ideas, from the latest technology trends to how to improve the overall well-being of humans. I enjoy applying the first principle to any...

Introduction to Propensity Score Matching with MatchIt

April 1, 2024
by Alex Ramiller. When working with observational (i.e. non-experimental) data, it is often challenging to establish the existence of causal relationships between interventions and outcomes. Propensity Score Matching (PSM) provides a powerful tool for causal inference with observational data, enabling the creation of comparable groups that allow us to directly measure the impact of an intervention. This blog post introduces MatchIt – a software package that provides all of the necessary tools for conducting Propensity Score Matching in R – and provides step-by-step instructions on how to conduct and evaluate matches.

US Census Bureau Restricted-Access Research Data Center (FSRDC) Info Session

April 24, 2024, 11:00am
Interested in using restricted data from the Census Bureau or a partnering RDC agency (AHRQ, BLS, BEA, NCHS)? This one-hour introductory workshop will provide an overview of the Berkeley Federal Statistical Research Data Center, with no prior experience assumed. Attendees will learn about the national RDC network, how to access information online about restricted Census data, and how to navigate proposal development.

Bash + Git: Introduction

May 2, 2024, 2:00pm
This workshop will start by introducing you to navigating your computer’s file system and basic Bash commands, to remove the fear of working with the command line and give you the confidence to use it to increase your productivity. It will then introduce Git, a powerful tool for keeping track of changes you make to the files in a project.

R Data Visualization

February 22, 2024, 10:00am
This workshop will provide an introduction to graphics in R with ggplot2. Participants will learn how to construct, customize, and export a variety of plot types in order to visualize relationships in data. We will also explore the basic grammar of graphics, including the aesthetics and geometry layers, adding statistics, transforming scales, and coloring or panelling by groups. You will learn how to make histograms, boxplots, scatterplots, line plots, and heatmaps, as well as how to make compound figures.

R Geospatial Fundamentals: Parts 1-3

March 11, 2024, 9:00am
Geospatial data are an important component of data visualization and analysis in the social sciences, humanities, and elsewhere. The R programming language is a great platform for exploring these data and integrating them into your research. This workshop focuses on fundamental operations for reading, writing, manipulating, and mapping vector data, which encode location as points, lines, and polygons.

R Data Wrangling and Manipulation: Parts 1-2

March 19, 2024, 9:00am
It is often said that 80% of the time in data analysis is spent cleaning and preparing the data for exploration, visualization, and analysis. This R workshop will introduce the dplyr and tidyr packages to make data wrangling and manipulation easier. Participants will learn how to use these packages to subset and reshape data sets, do calculations across groups of data, clean data, and perform other useful tasks.