Software Tools

Git for Research Transparency and Reproducibility Training (RT2)

June 6, 2024, 3:15pm
This is a custom Git workshop for the 2024 Research Transparency and Reproducibility Training (RT2).
Registration is not required.

R Data Visualization

June 26, 2024, 1:00pm
This workshop will provide an introduction to graphics in R with ggplot2. Participants will learn how to construct, customize, and export a variety of plot types in order to visualize relationships in data. We will also explore the basic grammar of graphics, including the aesthetics and geometry layers, adding statistics, transforming scales, and coloring or panelling by groups. You will learn how to make histograms, boxplots, scatterplots, line plots, and heatmaps, as well as how to make compound figures.

Conceptual Mirrors: Reflecting on LLMs' Interpretations of Ideas

April 23, 2024
by María Martín López. As large language models become ingrained in our daily lives, we must leverage cognitive psychology to explore the understanding that these algorithms have of our world and the people they interact with. LLMs give us new insights into how conceptual representations are formed given the limitations of the data modalities they have access to. Is language enough for these models to conceptualize the world? If so, what conceptualizations do they have of us?

Tactics for Text Mining non-Roman Scripts

April 15, 2024
by Hilary Faxon, Ph.D. & Win Moe. Non-Roman scripts pose particular challenges for text mining. Here, we reflect on a project that used text mining alongside qualitative coding to understand the politicization of online content following Myanmar’s 2021 military coup.

Chirag Manghani

School of Information

Chirag is a second-year graduate student at the I-School. Proficient in Python, Java, R, and SQL, he navigates software application development, machine learning, and data science. His keen interest lies in data analysis and statistical methods, driving him to bridge theory and practice seamlessly. Chirag's dedication to excellence, adaptable mindset, and innate curiosity define him as a dynamic problem solver in the ever-evolving tech landscape.

Aaron Culich

Deputy Director of D-Lab; Cyberinfrastructure Architect and Consulting Lead

Aaron Culich is a staff member at the D-Lab with expertise in Cloud Computing, High Performance Computing (HPC), Databases (SQL and NoSQL), JupyterHub and BinderHub infrastructure, and a variety of programming languages (Python, R, Java, C, C++, and more). His ongoing mission is to explore new compute possibilities, discover useful tools and practices, and make them more accessible to researchers on campus and beyond.

Thomas Lai

School of Information

I am a Product Engineer passionate about applying engineering, data science, machine learning, and problem-solving principles to improve device performance and solve complex challenges. With experience in statistical analysis, lab bench automation, and Python scripting, I have developed a strong technical skill set that allows me to make meaningful contributions to any project. Beyond my work, I am also passionate about exploring new topics and ideas, from the latest technology trends to how to improve the overall well-being of humans. I enjoy applying the first principle to any...

Introduction to Propensity Score Matching with MatchIt

April 1, 2024
by Alex Ramiller. When working with observational (i.e. non-experimental) data, it is often challenging to establish the existence of causal relationships between interventions and outcomes. Propensity Score Matching (PSM) provides a powerful tool for causal inference with observational data, enabling the creation of comparable groups that allow us to directly measure the impact of an intervention. This blog post introduces MatchIt – a software package that provides all of the necessary tools for conducting Propensity Score Matching in R – and provides step-by-step instructions on how to conduct and evaluate matches.

US Census Bureau Restricted-Access Research Data Center (FSRDC) Info Session

April 24, 2024, 11:00am
Interested in using restricted data from the Census Bureau or a partnering RDC agency (AHRQ, BLS, BEA, NCHS)? This one-hour introductory workshop will provide an overview of the Berkeley Federal Statistical Research Data Center, with no prior experience assumed. Attendees will learn about the national RDC network, how to access information online about restricted Census data, and how to navigate proposal development.

Bash + Git: Introduction

May 2, 2024, 2:00pm
This workshop starts by introducing you to navigating your computer’s file system and to basic Bash commands, to remove the fear of working with the command line and give you the confidence to use it to increase your productivity. We will then work with Git, a powerful tool for keeping track of the changes you make to the files in a project.