Data Science

Conceptual Mirrors: Reflecting on LLMs' Interpretations of Ideas

April 23, 2024
by María Martín López. As large language models become ingrained in our daily lives, we must leverage cognitive psychology to explore how these algorithms understand our world and the people they interact with. LLMs give us new insights into how conceptual representations are formed given the limited data modalities they have access to. Is language enough for these models to conceptualize the world? If so, what conceptualizations do they have of us?

Tactics for Text Mining non-Roman Scripts

April 15, 2024
by Hilary Faxon, Ph.D. & Win Moe. Non-Roman scripts pose particular challenges for text mining. Here, we reflect on a project that used text mining alongside qualitative coding to understand the politicization of online content following Myanmar’s 2021 military coup.

Transparency in Experimental Political Science Research

April 9, 2024
by Kamya Yadav. As experiments become more common in political science research, concerns have grown about research transparency, particularly around reporting results from studies that contradict or do not find evidence for proposed theories (commonly called “null results”). To encourage publication of null results, political scientists have turned to pre-registering their experiments, whether online survey experiments or large-scale experiments conducted in the field. What does pre-registration look like, and how can it help during data analysis and publication?

Infosession: D-Lab Data Science Fellowship (2024-2025)

April 11, 2024, 3:00pm
The D-Lab is seeking applications for the 2024-2025 cohort of Data Science Fellows. This infosession will give you an in-depth look at the D-Lab Data Science Fellowship and a chance to ask questions about the program that may help with your application to become a Fellow!

Python Machine Learning Fundamentals: Parts 1-2

February 21, 2024, 9:00am
This workshop introduces students to scikit-learn, the popular machine learning library in Python, as well as the auto-ML library built on top of scikit-learn, TPOT. The focus will be on scikit-learn syntax and available tools to apply machine learning algorithms to datasets. No theory instruction will be provided.

Chirag Manghani

Consultant
School of Information

Chirag is a second-year graduate student at the I-School. Proficient in Python, Java, R, and SQL, he works across software application development, machine learning, and data science. His keen interest lies in data analysis and statistical methods, driving him to bridge theory and practice. Chirag's dedication to excellence, adaptable mindset, and innate curiosity define him as a dynamic problem solver in the ever-evolving tech landscape.

Nicolas Nunez-Sahr

Consultant
Statistics

I lived in Santiago, Chile until I graduated from high school, and then moved to the US for undergrad at Stanford, where I obtained a Bachelor’s degree from the Statistics Department. I then worked as a Data Scientist at an NLP startup based in Bend, OR, which analyzed news articles. I love playing soccer, volleyball, table tennis, flute, guitar, Latin music, and meeting new people. I want to get better at mountain biking, whitewater kayaking, chess and computer vision. I find nature astounding, and love finding sources of inspiration.

Gaby May Lagunes

Consultant
ESPM

Hello! I’m Gaby (she/her). I am a PhD student in the ESPM department. I hold a master’s in Data Science and Information from the Berkeley ISchool and have 5+ years of industry experience in different data roles. Before that, I earned a master’s in Engineering for International Development and an undergraduate degree in Physics from University College London. Somewhere between all that I got married, survived the pandemic, and had two awesome boys. I’m very excited to help you use data to enhance your work and your experience here at Berkeley!

Thomas Lai

Consultant
School of Information

I am a Product Engineer passionate about applying engineering, data science, machine learning, and problem-solving principles to improve device performance and solve complex challenges. With experience in statistical analysis, lab bench automation, and Python scripting, I have developed a strong technical skill set that allows me to make meaningful contributions to any project. Beyond my work, I am also passionate about exploring new topics and ideas, from the latest technology trends to how to improve the overall well-being of humans. I enjoy applying the first principle to any...

Introduction to Propensity Score Matching with MatchIt

April 1, 2024
by Alex Ramiller. When working with observational (i.e., non-experimental) data, it is often challenging to establish the existence of causal relationships between interventions and outcomes. Propensity Score Matching (PSM) provides a powerful tool for causal inference with observational data, enabling the creation of comparable treated and untreated groups from which the impact of an intervention can be estimated. This blog post introduces MatchIt, a software package that provides the necessary tools for conducting Propensity Score Matching in R, and gives step-by-step instructions on how to conduct and evaluate matches.
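For readers who want a feel for the idea before reading the post, here is a minimal Python sketch of propensity score matching. It is not MatchIt and not the post's code: the data are synthetic, the single covariate `x`, the true effect of 2.0, and the helper names `fit_propensity` and `match_nearest` are all assumptions made for illustration. It estimates each unit's probability of treatment with a small logistic regression, then greedily pairs each treated unit with the closest unused control.

```python
import math
import random


def fit_propensity(xs, treated, lr=0.5, epochs=1000):
    """Estimate P(treated | x) with a one-covariate logistic regression
    fitted by plain gradient descent (illustrative, not optimized)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, treated):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - t) * x
            grad_b += p - t
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return [1.0 / (1.0 + math.exp(-(w * x + b))) for x in xs]


def match_nearest(scores, treated):
    """Greedy 1:1 nearest-neighbor matching on the propensity score,
    without replacement. Returns (treated_index, control_index) pairs."""
    controls = [i for i, t in enumerate(treated) if t == 0]
    pairs = []
    for i in [k for k, t in enumerate(treated) if t == 1]:
        if not controls:
            break  # ran out of controls to match
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        controls.remove(j)
        pairs.append((i, j))
    return pairs


# Synthetic observational data (hypothetical): units with larger x are
# more likely to be treated, and x also raises the outcome (confounding).
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(200)]
treated = [1 if rng.random() < 1 / (1 + math.exp(-(x - 1))) else 0 for x in xs]
outcome = [2.0 * t + x + rng.gauss(0, 0.5) for x, t in zip(xs, treated)]

scores = fit_propensity(xs, treated)
pairs = match_nearest(scores, treated)

# Naive comparison ignores confounding; the matched comparison uses pairs
# of units with similar estimated propensity scores.
n_t = sum(treated)
naive = (sum(y for y, t in zip(outcome, treated) if t) / n_t
         - sum(y for y, t in zip(outcome, treated) if not t) / (len(treated) - n_t))
matched = sum(outcome[i] - outcome[j] for i, j in pairs) / len(pairs)
print(f"naive difference:   {naive:.2f}")
print(f"matched difference: {matched:.2f}")
```

A real analysis would use several covariates and check covariate balance after matching, which is the kind of evaluation the post walks through.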