Python

Reduce, Reuse, Recycle: Practical strategies for working with large datasets

October 12, 2022

When the size of your datasets starts to approach the size of your computer’s available memory, even the simplest data wrangling tasks can become frustrating. Suddenly, reading in a .csv or calculating a simple average becomes time-consuming or impossible. For students and researchers, additional computing resources can be costly or simply unavailable. Here are some principles and strategies for reducing the overhead of your dataset while keeping the momentum going. The code mainly focuses on reading csv files - a very common data format - into Python...
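As a rough illustration of the kind of problem described above, the sketch below computes a simple average from a csv without loading the whole file into memory. It is not taken from the post itself: the function name `streaming_mean`, the column name `income`, and the sample data are all hypothetical, and it uses only Python's standard `csv` module.

```python
import csv
import io


def streaming_mean(lines, column):
    """Average one numeric column while holding only one row in memory at a time."""
    total = 0.0
    count = 0
    for row in csv.DictReader(lines):
        value = row[column]
        if value == "":  # treat empty cells as missing and skip them
            continue
        total += float(value)
        count += 1
    # Return NaN rather than raising if no usable values were found.
    return total / count if count else float("nan")


# Hypothetical in-memory data standing in for a large file on disk.
sample = io.StringIO("id,income\n1,10\n2,20\n3,\n4,30\n")
mean_income = streaming_mean(sample, "income")
print(mean_income)
```

With a real file, the same function could be given an open file handle instead, for example `open(path, newline="")`, so that rows are read from disk one at a time.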

Bo Yun Park, Ph.D.

Postdoc
D-Lab

I am a Postdoctoral Scholar in the D-Lab at the University of California, Berkeley. My research lies at the intersection of political, cultural, and transnational sociology. I am particularly interested in dynamics of social inclusion and exclusion, social change, technology, and digital politics. My dissertation investigated how political strategists in France and the United States craft narratives of political leadership for presidential candidates in the digital age. I received my Ph.D. in Sociology at Harvard University, where I was affiliated with the Institute for Quantitative Social...

Avery Richards

Senior Data Science Fellow
School of Public Health

Avery is an MPH graduate at the School of Public Health. With a background in literature and behavioral health, his current research focuses on innovations in applied epidemiology, including multidisciplinary approaches to health and social science data. Avery's general interests include public health surveillance, data quality assurance, and geospatial analysis.

Shivani Patel

IUSE Undergraduate Advisory Board
Cognitive Science
Data Science

Hi! I’m a third-year at UC Berkeley studying Cognitive Science and minoring in Data Science. I will be pursuing a Doctorate of Physical Therapy with an emphasis in Sports Medicine but will be using my Data Science education as a way to enhance the field. I like learning about business models, impacted industries, and approaches to solving major problems in our world/communities.

Kanchana Samala

IUSE Undergraduate Advisory Board
Data Science

Kanchana Samala (she/her) is a third-year student studying Data Science and pursuing the CalTeach Minor. She is currently a uGSI for Data 8 and facilitates the 'Step Out of Overdrive' DeCal about understanding the link between stress and human expression. She is curious and passionate about extracting value from big data in a way that will bring joy into people's lives.

Siddharth Adelkar

Consultant
School of Information

Siddharth Adelkar is a software professional with 15 years of product development experience. This includes award-winning platforms such as People's Archive of Rural India (PARI), which he co-founded in 2014 and where he serves as Tech Editor.

Siddharth is a master's student at the Information School where he studies Information Management and Systems (MIMS). As a TA at the Haas School of Business, he helps teach MBA 290T: SQL programming. Siddharth has a master's degree in Computer Science from the University of Southern California, Los Angeles.

Melanie Phillips

Instructor
Political Science

My name is Melanie L. Phillips. I am currently a PhD Candidate in the Charles and Louise Travers Political Science Department at the University of California, Berkeley. My research examines how women’s political representation in African countries is shaped by the intersection between the rules governing candidate selection and the norms associated with gendered family roles. I use a combination of empirical methods in my work, including surveys, experiments, and in-depth fieldwork.

Reubén Pérez

Consultant
Sociology

Reubén Pérez is a Ph.D. student in the Department of Sociology at UC Berkeley, where his research focuses on the politics of ethnoracial data production in the context of Latin America and the Caribbean.

Alex Bruefach

Discovery Graduate Fellow
Materials Science and Engineering

Alex is a PhD Candidate in materials science and engineering developing image processing and machine learning techniques for extracting information from electron microscopy datasets. Her primary focus is understanding what information is transferred from various feature representations of images. She has extensive experience collaborating across boundaries and is passionate about brainstorming innovative approaches to challenging data science problems!

Isaac Sloan

Research Fellow
Data Scholars Program
Cal NERDS

Having experienced the inequities in our public education system, I have a passion for exposing and addressing the barriers low-income minority students face through data science. I am an active supporter of improving our society through data-driven solutions. My technical background in mathematics and data science combined with my applied research skills have allowed me to make an impact on our education system. I am actively looking for opportunities to collaborate with individuals and organizations that are passionate about bridging the intersections of data science.