Python

Reduce, Reuse, Recycle: Practical strategies for working with large datasets

October 12, 2022

When the size of your datasets starts to approach the size of your computer’s available memory, even the simplest data wrangling tasks can become frustrating. Suddenly, reading in a .csv or calculating a simple average becomes time-consuming or impossible. For students and researchers, additional computing resources can be costly or simply unavailable. Here are some principles and strategies for reducing the overhead of your dataset while keeping the momentum going. The code mainly focuses on reading csv files - a very common data format - into Python...

Bo Yun Park, Ph.D.

Postdoc
D-Lab

I am a Postdoctoral Scholar in the D-Lab at the University of California, Berkeley. My research lies at the intersection of political, cultural, and transnational sociology. I am particularly interested in dynamics of social inclusion and exclusion, social change, technology, and digital politics. My dissertation investigated how political strategists in France and the United States craft narratives of political leadership for presidential candidates in the digital age. I received my Ph.D. in Sociology at Harvard University, where I was affiliated with the Institute for Quantitative Social...

Avery Richards

Senior Data Science Fellow
School of Public Health

Avery is an MPH graduate at the School of Public Health. With a background in literature and behavioral health, his current research focuses on innovations in applied epidemiology, including multidisciplinary approaches to health and social science data. Avery's general interests include public health surveillance, data quality assurance, and geospatial analysis.

Shivani Patel

IUSE Undergraduate Advisory Board
Cognitive Science
Data Science

Hi! I’m a third-year at UC Berkeley studying Cognitive Science and minoring in Data Science. I will be pursuing a Doctorate of Physical Therapy with an emphasis in Sports Medicine but will be using my Data Science education as a way to enhance the field. I like learning about business models, impacted industries, and approaches to solving major problems in our world/communities.

Kanchana Samala

IUSE Undergraduate Advisory Board
Data Science

Kanchana Samala (she/her) is a third-year student studying Data Science and pursuing the CalTeach Minor. She is currently a uGSI for Data 8 and facilitates the 'Step Out of Overdrive' Decal about understanding the link between stress and human expression. She is curious and passionate about extracting value from big data in a way that will bring joy into people's lives.

Siddharth Adelkar

Consultant
School of Information

Siddharth Adelkar is a software professional with 15 years of product development experience. This includes award-winning platforms such as People's Archive of Rural India (PARI), which he co-founded in 2014 and where he serves as Tech Editor.

Siddharth is a master's student at the Information School where he studies Information Management and Systems (MIMS). As a TA at the Haas School of Business, he helps teach MBA 290T: SQL programming. Siddharth has a master's degree in Computer Science from the University of Southern California, Los Angeles.

Reubén Pérez

Consultant
Sociology

Reubén Pérez is a Ph.D. student in the Department of Sociology at UC Berkeley, where his research focuses on the politics of ethnoracial data production in the context of Latin America and the Caribbean.

Melanie Phillips

Instructor
Political Science

My name is Melanie L. Phillips. I am currently a PhD Candidate in the Charles and Louise Travers Political Science Department at the University of California, Berkeley. My research examines how women’s political representation in African countries is shaped by the intersection between the rules governing candidate selection and the norms associated with gendered family roles. I use a combination of empirical methods in my work, including surveys, experiments, and in-depth fieldwork.

Louie Ortiz

IUSE Undergraduate Advisory Board
Data Science

Louie is a third-year transfer student majoring in Data Science with an emphasis on Cognition. He hopes to analyze how data, at both the computational and human level, can advance our understanding of technology and its socio-cultural implications. He is a part of the IUSE Undergraduate Advisory Board, helping make Data Science at Berkeley inclusive and accessible to all.

Abhishek Roy

IUSE Undergraduate Advisory Board
Economics
Data Science

I'm Abhishek Roy and I'm double majoring in Economics and Data Science. I've been a part of D-Lab's IUSE project since Spring 2020 and have truly found an organization that is not only passionate about Data Science but also strives to expand its reach equitably to all communities. I am involved in Research and Project Management roles in various departments and labs at Berkeley and I'm an Editor at the Berkeley Economic Review. I love diving into anything at the intersection of Data Science, Economics, Business, and Computational Social Science. Whenever I'm free, I love writing...