Data, Prediction, and Law is an innovative course that has been offered by the Berkeley Legal Studies undergraduate program for the past two years. It is one example of the Data Science Education Program’s “data-enabled courses” that build on concepts introduced in Data 8: Foundations of Data Science and other core courses.

Broadly, Data, Prediction, and Law offers a concrete example of how to teach a data science course with a particular domain emphasis. The famous data science Venn diagram defines data science as the intersection of computer science, statistics, and domain expertise. As data science education grows across the country, thinking about how to offer high-quality courses that lie in this intersection will be critical. While students will certainly take coursework in each of these three areas, synthesizing them pays enormous dividends. Seeing how machine learning and statistical inference can be used to directly solve problems in a domain can inspire students' creativity, encourage them to work on real-world problems, and spur their interest in a specific field. Whether students go on to industry or graduate education, these are huge benefits.


Offering a data science course in Legal Studies had one other major benefit: being able to integrate discussions about the ethics of data with the practice of data science. Throughout the course, we paired readings about the use of algorithms in government decision-making with Python exercises. For instance, we paired material on algorithms used in parole decisions with lessons about developing machine learning models. In doing so, we encouraged students to think about the pros and cons of algorithmic tools, and how to build ethical applications.


Example of Integrating Law and Data Science

To illustrate, our first unit dealt with teaching students the basics of geospatial analysis in Python, and applying those skills to open crime datasets. The technical skills acquired in learning mapping techniques then allow us to illustrate a core concept in law and social science: most social phenomena have strong spatial components. Race, income inequality, education, voting, etc. are all distributed across space in specific ways. Both academic scholarship and the media are paying more attention to how space and place affect people. We want students to gain facility in evaluating arguments about how space is interrelated with social phenomena, and to develop the basic tools they need to investigate their own questions given a relatively simple dataset. We specifically use Berkeley and San Francisco crime data to illustrate these concepts.
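As a rough illustration of the kind of exercise this unit involves, the sketch below counts incidents per grid cell, a simple first step before drawing a map. It is not taken from the course materials: the records, the cell size, and the helper names `grid_cell` and `count_by_cell` are all invented for this example, and it uses only the Python standard library rather than a dedicated mapping package.

```python
from collections import Counter

# Hypothetical incident records: (latitude, longitude, category).
# These values are invented for illustration; they are not real crime data.
incidents = [
    (37.8655, -122.2590, "burglary"),
    (37.8661, -122.2584, "theft"),
    (37.8668, -122.2671, "theft"),
    (37.8802, -122.2698, "vandalism"),
    (37.8659, -122.2588, "assault"),
]

CELL_SIZE = 0.005  # grid cell width in degrees (an arbitrary choice)


def grid_cell(lat, lon, cell_size=CELL_SIZE):
    """Return the (row, col) index of the grid cell containing a point."""
    return (int(lat // cell_size), int(lon // cell_size))


def count_by_cell(records, cell_size=CELL_SIZE):
    """Count how many incidents fall into each grid cell."""
    return Counter(grid_cell(lat, lon, cell_size) for lat, lon, _ in records)


counts = count_by_cell(incidents)
for cell, n in counts.most_common():
    print(cell, n)
```

Cells with high counts are candidates for a closer look on a map. The counts describe where incidents were recorded, which is not necessarily where they occurred.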


An important aspect of this unit was introducing students to thinking about how these data were collected. We asked students to think about what counts as a “crime” that enters the dataset, and how those categories might correlate with space. To illustrate, our mapping activities showed that most crime incident reports in Berkeley are concentrated in South Berkeley, in the neighborhoods immediately adjacent to UC Berkeley’s campus. These areas generally have denser housing, more students, and more minority communities than the rest of the city. We encouraged students to think about how a data collection process that is mainly concerned with things like violent crime, burglary, etc. affects where police may choose to place more of their attention. The key here was to train students to ask critical questions about how datasets are constructed, and not simply assume that data are value-free and neutral by virtue of being numerical.
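One way to start asking how recorded categories vary across space is to tabulate, for each area, the share of its reports that fall into each category. The sketch below does this with the Python standard library. The area labels, the records, and the function name `category_shares` are hypothetical and invented for illustration, not drawn from the Berkeley or San Francisco data.

```python
from collections import Counter, defaultdict

# Hypothetical (area, recorded category) pairs, invented for illustration.
reports = [
    ("Area A", "burglary"),
    ("Area A", "theft"),
    ("Area A", "theft"),
    ("Area B", "theft"),
    ("Area B", "vandalism"),
]


def category_shares(records):
    """Return, for each area, the share of its reports in each category."""
    by_area = defaultdict(Counter)
    for area, category in records:
        by_area[area][category] += 1
    return {
        area: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for area, cats in by_area.items()
    }


shares = category_shares(reports)
for area, cats in shares.items():
    print(area, cats)
```

A table like this only summarizes what was recorded. Why a category appears more often in one area than another remains a question about the data collection process itself.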



General Course Reflections

In addition to geospatial analysis and crime, we look at machine learning techniques for predicting crime, and natural language processing techniques for analyzing legal texts and documents. We cover a huge amount of technical material, and integrate it into a broad range of legal applications. The goal is to help students develop the confidence that they can take the Python techniques and critical thinking skills they learned in Data 8, and start to apply them to problems in a specific domain. We leave formal treatment of the more advanced mathematics, computer science, and statistics concepts to other courses in the program, and instead focus on the challenges of dealing with real-world datasets, developing algorithms in an ethical way, and understanding the role that substantive domain expertise plays in becoming an effective data scientist. 



On that note, developing the course materials and administering labs, problem sets, etc. requires a substantial investment in teaching staff. In addition to the instructor and teaching assistant for this course, we also have substantial support from the Berkeley Data Science Education Program (DSEP). DSEP’s modules teams provided indispensable support in designing the in-class labs, preparing datasets, and generally building out the course infrastructure. Moreover, it is important that the instructional team have expertise in both legal studies and data science. Because the two fields are interwoven so intimately throughout the course, it would be difficult to separate the legal concepts from the technical ones. Thus, simply pairing a lawyer and a data scientist, each with no knowledge of the other's domain, would likely make it hard to present the material holistically.


Ultimately, we want students to be able to thoughtfully use and understand data science in the law. Integrating both fields into one course permitted us to explore sophisticated concepts and critiques. Indeed, in the first iteration of the course, we were constantly impressed by our students’ ability to develop original and deep insights into the growing “law and algorithms” field. Studying both fields in conjunction highlights the potential to do exciting interdisciplinary work in both law and data science, and our hope is that students go on with both the ethical and technical frames that they need to succeed in future endeavors.




Aniket Kesari

Aniket is a PhD student at Berkeley Law's Jurisprudence & Social Policy program. He earned his BA from Rutgers University - New Brunswick in Political Science and History. His research focuses on privacy and cybersecurity law, and he is generally interested in using data science to tackle public policy problems. During his graduate career he participated in the Google Public Policy Fellowship and the Data Science for Social Good (DSSG) Fellowship at the University of Chicago.