Computational Social Science in a Social World: Challenges and Opportunities

March 26, 2024

Social scientists study a wide range of social phenomena, such as the financialization of markets, the economics of family formation, how education shapes life chances, and how social networks influence collective health behaviors. Sociologists understand that new technologies can reshape entire societies socially, culturally, and economically. The Industrial Revolution introduced technologies that dramatically expanded trade in markets, changed the settlement patterns of families, and created enormous wealth. These changes had both positive and negative consequences, including agricultural improvements, the rise of new business ventures, and growing inequality. This historical context sets the stage for understanding how today’s digital revolution, driven by advancements in Artificial Intelligence (AI), Machine Learning (ML), and Data Science, presents new opportunities and challenges for studying social life.

AI, ML, and Data Science can fundamentally transform society. Just as the steamboat facilitated trade and the Gutenberg press, by mass-producing print materials, revolutionized literacy, AI and ML are poised to redefine social life and how we study it. Navigating the digital age’s effect on social science research (Salganik, 2017) will require a paradigmatic shift in the social sciences.

This paradigmatic shift means leveraging new tools to study complex social phenomena, understanding how these technologies shape social life, and reevaluating existing social science research (Hindman, 2015). Social scientists will need to learn the mathematical and logical intuition that undergirds computational methods, especially because working with these tools introduces a host of new methodological and ethical challenges (Leitgöb, Prandner & Wolbring, 2023). Sociologists must continue to grapple seriously with issues of bias, representativeness, and sampling in the new realm of Computational Social Science (CSS) if we wish to be part of the conversation. Staying up to speed will require collaboration with technologists, data scientists, and machine learning experts.

From OLS to Machine Learning: Enhancing Methodological Approaches

Given this paradigmatic shift, we should question whether our existing methodologies still serve us well in capturing complex social phenomena. For example, Ordinary Least Squares (OLS) regression, the standard tool of quantitative social science, is limited in its capacity to model nonlinear relationships unless the researcher specifies the form of those relationships a priori (James et al., 2013). Historically, OLS has been the foundational model used to explain the relationship between inputs (X) and outputs (Y). Traditional tools such as OLS have been met with various methodological critiques about their ability to measure human behavior accurately and precisely. These critiques make sense, given the decades-long debates in sociology about the merits of different kinds of methodologies (e.g., quantitative versus qualitative) and their abilities to capture and represent the behaviors of social actors.
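To make the point about functional form concrete, here is a minimal sketch in plain Python. It assumes a toy, noise-free U-shaped relationship (Y = X squared) that we invented purely for illustration; the helper names fit_simple_ols and r_squared are our own, not from any library. A straight-line OLS fit explains none of the variation, while the same estimator fits perfectly once the researcher supplies the squared term in advance.

```python
def fit_simple_ols(x, y):
    """Closed-form OLS estimates (intercept, slope) for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(y, y_hat):
    """Share of the variance in y accounted for by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Toy data (an assumption for illustration): a symmetric U-shaped relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

# 1) Linear specification: y = a + b * x.
a, b = fit_simple_ols(x, y)
linear_r2 = r_squared(y, [a + b * xi for xi in x])

# 2) The researcher specifies the nonlinear form a priori: y = a + b * x^2.
x_sq = [xi ** 2 for xi in x]
a2, b2 = fit_simple_ols(x_sq, y)
quadratic_r2 = r_squared(y, [a2 + b2 * v for v in x_sq])

print(f"R^2 with linear term only: {linear_r2:.2f}")
print(f"R^2 with squared term specified: {quadratic_r2:.2f}")
```

The estimator is identical in both fits; what changes is the form the researcher chose to impose. With real social data that form is rarely known in advance.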

In ML, the emphasis shifts from explaining the data at hand to predicting new, unseen data. OLS itself can be used as an ML model when the goal is prediction (Hastie, Tibshirani, & Friedman, 2013). Still, the limitations of OLS underscore the potential of other ML methods to handle increasingly complex data and research questions. ML seeks to find patterns in data, and many ML methods improve upon OLS because they capture nonlinear relationships without a prespecified functional form and are designed to handle data with many dimensions. A common goal in ML is to answer the question, “Is variable X (or a set of variables) associated with variable Y, and if so, what is the relationship, and can we use it to predict Y?” In a supervised machine-learning context, we use training data of paired input (X) and output (Y) values to learn how to predict Y from X on new, unseen data. These methods are often framed as a magical, Terminator-3-esque set of tools that surpass human intelligence when, in reality, we are simply using patterns learned from training data, and checked against testing data, to predict some outcome (Y), such as the probability of the next word in a sentence based on a corpus of text or the likelihood of fertility rising as a function of a vector of independent variables (or features, as they are called in ML parlance).
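The supervised workflow described above (learn from paired X and Y in training data, then predict Y for unseen X) can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the data are simulated from a curved relationship with noise, the 200/100 train/test split is arbitrary, and k-nearest-neighbors regression stands in as one simple nonlinear learner among many. The helper names are our own.

```python
import random


def fit_ols(x, y):
    """Closed-form OLS estimates (intercept, slope) for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def knn_predict(train_x, train_y, query, k=5):
    """Predict y at `query` as the mean outcome of the k closest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(y_true, y_pred):
    """Mean squared prediction error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Simulated data (an assumption): a curved relationship plus random noise.
rng = random.Random(0)
x = [rng.uniform(-3, 3) for _ in range(300)]
y = [xi ** 2 + rng.gauss(0, 0.5) for xi in x]

# Hold out the last 100 observations as "new, unseen data".
train_x, test_x = x[:200], x[200:]
train_y, test_y = y[:200], y[200:]

# Both models are fit on the training data only and scored on the test data.
intercept, slope = fit_ols(train_x, train_y)
ols_mse = mse(test_y, [intercept + slope * xi for xi in test_x])
knn_mse = mse(test_y, [knn_predict(train_x, train_y, xi) for xi in test_x])

print(f"Test MSE, linear OLS: {ols_mse:.2f}")
print(f"Test MSE, 5-nearest neighbors: {knn_mse:.2f}")
```

There is nothing magical in either model. Each is judged the same way, by how well its predictions match outcomes it never saw during fitting.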

We should consider how such tools can enhance our existing methodologies for studying the issues social scientists care about, and how we should update our methodological approaches in response. ML techniques are being used to extract meaningful measures from unconventional data sources such as text and images. For example, Kozlowski, Taddy, & Evans (2019) apply neural networks and word embedding models to understand cultural shifts in the meaning of social class (and its relationship to education) over the course of the 20th century. This work demonstrates the utility of such tools in enabling us to analyze complex cultural phenomena in a more nuanced manner.
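As a rough sketch of the general idea behind embedding-based measures, the plain-Python example below builds a "dimension" by averaging the differences between antonym pairs and then scores other words by their cosine similarity to it. The three-dimensional vectors are invented toy values, not output from a trained model, and word_a and word_b are hypothetical placeholders. Real embeddings have hundreds of dimensions learned from large corpora, and this is not a reproduction of any published analysis.

```python
import math

# Toy "embeddings" (invented numbers for illustration only).
vectors = {
    "rich": [0.9, 0.1, 0.2],
    "poor": [-0.8, 0.2, 0.1],
    "affluent": [0.8, 0.0, 0.3],
    "impoverished": [-0.9, 0.1, 0.2],
    "word_a": [0.6, 0.5, 0.1],   # hypothetical word to be scored
    "word_b": [-0.5, 0.4, 0.2],  # hypothetical word to be scored
}


def subtract(u, v):
    """Element-wise difference u - v."""
    return [ui - vi for ui, vi in zip(u, v)]


def average(list_of_vectors):
    """Element-wise mean of several vectors."""
    n = len(list_of_vectors)
    return [sum(component) / n for component in zip(*list_of_vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    return dot / (norm_u * norm_v)


# An "affluence" dimension: the average difference across antonym pairs.
antonym_pairs = [("rich", "poor"), ("affluent", "impoverished")]
affluence_dimension = average(
    [subtract(vectors[high], vectors[low]) for high, low in antonym_pairs]
)

# Score each word by its cosine similarity to the dimension.
score_a = cosine(vectors["word_a"], affluence_dimension)
score_b = cosine(vectors["word_b"], affluence_dimension)

print(f"word_a leans toward the 'rich' pole: {score_a:.2f}")
print(f"word_b leans toward the 'poor' pole: {score_b:.2f}")
```

Computing such scores on embeddings trained separately for each decade would show how a word's position along the dimension moves over time.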

These new techniques also allow us to improve causal inference and our estimates of effect heterogeneity. Daoud & Johansson (2024) assess the heterogeneous effects of economic austerity following the implementation of International Monetary Fund (IMF) programs, using generalized random forests to estimate the conditional average effect of IMF austerity on childhood poverty. We can also use these techniques to ask new questions of existing datasets. Mittleman (2022) uses supervised machine learning to uncover inequality obscured by traditional measures of sex and gender, highlighting how sexuality is an important dimension of educational inequality. We can learn from these new studies about novel ways to study culture, inequality, and society. Of course, these methods are not a panacea for all existing social ills, but we can use them to capture the complexity of social life and improve existing methodologies.
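To illustrate what effect heterogeneity means, the sketch below uses a deliberately simple stand-in, subgroup differences in means on simulated randomized data, rather than the forest-based estimators used in the studies above. The group labels, effect sizes, and sample size are all hypothetical values we chose for illustration and say nothing about any real program. The point is only that a single average effect can hide very different effects across subgroups.

```python
import random

rng = random.Random(42)

# Hypothetical true effects of a treatment on an outcome, by subgroup.
TRUE_EFFECT = {"group_a": -2.0, "group_b": -0.5}

# Simulate units with a random subgroup, random treatment, and a noisy outcome.
records = []
for _ in range(4000):
    group = "group_a" if rng.random() < 0.5 else "group_b"
    treated = rng.random() < 0.5
    effect = TRUE_EFFECT[group] if treated else 0.0
    outcome = 10.0 + effect + rng.gauss(0, 1)
    records.append((group, treated, outcome))


def mean(values):
    return sum(values) / len(values)


def difference_in_means(rows):
    """Mean outcome among treated units minus mean outcome among controls."""
    treated = [o for _, t, o in rows if t]
    control = [o for _, t, o in rows if not t]
    return mean(treated) - mean(control)


# The average effect pools everyone; conditional effects split by subgroup.
overall = difference_in_means(records)
estimates = {
    g: difference_in_means([r for r in records if r[0] == g]) for g in TRUE_EFFECT
}

print(f"Average effect, all units: {overall:.2f}")
for g, est in estimates.items():
    print(f"Conditional effect, {g}: {est:.2f} (simulated truth: {TRUE_EFFECT[g]})")
```

Forest-based methods pursue the same goal when the relevant subgroups are not known in advance and must be discovered from many covariates at once.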

Computational Social Science and the Role of AI in Society

In addition to reflecting on the opportunities these methods create for sociological research, we also have the opportunity to study the effects of these new technologies on society itself. For example, people in underserved populations could use Large Language Models (LLMs) as a starting point for responding to a wrongful eviction, such as by drafting a letter to their landlord. Similarly, someone wishing to start a new career but not knowing exactly where to start might benefit from asking an LLM about the process, requirements, and suggested timeline to completion. These uses will not cure social inequality, but studying them can help us mitigate the risks that come with these new technologies.

As we reflect on the advancements and applications of AI, ML, and Data Science, we should think very seriously about the role of these technologies in understanding and shaping social policy. The frontier of civil rights is now bound up with AI, ML, and Data Science. Among the risks these new technologies introduce are challenges to truth and science.

Social scientists should collaborate with technologists, data scientists, and legal scholars to understand how we can be part of the conversation on AI regulation, enhancing the opportunities of AI while minimizing the risks that stem from its use. AI is changing rapidly. What is the role of government, researchers, and civil society in responding to and collaborating with these technologies? If social scientists want to be part of this conversation, we need to be proactive about learning how these methods work both mathematically and sociologically.

If we are concerned with issues of inequality and AI, we should know exactly how this technology is exacerbating inequality. Knowing this will require retraining researchers in the social sciences to learn the mathematical circuitry of such technologies so that we can learn more about their effects on the social circuitry of our world. Engaging with AI, ML, and Data Science enables us to seize new avenues of research but also places us at the forefront of the risks and challenges of working with such technologies. These technologies are not a panacea for existing social ills, but by being involved in the conversation we can help ensure they are used to mitigate those ills.


  1. Crafts, N. F. R., & Harley, C. K. (1992). Output Growth and the British Industrial Revolution: A Restatement of the Crafts-Harley View. The Economic History Review, 45(4), 703–730.

  2. Daoud, A., & Johansson, F. D. (2024). The impact of austerity on children: Uncovering effect heterogeneity by political, economic, and family factors in low- and middle-income countries. Social Science Research, 118, 102973.

  3. Harley, C. K. (1982). British Industrialization Before 1841: Evidence of Slower Growth During the Industrial Revolution. The Journal of Economic History, 42(2), 267–289. doi:10.1017/S0022050700027431

  4. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning: With applications in R. Springer.

  5. Kozlowski, A. C., Taddy, M., & Evans, J. A. (2019). The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings. American Sociological Review, 84(5), 905–949.

  6. Leitgöb, H., Prandner, D., & Wolbring, T. (2023). Big data and machine learning in sociology. Frontiers in Sociology, 8, 1173155.

  7. Mittleman, J. (2022). Intersecting the Academic Gender Gap: The Education of Lesbian, Gay, and Bisexual America. American Sociological Review, 87(2), 303–335.

  8. Molina, M., & Garip, F. (2019). Machine learning for sociology. Annual Review of Sociology, 45, 27–45.

  9. Salganik, M. J. (2017). Bit by Bit: Social Research in the Digital Age. Princeton University Press. Open review edition.

  10. Hindman, M. (2015). Building Better Models: Prediction, Replication, and Machine Learning in the Social Sciences. The Annals of the American Academy of Political and Social Science, 659(1), 48–62.