Exploratory Data Analysis in Social Science Research

November 14, 2023

Political science has turned towards causal inference over the last two decades, as evidenced by the focus of graduate methods courses and the methodological leanings of publications in the field’s top journals. Though understanding the causes of effects and the effects of causes is an important enterprise, this trend has, at times, come at the expense of grounding research in good research questions and theory. Finding the right research question and building good theories are difficult tasks. A core component of both is descriptive inference, or the process of describing the world as it exists. Descriptive research can help us establish patterns and puzzles (empirical realities) in the world around us and, therefore, craft research questions worth asking. Describing the state of the world can also contribute to building theories to answer those questions.

Often the starting point for descriptive research is exploring existing datasets. This process, which I call exploratory data analysis, can be critical for unearthing puzzling empirical patterns, establishing associations between variables, finding predictors of outcomes, and engaging with the existing literature on a topic. Exploratory data analysis draws on a variety of techniques, skills, and methods, such as data cleaning, recoding variables, regression analysis, and, of course, machine learning. As a PhD student in the process of proposing my dissertation project, I have put exploring existing datasets at the center of my research. My proposed dissertation asks whether there is a gender gap in ambition for political careers such as elected office, political activism, and leadership in political party organizations, and how women’s political ambition can be increased. I explore these research questions in India.

Exploring the 2022 YouGov-CPR-Mint Data

I conducted exploratory data analysis on survey data collected in India by YouGov-Center for Policy Research-Mint in 2022, which asked citizens questions about their political ambition for a career in politics. Specifically, the survey asked whether individuals would consider making politics their career and if they said no, what the reason was. The survey also collected respondents’ demographic information, opinions on Indian politics and the state of the Indian economy, participation in political activities, and level of satisfaction with their personal freedoms.

Some of the questions I explored through this dataset were:

  • Previous political science research has found a gender gap in political ambition for office (Fox and Lawless 2014, Schneider et al. 2016); that is, women are less likely than men to have considered running for office. Does this gender gap in political ambition for office exist in India?

  • What are the reasons for lack of political ambition among individuals and do these reasons differ for men and women?

  • Is the gender gap in ambition particular to political careers or are women in general less ambitious than men?

  • How do politically ambitious women compare to non-politically ambitious women on other indicators of political participation?

  • What are the most important predictors of women’s political ambition?

My exploratory analysis consisted of three key components. First, I cleaned and recoded the data. Second, I created cross-tables of different variables and conducted difference-in-means t-tests to assess whether the differences I observed were statistically significant or plausibly due to chance. Third, I trained a machine learning model (random forest) to find important predictors of political ambition.
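
As a rough illustration of the recoding step, the sketch below maps raw survey answers to a 0/1 ambition indicator. The raw strings, the `recode_ambition` helper, and the choice to treat "don't know/can't say" as missing are all assumptions made for this example; they are not the survey's actual coding or the code used for this analysis.

```python
def recode_ambition(answer):
    """Map a raw survey answer to 1 (yes), 0 (no), or None (missing)."""
    cleaned = answer.strip().lower()
    if cleaned == "yes":
        return 1
    if cleaned == "no":
        return 0
    # Assumption for this sketch: any other answer is treated as missing.
    return None


# Hypothetical raw answers with inconsistent casing and whitespace.
raw = ["Yes", "no ", "Don't know/Can't say", "NO", " yes"]

recoded = [recode_ambition(a) for a in raw]
valid = [r for r in recoded if r is not None]   # drop missing answers

print(recoded)
print("share with political ambition:", sum(valid) / len(valid))
```

Whether to drop "don't know" answers or keep them as their own category is a substantive choice, since it changes the denominator of every share computed afterwards.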

I find that there is a substantial gender gap in political ambition but not an ambition gap writ large. The most important inhibitor of women’s political ambition is that they are not interested in politics as a career and have other interests instead. Political participation indicators are also among the leading predictors of women’s political ambition. Many of these findings will motivate my dissertation proposal.

Data Exploration Results

Political scientists have consistently found that women are less likely than men to have considered running for elected political office (Fox and Lawless 2014, Schneider et al. 2016). I wanted to know if this pattern existed in India as well. The survey asked respondents, “Given an opportunity, would you make politics your career?” Respondents could answer yes, no, or don’t know/can’t say. Figure 1 shows the cross-tabulation of respondents’ answers by gender. I found a large gender gap in political ambition: women were more than 8 percent less likely than men to consider making politics their career.

Figure 1: Respondent Political Ambition by Gender
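
A cross-tabulation of this kind can be sketched in a few lines of Python. The toy responses and the `crosstab_row_shares` helper below are hypothetical and exist only for illustration; they are not the YouGov-CPR-Mint data or the code behind Figure 1.

```python
from collections import Counter


def crosstab_row_shares(records):
    """Return {gender: {answer: share}}, with shares summing to 1 within each gender."""
    counts = Counter(records)                           # (gender, answer) -> count
    totals = Counter(gender for gender, _ in records)   # gender -> count
    table = {}
    for (gender, answer), n in counts.items():
        table.setdefault(gender, {})[answer] = n / totals[gender]
    return table


# Hypothetical toy responses as (gender, answer) pairs; NOT the survey data.
toy = (
    [("man", "yes")] * 3 + [("man", "no")] * 6 + [("man", "don't know")] * 1
    + [("woman", "yes")] * 2 + [("woman", "no")] * 7 + [("woman", "don't know")] * 1
)

shares = crosstab_row_shares(toy)
gap = shares["man"]["yes"] - shares["woman"]["yes"]     # gap in the share answering yes

for gender, row in shares.items():
    print(gender, {answer: round(share, 2) for answer, share in row.items()})
print("gender gap in 'yes' share:", round(gap, 2))
```

Normalizing within gender (row shares) rather than over the whole sample is what makes the two groups comparable when they differ in size.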

I then conducted a difference-in-means test of political ambition by gender, testing whether the gap between men’s and women’s average political ambition could be explained by chance alone. The difference was not only large but also statistically significant, as shown by the non-overlapping confidence intervals (Figure 2).

Figure 2: Difference in Means of Political Ambition by Gender
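
The logic of a difference-in-means test can be sketched as follows. This is a minimal illustration on made-up 0/1 indicators, using a Welch-style standard error and a normal approximation for the 95% confidence interval; the `diff_in_means` helper and the data are hypothetical, and the test behind Figure 2 may have been specified differently.

```python
import math
from statistics import mean, variance


def diff_in_means(a, b, z=1.96):
    """Difference in means (a - b), its standard error, and an approximate 95% CI."""
    diff = mean(a) - mean(b)
    # Welch-style standard error: the two groups may have unequal variances.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff, se, (diff - z * se, diff + z * se)


# Hypothetical indicators (1 = would make politics a career); NOT the survey data.
men = [1] * 30 + [0] * 70
women = [1] * 15 + [0] * 85

diff, se, ci = diff_in_means(men, women)
t_stat = diff / se

print(f"difference = {diff:.3f}, SE = {se:.3f}, t = {t_stat:.2f}")
print(f"95% CI for the difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

In this toy example the interval for the difference excludes zero, which is the direct check for significance at the 5 percent level. Non-overlapping confidence intervals around each group's mean are a stricter visual version of the same check.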

Next, I wanted to know whether women in India were less ambitious than men in general. Given that India is a patriarchal society with strong gender hierarchies, it is possible that women would express less desire for any profession outside the household, not just politics.

The survey asked respondents whether they would want to be businesspeople or entrepreneurs if they had the opportunity. I used this question as a proxy for ambition for an alternative career outside the home. Women were more likely to be interested in business or entrepreneurship than in politics, and they were only 3 percent less likely than men to express that interest (Figure 3). In other words, the lack of ambition for politics as a career was not a story about lack of ambition at large.

Figure 3: Respondent Entrepreneurial Ambition by Gender

To examine why some men and women said they do not wish to make politics their career, I created a cross-table of their reasons by gender (Table 1). The most common reason across genders was that respondents were either not interested in politics or had other career interests and options. As expected, more women than men felt they did not have the requisite skills to be successful politicians. Surprisingly, men and women were about equally likely to say that they lacked the personal ties needed to succeed in politics or that politics is corrupt.

Table 1: Crosstable for Lack of Political Ambition by Gender

Lastly, I used a random forest model, trained to predict whether a woman reported political ambition, to find the most important predictors of women’s political ambition. Figure 4 shows the random forest importance plot, with each feature’s importance, measured as the mean decrease in accuracy, on the x-axis. The mean decrease in accuracy captures how much the model’s predictive accuracy falls when a variable’s values are randomly permuted, which breaks that variable’s relationship with the outcome. The larger the drop, the more the model relies on that variable.

Notably, variables capturing an individual’s political participation are the most important predictors of women’s political ambition. This is intuitive: women who participate more actively in politics (they vote, protest, attend election meetings and rallies, or volunteer for social causes) are also more likely to have considered a more active role in politics. Respondents’ area of residence and birth year are also important predictors. Where an individual lives could influence their political ambition; for instance, states in India with more matriarchal norms (such as Kerala) may affect women’s political ambition differently than states with more patriarchal norms. Age could matter as well: older women may express lower ambition than younger women. Surprisingly, predictors such as the respondent’s caste or income exhibited low importance in predicting political ambition.

Figure 4: Random Forest Importance Plot
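
To make the idea behind a mean-decrease-in-accuracy importance score concrete, here is a minimal permutation-importance sketch. It makes several simplifying assumptions: a tiny lookup classifier stands in for the random forest, accuracy is measured on the training data rather than on out-of-bag samples, and the data and feature names (`attended_rally`, `noise`) are invented. It illustrates the measure only and is not the model behind Figure 4.

```python
import random
from collections import Counter, defaultdict


def fit_lookup(X, y):
    """Stand-in classifier (NOT a random forest): majority label per feature combination."""
    votes = defaultdict(Counter)
    for row, label in zip(X, y):
        votes[tuple(row)][label] += 1
    default = Counter(y).most_common(1)[0][0]
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    return lambda row: table.get(tuple(row), default)


def accuracy(predict, X, y):
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def mean_decrease_in_accuracy(predict, X, y, col, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling one column, which breaks its link to y."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(base - accuracy(predict, X_perm, y))
    return sum(drops) / n_repeats


# Hypothetical toy data: column 0 = attended a rally, column 1 = unrelated noise.
X = [[i % 2, (i // 2) % 2] for i in range(200)]
y = [row[0] for row in X]   # by construction, ambition tracks participation exactly

model = fit_lookup(X, y)
importance = {
    name: mean_decrease_in_accuracy(model, X, y, col)
    for col, name in enumerate(["attended_rally", "noise"])
}
print(importance)
```

Shuffling the informative column costs the stand-in model roughly half its accuracy, while shuffling the noise column costs nothing, which is the pattern an importance plot summarizes across many features.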

Next Steps

This exploratory data analysis has given me ample insight into what political ambition for office could look like in India, why individuals choose not to make politics their career, and predictors of women’s political ambition in the country. In conducting this data analysis, I was able to find evidence, though not causal, that either supported or contradicted existing theories in political science that attempt to explain women’s political ambition or lack thereof. Going forward, my dissertation proposal will use these insights to propose the following research directions:

  • This survey, like others used in political science research, conceptualized political ambition as the desire for a career in politics, which is akin to asking whether one wants to be a politician or run for elected office. This may be a narrow conceptualization of what political ambition means. So I ask, does the gender gap persist if we conceptualize political ambition more broadly to include everyday forms of politics that are increasingly found in democracies around the world, such as grassroots activism, political non-profit work, and other forms of social mobilization? If so, why does this gender gap in political ambition exist?

  • Given the reasons why certain women do not have political ambition, how do we increase their ambition for various political careers? Can we design interventions, perhaps targeting women who are already ambitious, that encourage them to run for office or become political activists or involve themselves in politics in some way?

Some social scientists once said that good description is better than bad explanation (King, Keohane, and Verba 2021). Careful descriptive research can provide invaluable insight into how the world works, and exploratory data analysis is one important way to do it. Social scientists should endeavor to use the rich sources of existing data to motivate and formulate their research questions, ground their theories in reality, and explain phenomena in the world.


  1. Fox, R. L., & Lawless, J. L. (2014). Uncovering the Origins of the Gender Gap in Political Ambition. American Political Science Review, 108(3), 499–519. https://doi.org/10.1017/S0003055414000227
  2. Schneider, M. C., Holman, M. R., Diekman, A. B., & McAndrew, T. (2016). Power, Conflict, and Community: How Gendered Views of Political Power Influence Women’s Political Ambition. Political Psychology, 37(4), 515–531. https://doi.org/10.1111/pops.12268
  3. King, G., Keohane, R. O., & Verba, S. (2021). Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton University Press.