Can Machine Learning Models Predict Reality TV Winners? The Case of Survivor

March 14, 2023

Since its premiere in 2000, Survivor has dropped roughly 20 contestants per season in a remote location and forced them to hunt for their own food and build their own shelter. While the show is a test of physical endurance, its social dynamics and strategy are what set it apart from other reality television. In this post, I lay out the rules of the show and explore ways to leverage existing data to predict how far players will make it into the game.

The R and Python files used to clean and analyze the data are available on my GitHub page.

Game Overview 

At the beginning of each season, contestants are split into two teams (‘tribes’) that compete against each other in a series of physically and mentally grueling challenges. In the first half of the season, the team that loses each episode’s challenge must go to ‘tribal council’ and vote out one of its members (the winning tribe gets ‘immunity’ from the vote in that episode). Each contestant gets one vote, and players often form alliances and deceive one another to swing the vote their way, resulting in some dramatic blindsides.

About halfway through the game, the two tribes (which at this point have about 10 players between them) merge. Contestants then compete individually in challenges for immunity and continue to vote out one player at the end of each episode. After the merge, those who are voted out join the ‘jury,’ which ultimately votes on the winner, awarding the title of Sole Survivor, once only two or three contestants remain. Making it to the merge, and with it the jury, is analogous to making the playoffs: it is an indicator of modest success on the show.

The Prediction Problem

Survivor has attracted a dedicated online community of fans eager to debrief after each episode, discuss gameplay and strategies, and even develop schemas to predict the likely winner. Using what’s called ‘edgic’ (a portmanteau of ‘editing’ and ‘logic’), fans assess each player’s odds of winning after each episode by identifying recurring character arcs and patterns in how players are portrayed. Fans track three measures: rating (one of five categories, such as whether a player came across as over the top or as a complex personality), tone (e.g., extremely positive, mixed), and visibility (i.e., the amount of screen time). However, while clever in theory, these predictions tend to be inconsistent, especially when it comes to picking the winner.
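To make the scheme concrete, here is a minimal sketch of how an edgic entry might be encoded as structured data. The category abbreviations follow common fan usage, but the field names and scales are my own illustration, not an official schema.

```python
from dataclasses import dataclass

@dataclass
class EdgicRating:
    """One fan-assigned edgic entry for a player in a single episode."""
    player: str
    episode: int
    rating: str      # e.g., 'INV' (invisible), 'UTR' (under the radar),
                     # 'MOR' (middle of the road), 'CP' (complex personality),
                     # 'OTT' (over the top)
    tone: int        # -2 (very negative) to +2 (very positive)
    visibility: int  # 1 (barely shown) to 5 (dominates the episode)

# A hypothetical entry: a complex, positively toned, highly visible episode.
entry = EdgicRating(player="Example Player", episode=3,
                    rating="CP", tone=1, visibility=4)
```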

Of course, one issue with this method is that the predictors are based on fans’ perceptions distilled into crude categories, which are then used to make inferences about show outcomes. Additionally, other factors like gender and race may affect both how contestants are portrayed and their odds of advancing in the game, which makes the inclusion of demographic data about players especially valuable. Inspired by this tradition, I trained a supervised binary logistic classifier in Python to predict whether a player made the merge (i.e., made it at least halfway through the game). Ultimately, the model seeks to combine some of the objective components of edgic, such as player visibility, with other measurable characteristics of contestants. I detail this process in the following sections.

Features

For this analysis, I compiled a season/person-level data set based on Dan Oehm’s survivoR GitHub repository, which contains information on gender, age, the number of confessionals per episode, and advantages won in the game. The repository contains multiple files at the person, season, and episode levels. To avoid post-merge data contaminating the training data, I calculated the average number of confessionals and advantages won using pre-merge episodes only. Typically, players get between two and five confessionals per episode, with some getting as many as eleven. I also included data on the version of the show (there are adaptations in Australia, New Zealand, and South Africa), as these are edited by different production teams.
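The snippet below is a minimal sketch of that aggregation step, assuming CSV exports of the repository’s tables; the file and column names here are illustrative and may not match the actual survivoR files.

```python
import pandas as pd

# Hypothetical exports; actual survivoR table and column names may differ.
confessionals = pd.read_csv("confessionals.csv")  # one row per player-episode
castaways = pd.read_csv("castaways.csv")          # one row per player-season
merge_eps = pd.read_csv("merge_episodes.csv")     # season -> merge episode

# Keep only pre-merge episodes so post-merge data can't leak into training.
pre_merge = confessionals.merge(merge_eps, on="version_season")
pre_merge = pre_merge[pre_merge["episode"] < pre_merge["merge_episode"]]

# Average confessionals per pre-merge episode for each player-season.
features = (pre_merge
            .groupby(["version_season", "castaway_id"])["confessional_count"]
            .mean()
            .rename("avg_pre_merge_confessionals")
            .reset_index())

# Attach demographics and the outcome label (made the merge or not).
df = castaways.merge(features, on=["version_season", "castaway_id"],
                     how="left")
```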

The final data set contains 1,159 observations across 60 Survivor seasons, with some players appearing multiple times if they competed in multiple seasons. Among these, 652 players (about 56 percent) made the merge. 

Model and Results

Using Python’s scikit-learn library, I trained a logistic regression classifier tuned with lasso, ridge, and elastic net regularization penalties and five-fold cross-validation to predict the likelihood that a player would make the merge based on the features described above. The model did poorly out of sample, with an accuracy worse than a coin flip, suggesting that these crude measures tell us little about game outcomes. As seen in the confusion matrix below, the model correctly predicted that a player wouldn’t make the merge 38 percent of the time (an early boot); it correctly predicted that a player would make the merge just over half the time.
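A simplified sketch of the training setup appears below; the feature names carry over from the hypothetical data frame above, and the hyperparameter grid is illustrative rather than the exact grid I searched.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature and label names matching the data set described above.
X = df[["gender", "age", "version",
        "avg_pre_merge_confessionals", "avg_pre_merge_advantages"]]
y = df["made_merge"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# One-hot encode categorical features; standardize numeric ones.
prep = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender", "version"]),
    ("num", StandardScaler(), ["age", "avg_pre_merge_confessionals",
                               "avg_pre_merge_advantages"]),
])

# The saga solver supports all three penalties: l1, l2, and elasticnet.
pipe = Pipeline([
    ("prep", prep),
    ("clf", LogisticRegression(solver="saga", max_iter=5000)),
])

param_grid = [
    {"clf__penalty": ["l1", "l2"], "clf__C": [0.01, 0.1, 1, 10]},
    {"clf__penalty": ["elasticnet"], "clf__C": [0.01, 0.1, 1, 10],
     "clf__l1_ratio": [0.25, 0.5, 0.75]},
]

# Five-fold cross-validation over the regularization penalties and strengths.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

preds = search.predict(X_test)
print("Out-of-sample accuracy:", accuracy_score(y_test, preds))
print(confusion_matrix(y_test, preds))
```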

Takeaways

While the model did poorly, there are many other features that could improve it, such as the number of times a contestant was mentioned by other players, alliances and connections, the number of challenges their tribe won, tribe composition, the duration of confessionals, and other key demographic data like race. The model also flattens the data to one observation per player per season rather than per episode, so it doesn’t account for the sequential nature of the game, in which contestants’ odds of winning change after each episode. It’s also possible that fans are onto something with edgic, since I don’t include qualitative data about contestants’ portrayal and story arcs.