Forecasting Social Outcomes with Deep Neural Networks

October 7, 2025

Deep learning has captured the world’s attention. It is the engine behind many of the last decade’s transformative advances, including Large Language Models (LLMs), protein structure prediction with AlphaFold, autonomous vehicles, and generative image models. These developments have understandably resulted in massive interest and investment in deep learning-based artificial intelligence (AI), including by people who may not understand the underlying technology or may overestimate its capability. While some of this excitement is uninformed “AI hype,” the mathematical properties of neural networks, combined with their demonstrated success across domains, suggest there is something to this excitement, especially for those of us who use data to solve problems and understand the world.

Many social scientists, however, remain skeptical. As a result, computational social science training often reinforces a deep learning gap, instead focusing on tools like decision trees and penalized regression models, which have a stronger track record in the field. However, neglecting deep learning, which has seen wider success across more domains, may limit the creative potential of a new generation of social scientists. 

In this blog post, I provide a tutorial for social scientists curious about, intimidated by, and even skeptical of deep learning. I will walk you through a successful application of a deep neural network, a multi-layer perceptron (MLP), to a social science prediction task: forecasting population-level mortality rates.

Before you begin

This tutorial will use Python, specifically the Keras framework within the TensorFlow package. Neural networks can be trained in R, but the infrastructure is much less developed. If you are new to Python, this book by Jean Mark Gawron, Python for Social Science, can help with installation and other preliminaries [1]. TensorFlow also has several useful learning resources [2]. If you are new to deep neural networks and want to learn more, there are many very helpful resources out there, but I recommend Deep Learning with Python by Francois Chollet as an introduction [3]. 

Data and code for this tutorial are available at this GitHub repo. This data is provided for demonstration purposes only and should not be used for any secondary research. If you are interested in continuing to work with the data, please download directly from the Human Mortality Database as per their data sharing policy [4]. 

The training data has around 360,000 rows and 5 columns. The test data has around 79,000 rows and 5 columns. The first four columns are the features we’ll use to predict mortality and, in order, contain information about country, gender, year, and age. The final column contains mortality rates. Country and gender are represented with integers, but corresponding country codes can be recovered from the geos_key.npy file in the Drive folder. For gender, 0 = female and 1 = male.
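
If you’d like to peek at the files before diving in, a quick inspection might look like the following. The file names match the repo and the loading code later in this post; the exact structure of geos_key.npy is an assumption on my part, so printing it is the simplest way to see how country codes map to integers.

import numpy as np

# Quick look at the raw training file (adjust paths to wherever you saved the data)

training = np.loadtxt('hmd_training.txt')

print(training.shape)   # roughly (360000, 5)

print(training[:3, :])  # columns: country, gender, year, age, mortality rate

# The country-code lookup; allow_pickle=True is needed if it stores a Python object

geos_key = np.load('geos_key.npy', allow_pickle=True)

print(geos_key)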

Setting up training infrastructure

To start, you’ll want to install and import the following packages.

!pip install tensorflow numpy matplotlib

import tensorflow as tf

import numpy as np

import matplotlib.pyplot as plt

Additionally, you’ll want to rename the module you’ll use to create model layers to something less verbose.

tfkl = tf.keras.layers
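
Before going further, it doesn’t hurt to confirm that TensorFlow imported correctly and to check whether a GPU is visible (a network this small also trains fine on a CPU, just more slowly).

# Optional: confirm the TensorFlow version and check for an available GPU

print(tf.__version__)

print(tf.config.list_physical_devices('GPU'))  # empty list means training will run on CPU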

Data preparation functions

The data you downloaded is already “prepared” in the sense that it is split into train and test sets, but it is not yet ready to be passed to the neural network. We’ll create two functions that will finalize preparations. 

The first function, get_data(), will select a row from the data, normalize or transform fields, and return the transformed features (or covariates, predictors) and target (or outcome, dependent variable). 

The second, called prep_data(), will convert the data to an object the model can handle, called a “tensor,” and repeatedly yield batches of 256 rows from the data. Together, these functions create a prefetched dataset object, which does not actually carry out the sampling and preparation until the model is training. 

def get_data(index, data, max_val, mode):

    # if we want to select data in order for any reason, we'd switch to "not_random" 

    # but we will use random sampling for training

    if mode == "not_random":

        entry = data[index, :]

    else:

    # Randomly select an index between 0 and max_val - 1 (maxval is exclusive in tf.random.uniform)

        rand_index = tf.random.uniform([], minval=0, maxval=max_val, dtype=tf.int32)

        entry = data[rand_index, :]

    # Normalize or prepare data

    geography, gender, year, age, rate = entry[0], entry[1], entry[2], entry[3], entry[4]

    year = (year - 1959) / 60

    age = tf.cast(age, tf.int32)

    geography = tf.cast(geography, tf.int32)

    gender = tf.cast(gender, tf.int32)

    epsilon = 9e-06    # smallest observed rate in training data

    rate = tf.math.log(tf.maximum(rate, epsilon))

    # Reshape each element to a length-1 vector

    features = (tf.reshape(year, [1]), tf.reshape(age, [1]),

               tf.reshape(geography, [1]), tf.reshape(gender, [1]))

    rate = tf.reshape(rate, [1])

    return features, rate

def prep_data(data, mode):

    # convert data to tensor

    data = tf.convert_to_tensor(data)

    # ensure values are in float32 format

    data = tf.cast(data, tf.float32)

    # get total number of samples for sampling

    max_val = data.shape[0]

    # create a dataset of 10,000 placeholder indices (used directly only in "not_random" mode)

    dataset = tf.data.Dataset.from_tensor_slices(np.arange(10000))

    # repeat dataset indefinitely if training

    if mode == "train":

        dataset = dataset.repeat()

    # repeat 120 times if not training

    else:

        dataset = dataset.repeat(120)

    # for each sampled index, prepare features and target with get_data()

    dataset = dataset.map(

        lambda x: get_data(x, data, max_val=max_val, mode=mode),

        num_parallel_calls=4)

    # collect 256 prepared samples into a batch

    dataset = dataset.batch(256)

    # prefetch to improve performance

    final_data = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)

    return final_data
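
Before wiring these functions into the model, it can be reassuring to pull a single batch and confirm the shapes look right. The sketch below assumes you have already loaded the training file with np.loadtxt, as shown in the “Training the model” section.

# Optional sanity check: inspect one batch from the prepared dataset

check_ds = prep_data(training, mode="train")

(year_b, age_b, geo_b, gender_b), rate_b = next(iter(check_ds))

print(year_b.shape, age_b.shape, geo_b.shape, gender_b.shape)  # each: (256, 1)

print(rate_b.shape)  # (256, 1)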

Model training functions

The network represented below has five intermediate or “hidden” layers, each with 128 neurons. Deep learning models have many hyperparameters: settings or configuration choices that control how the model is built and trained, and that can be adjusted and tuned to improve performance. In other words, modelers have many decisions to make, although there are rules of thumb and best practices that can make some of these choices easier [3]. The code below shows one way of setting up our model, incorporating some of those best practices, but you can also alter these decisions and see how performance is affected. Some of the key model elements/decisions to be aware of are:

  • Setting up categorical and continuous variables

    • year - a float32 numeric input

    • age_embed, gender_embed, and geography_embed - our categorical features (we choose to treat age as categorical), whose values are converted to 5D vectors

  • The transitions and transformations between layers

    • We choose ‘tanh’ activation functions; another common choice is ‘relu’. Activation functions are what allow deep learning models to capture non-linear relationships.

    • BatchNormalization() layers normalize layer outputs, which keeps values in a reasonable range and stabilizes training.

    • Dropout() layers randomly zero out a small fraction of neurons during training, which discourages the model from fitting too closely to the training data and helps it generalize to unseen data.

  • The optimization specifications

    • We choose the mean-squared error as the loss value to be optimized.

    • We choose Adam as the optimizer (a common choice for many models).

For the sake of this tutorial, don’t worry too much about the details. Instead, I would focus on getting familiar with how networks are broadly set up in Keras and the general types of decisions you’ll need to make.

def create_model(geo_dim):

    # defining inputs 

    year = tfkl.Input(shape=(1,), dtype='float32', name='Year')

    age =  tfkl.Input(shape=(1,), dtype='int32', name='Age')

    geography = tfkl.Input(shape=(1,), dtype='int32', name='Geography')

    gender = tfkl.Input(shape=(1,), dtype='int32', name='Gender')


    # defining embedding layers 

    age_embed = tfkl.Embedding(input_dim=100, output_dim=5, name='Age_embed')(age)

    age_embed = tfkl.Flatten()(age_embed)


    gender_embed = tfkl.Embedding(input_dim=2, output_dim=5, name='Gender_embed')(gender)

    gender_embed = tfkl.Flatten()(gender_embed)


    geography_embed = tfkl.Embedding(input_dim=geo_dim, output_dim=5,       

                                     name='Geography_embed')(geography)

    geography_embed = tfkl.Flatten()(geography_embed)


    # create feature vector that concatenates all inputs 

    x = tfkl.Concatenate()([year, age_embed, gender_embed, geography_embed])

    x1 = x


    # setting up hidden layers 

    x = tfkl.Dense(128, activation='tanh')(x)

    x = tfkl.BatchNormalization()(x)

    x = tfkl.Dropout(0.05)(x)


    x = tfkl.Dense(128, activation='tanh')(x)

    x = tfkl.BatchNormalization()(x)

    x = tfkl.Dropout(0.05)(x)


    x = tfkl.Dense(128, activation='tanh')(x)

    x = tfkl.BatchNormalization()(x)

    x = tfkl.Dropout(0.05)(x)


    x = tfkl.Dense(128, activation='tanh')(x)

    x = tfkl.BatchNormalization()(x)

    x = tfkl.Dropout(0.05)(x)


    x = tfkl.Concatenate()([x1, x])

    x = tfkl.Dense(128, activation='tanh')(x)

    x = tfkl.BatchNormalization()(x)

    x = tfkl.Dropout(0.05)(x)

    x = tfkl.Dense(1, name='final')(x)


    # creating the model 

    model = tf.keras.Model(inputs=[year, age, geography, gender], outputs=[x])


    # compiling the model

    model.compile(loss='mse', optimizer='adam')


    return model
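
If you want to see the architecture laid out layer by layer, Keras can print a summary of any compiled model. The geo_dim value below is a placeholder for illustration; in practice you would use the value computed from the data in the next section.

# Optional: print layer shapes and parameter counts

# geo_dim=10 is a placeholder; use the geo_dim computed from the training data below

demo_model = create_model(geo_dim=10)

demo_model.summary()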

Finally, we set up a function that will execute our training process. It calls create_model(), which builds and compiles our model so it is ready to take in batches of data. The model.fit() call takes both our prepared training data and our test data (passed as validation data). This lets us monitor, during training, how the model fits the training data and how it performs on unseen data; if the reported training MSE is much lower than the validation MSE, that is a clue the model may be overfitting. We also pass a ReduceLROnPlateau callback, which shrinks the learning rate when the validation loss stops improving. The verbose=2 argument in model.fit() allows us to watch the training process.

def run_deep_model(dataset_train, dataset_test, geo_dim, epochs):


    model = create_model(geo_dim)


    callbacks = [tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.25, 

                                                      patience=3, verbose=0, mode="auto", 

                                                      min_delta=1e-8, cooldown=0,        

                                                      min_lr=0.0)]

    history = model.fit(dataset_train, validation_data=dataset_test, validation_steps=25, 

                        steps_per_epoch=1400, 

                        epochs=epochs, verbose=2, callbacks=callbacks)


    tf.keras.backend.clear_session()


    return model 

Training the model

Train models and observe progress

Now that we have the backbone of our code written, we get to do the fun part: watch our model learn in real time! We’ll start by loading in and preparing our data. 

# load data (adjust path as needed)

training = np.loadtxt('hmd_training.txt')

test = np.loadtxt('hmd_test.txt')


# prep data

train_prepped = prep_data(training, mode="train")

test_prepped = prep_data(test, mode="test")


# get value for the geography dimension

geo_dim = int(max(training[:,0]) + 1)

And we run our model.

trained_model = run_deep_model(train_prepped, test_prepped, geo_dim,

                               epochs = 20)

What you should see printed as the model trains is an evaluation of performance after each of our 20 epochs. An epoch represents one complete pass through the data: for each batch of 256 rows, the model computes gradients via backpropagation and the optimizer updates the parameters, repeating until roughly all of the data has been sampled. In other words, after one epoch, the model will have seen all of the data and adjusted its parameters in response [5].
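
A quick bit of arithmetic shows why steps_per_epoch=1400 corresponds to roughly one pass through the training data:

# 1400 steps per epoch x 256 rows per batch is approximately the ~360,000 training rows

print(1400 * 256)  # 358400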

After a few seconds, we will see the results from the first epoch. This includes information about the training loss (mean-squared error), the validation loss (mean-squared error on test data), and the learning rate, which affects the size of adjustments the model makes to its parameters on each optimization step. We want to pay attention to the trajectory of loss values as the model does its 19 other passes through the data. Ideally, both the training and validation loss values will decrease with each epoch. Your output should look something like the following.

Epoch 1/20

1400/1400 - 10s - 7ms/step - loss: 2.0438 - val_loss: 0.2565 - learning_rate: 0.0010

Epoch 2/20

1400/1400 - 8s - 6ms/step - loss: 0.3340 - val_loss: 0.2771 - learning_rate: 0.0010

Epoch 3/20

1400/1400 - 8s - 6ms/step - loss: 0.2561 - val_loss: 0.2119 - learning_rate: 0.0010

Epoch 4/20

1400/1400 - 8s - 6ms/step - loss: 0.2170 - val_loss: 0.1962 - learning_rate: 0.0010

Epoch 5/20

1400/1400 - 8s - 6ms/step - loss: 0.2002 - val_loss: 0.2625 - learning_rate: 0.0010

...

Epoch 19/20

1400/1400 - 9s - 6ms/step - loss: 0.1561 - val_loss: 0.1843 - learning_rate: 1.5625e-05

Epoch 20/20

1400/1400 - 9s - 7ms/step - loss: 0.1582 - val_loss: 0.1447 - learning_rate: 1.5625e-05

This training trajectory is a little noisy (loss values do not descend every epoch), but that’s pretty normal. Overall, the trend looks strong enough that this variation isn’t concerning. There’s also little evidence of overfitting, since the validation loss stays close to the training loss. Stopping a bit earlier might have been beneficial, since improvements taper off toward the end of training. Still, the progress overall looks excellent.
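
If you would rather have Keras stop training for you once the validation loss stops improving, an EarlyStopping callback is one option. This is not part of the setup above; it is a sketch of something you could add to the callbacks list inside run_deep_model().

# Optional sketch: halt training after 5 epochs with no val_loss improvement,

# restoring the weights from the best epoch seen

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,

                                              restore_best_weights=True)

# add it alongside ReduceLROnPlateau in the callbacks list passed to model.fit()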

Plot results

Let’s now plot the predictions from our model against actual mortality rates for one of the countries in our dataset. We’ll start by quickly preparing input features to be fed into the model for prediction. 

test_input_features = (tf.convert_to_tensor((test[:,2] - 1959) / 60,

                       dtype=tf.float32),  # Year

                       tf.convert_to_tensor(test[:,3], dtype=tf.float32),  # Age

                       tf.convert_to_tensor(test[:,0], dtype=tf.float32),  # Geography

                       tf.convert_to_tensor(test[:,1], dtype=tf.float32))  # Gender


test_predictions = trained_model.predict(test_input_features)

inputs_test = np.delete(test, 4, axis=1)

test_predictions = np.column_stack((inputs_test, test_predictions))

Then, let’s filter to one country, one gender, and one year and plot mortality rate predictions across age. We’ll plot for Australian males in 2015.

preds_filtered = test_predictions[(test_predictions[:,0] == 50) & # Australia

                                   (test_predictions[:,1] == 1) &  # Males

                                   (test_predictions[:,2] == 2015)]

test_filtered = test[(test[:,0] == 50) & # Australia

                      (test[:,1] == 1) &  # Males

                      (test[:,2] == 2015)] 


plt.figure(figsize=(8, 6))


plt.plot(test_filtered[:,3], test_filtered[:,4], color='blue', label='Actual Rates',  

         marker='o')

plt.plot(preds_filtered[:,3], np.exp(preds_filtered[:,4]), color='red', 

         label='Predicted Rates', marker='x')

plt.xlabel('Age')

plt.ylabel('Mortality Rate')

plt.title('Actual vs Predicted Mortality Rates for 2015 Australian Males')

plt.legend()

plt.show()

The results look pretty good, especially considering we are nine years out from the training data. The forecasted rates are quite close to the observed rates in 2015, though the model struggles more with the higher rates at the oldest ages.

[Figure: “Actual vs Predicted Mortality Rates for 2015 Australian Males.” Age (0 to 100) on the x-axis, mortality rate (0.0 to 0.4) on the y-axis; the red predicted-rates line closely tracks the blue actual-rates line.]
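
If you want a number to accompany the plot, you can compute the mean squared error for this subgroup on the log scale, mirroring the transformation used in get_data() (the epsilon floor below matches the one used in training).

# MSE on log rates for 2015 Australian males; predictions are already on the log scale

actual_log = np.log(np.maximum(test_filtered[:,4], 9e-06))

print(np.mean((actual_log - preds_filtered[:,4])**2))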

Conclusion

In this tutorial, we prepared input data for a deep neural network, set up the network, and trained a model to forecast age-specific mortality rates for many populations. We learned how to interpret the outputs of the training process and observed, both with descending error values during training and our visualization, that the model successfully learned to predict unseen mortality data. 

Aggregated mortality rates are unusual among social outcomes in that they tend to be easier to predict than noisier outcomes like migration or unemployment rates. Since much simpler time series models already do a decent job of predicting mortality, we might expect a deep learning model to do well here too [6]. 

However, deep neural networks also have the capacity to model more complicated social processes and better predict outcomes than traditional statistical models for tasks beyond mortality. There is emerging research using deep learning and AI to both generate information-rich model inputs and to directly predict complex social behaviors, though it seems that much of this work is happening outside of social science departments [7,8]. This tutorial aimed to introduce social scientists to the process of training a deep neural network, with the hope that greater familiarity with deep learning, AI, and machine learning will empower us to contribute meaningfully to the changing conversation around social prediction.

References

  1. Python for Social Science

  2. TensorFlow

  3. Deep Learning with Python

  4. Human Mortality Database

  5. In this case, it actually is not entirely true that the model will see ALL of the samples in each epoch since we are sampling the data with replacement, but the vast, vast majority of the data will be seen by the model. 

  6. Though it is also surprising that deep networks do better than classic approaches for this problem, given that the mortality time series are relatively short and the complexity here arises from interactions between only four features. For more on why the deep learning approach does better for this problem, check out my preprint.

  7. Google Population Dynamics Foundation Model

  8. For example, faculty affiliates of the Berkeley Artificial Intelligence Research (BAIR) Lab, most of whom are engineering or computer science professors, are doing some very interesting work on social prediction with deep learning tools.