A Basic Introduction to Hierarchical Linear Modeling

March 4, 2024

The linear regression model is one of the most widely used statistical tools in both research and practical applications. Its simplicity and intuitive interpretation make it popular in statistical analysis. The basic form of linear regression is:

$$y = \alpha + \beta x + \epsilon$$

Linear regression assumes a straight-line relationship between an independent variable $x$ (e.g., study time in hours per day) and an outcome variable $y$ (e.g., GPA). The intercept $\alpha$ is the average value of $y$ when $x = 0$ (e.g., the average GPA of students who do not study at all). The coefficient $\beta$ is the expected change in the outcome for a one-unit change in the independent variable. For example, a coefficient of 0.1 for daily study time means that each additional hour of study per day is associated with a 0.1-point increase in GPA. This gives a straightforward way to describe how changes in one variable relate to changes in another, and more emphasis is typically placed on $\beta$ because it captures the relationship between $x$ and $y$. The last component of the equation, $\epsilon$, is the error term: the part of the dependent variable that $x$ leaves unexplained. It is assumed to be normally distributed, $\epsilon \sim N(0, \tau^2)$, and many of the assumptions of linear regression concern this error term.
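To see what $\alpha$, $\beta$, and $\tau^2$ look like in practice, here is a minimal R sketch that simulates a hypothetical GPA dataset and fits the model with base R's `lm()`. The true values ($\alpha = 2.5$, $\beta = 0.1$, $\tau = 0.3$) are assumptions chosen for illustration, not estimates from any study.

```r
# Hypothetical simulated data (not from the post): GPA vs. daily study time
set.seed(42)
n <- 200
study_hours <- runif(n, min = 0, max = 6)   # x: hours of study per day
alpha <- 2.5                                # assumed true intercept
beta  <- 0.1                                # assumed true slope: +0.1 GPA per extra hour
tau   <- 0.3                                # assumed SD of the error term (tau^2 = 0.09)
gpa <- alpha + beta * study_hours + rnorm(n, mean = 0, sd = tau)

# Fit y = alpha + beta * x + epsilon by ordinary least squares
fit <- lm(gpa ~ study_hours)
coef(fit)      # estimates of alpha ("(Intercept)") and beta ("study_hours")
sigma(fit)^2   # estimate of tau^2, the error variance
```

The estimates should land close to the assumed values. With real data the true parameters are unknown, which is exactly why they are estimated.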

While simple linear regression is valuable, some scenarios call for a more nuanced approach that reflects the underlying structure of the data and the variables of interest. One such approach is the hierarchical linear model (HLM), also known as the multilevel linear model or the mixed-effects model.

Rationales for Hierarchical Linear Modeling

First, data are often clustered at a higher level. For instance, in a study of the relationship between students' ability and mathematical achievement, data collected from multiple schools may show that students within the same school tend to have similar mathematical achievement. Clustering therefore introduces additional correlation among observations: students from the same school are not independent of one another. Ordinary linear regression assumes independent errors and does not account for this clustering, which is why HLM is needed for hierarchical data structures in which lower-level units are nested within higher-level units (clusters). We use levels to describe this design, with a lower level nested within a higher level; e.g., students at level 1 are nested within schools at level 2.

Figure 1. Students are nested within schools

Furthermore, while linear regression typically involves independent and dependent variables measured at the same level, sometimes independent variables at multiple levels are of interest. For example, in a study of mathematical achievement, school-level variables such as student-teacher ratios or school policies may be important predictors alongside student-level variables such as intelligence. HLM handles predictors at multiple levels within a single model, which is another compelling reason to adopt it.

Hierarchical Linear Models

In contrast to the conventional regression model, HLM introduces additional notation to account for the hierarchical structure of the data. We use $i$ to index individual cases and $j$ to index clusters, so $x_{ij}$ and $y_{ij}$ are the independent and dependent variables for the $i$th case within the $j$th cluster.

One distinctive feature of HLM is its allowance for cluster-specific intercepts and slopes through the incorporation of random effects. In this blog, we will focus primarily on random intercept models.

Random Intercept Model (RIM)

The random intercept model is represented as:

$$y_{ij} = \alpha_0 + u_j + \epsilon_{ij}$$

Compared with the first equation, the random intercept model drops the independent variables and decomposes the intercept into two parts: (1) $\alpha_0$ is the "grand mean" of $y$ across all clusters and cases, and (2) $u_j$ captures how cluster $j$ deviates from the grand mean and is assumed to be normally distributed, $u_j \sim N(0, \sigma^2)$. The error term $\epsilon_{ij}$ is, as before, assumed to follow $N(0, \tau^2)$. In HLM, we usually focus on the variance of $u_j$ rather than on its specific values: when interpreting the random intercept model, we are more interested in how much the outcome varies across clusters than in how far any particular cluster deviates. With level 1 denoting the lowest level (e.g., students) and level 2 denoting the cluster level (e.g., schools), the model can also be written hierarchically:

$$\text{Level 1: } y_{ij} = \alpha_j + \epsilon_{ij}$$
$$\text{Level 2: } \alpha_j = \alpha_0 + u_j$$

Substituting the level-2 equation into the level-1 equation recovers $y_{ij} = \alpha_0 + u_j + \epsilon_{ij}$.
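The random intercept model can also be read as a recipe for generating data. The minimal R sketch below, using made-up parameter values, first draws one intercept $\alpha_j = \alpha_0 + u_j$ per school (level 2) and then draws students around their school's intercept (level 1). The spread of the resulting school means mostly reflects $\sigma^2$, which is why HLM focuses on that variance rather than on any single $u_j$.

```r
# Hypothetical parameter values, chosen only for illustration
set.seed(1)
n_schools  <- 100   # number of clusters (level 2)
n_students <- 25    # students per school (level 1)
alpha0 <- 480       # grand mean alpha_0
sd_u   <- 40        # SD of the random intercepts u_j, so sigma^2 = 1600
sd_e   <- 80        # SD of the level-1 errors e_ij, so tau^2 = 6400

# Level 2: one intercept per school, alpha_j = alpha_0 + u_j
u_j     <- rnorm(n_schools, mean = 0, sd = sd_u)
alpha_j <- alpha0 + u_j

# Level 1: y_ij = alpha_j + e_ij for every student i in school j
school_id <- rep(seq_len(n_schools), each = n_students)
y <- alpha_j[school_id] + rnorm(n_schools * n_students, mean = 0, sd = sd_e)
sim <- data.frame(school_id = factor(school_id), y = y)

# School means scatter around alpha_0; their variance is roughly
# sigma^2 + tau^2 / n_students = 1600 + 6400 / 25 = 1856
school_means <- tapply(sim$y, sim$school_id, mean)
mean(sim$y)
var(school_means)
```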

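For the random intercept model, an important metric to monitor is the intraclass correlation coefficient (ICC), which takes the form:

$$\text{ICC} = \frac{\sigma^2}{\sigma^2 + \tau^2}$$

Recall that $\sigma^2$ is the variance of the random intercepts $u_j$ and $\tau^2$ is the variance of the level-1 error term $\epsilon_{ij}$. The ICC is therefore the proportion of the total variance in $y$ that lies between clusters; equivalently, it is the expected correlation between the outcomes of two randomly chosen cases from the same cluster. It ranges from 0 to 1, and a higher ICC indicates more similarity within clusters, which strengthens the case for using HLM.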
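As a quick numerical check of the ICC formula $\sigma^2 / (\sigma^2 + \tau^2)$, the R sketch below computes it for hypothetical variance components ($\sigma^2 = 400$, $\tau^2 = 1600$) and then confirms by simulation that two cases sharing the same cluster effect $u_j$ are correlated at about that value.

```r
# ICC = sigma^2 / (sigma^2 + tau^2)
icc <- function(sigma2, tau2) sigma2 / (sigma2 + tau2)

# Hypothetical variance components, for illustration only
sigma2 <- 400    # between-cluster variance (variance of u_j)
tau2   <- 1600   # within-cluster variance (variance of e_ij)
icc(sigma2, tau2)   # 0.2

# Empirical check: draw two cases from each of many clusters; because both
# share the same u_j, their outcomes correlate at roughly the ICC
set.seed(7)
n_clusters <- 5000
u  <- rnorm(n_clusters, mean = 0, sd = sqrt(sigma2))
y1 <- 50 + u + rnorm(n_clusters, mean = 0, sd = sqrt(tau2))  # first case in each cluster
y2 <- 50 + u + rnorm(n_clusters, mean = 0, sd = sqrt(tau2))  # second case in each cluster
cor(y1, y2)
```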

RIM with level-1 independent variables

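Level-1 independent variables, such as students' ability in the mathematical achievement example, are measured on individual cases and can vary within a cluster. Incorporating them into HLM is relatively straightforward: we add them to the level-1 equation and keep the level-2 equation unchanged. With a single level-1 predictor $x_{ij}$ and slope $\beta$, where $\alpha_j$ is the intercept of cluster $j$:

$$\text{Level 1: } y_{ij} = \alpha_j + \beta x_{ij} + \epsilon_{ij}$$
$$\text{Level 2: } \alpha_j = \alpha_0 + u_j$$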

RIM with level-2 independent variables

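On the other hand, level-2 independent variables, like the student-teacher ratio in the mathematical achievement study, are constant for all cases within a cluster but vary across clusters (schools). Adding these variables to the model means augmenting the level-2 equation while leaving the level-1 equation as it is. With a single level-2 predictor $w_j$ and coefficient $\gamma$:

$$\text{Level 1: } y_{ij} = \alpha_j + \epsilon_{ij}$$
$$\text{Level 2: } \alpha_j = \alpha_0 + \gamma w_j + u_j$$

Note that since level-2 variables vary only at the cluster level, the case index $i$ is dropped from them: $w_j$ carries only the cluster index $j$.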
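Because a level-2 variable takes a single value within each cluster, a quick data check can confirm which level a variable belongs to before it goes into a model. The helper `is_level2()` below is hypothetical (it is not part of lme4 or of the post's code), and the toy values are made up.

```r
# Hypothetical helper (not part of lme4): TRUE if `x` takes a single value
# within every cluster, i.e. it behaves as a level-2 variable for that grouping
is_level2 <- function(x, cluster) {
  all(tapply(x, cluster, function(v) length(unique(v)) == 1L))
}

# Toy data: 3 schools with 2 students each (made-up values)
toy <- data.frame(
  school_id = c(1, 1, 2, 2, 3, 3),
  stratio   = c(15, 15, 22, 22, 18, 18),          # constant within school: level 2
  escs      = c(-0.3, 0.8, 0.1, -1.2, 0.5, 0.4)   # varies within school: level 1
)
is_level2(toy$stratio, toy$school_id)   # TRUE
is_level2(toy$escs, toy$school_id)      # FALSE
```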

An Example

To illustrate HLM in practice, consider mathematical achievement. We'll examine how possession of a study desk (student level) and the student-teacher ratio (school level) relate to mathematical achievement, with economic, social, and cultural status (ESCS) as a control variable. We'll analyze data from the Programme for International Student Assessment (PISA) using the lme4 package in R. Descriptive statistics are given in Table 1.

Table 1. Descriptive statistics of the exemplar dataset

Building a Random Intercept Model

Once the necessary packages and data have been loaded, we can use the lmer function to fit the random intercept model (RIM). The setup is straightforward: the dependent variable, math, goes on the left side of ~, and the intercept terms and independent variables go on the right side. The 1 on the right side of ~ denotes the fixed intercept, i.e., the grand mean of math ($\alpha_0$), and (1|school_id) specifies random intercepts that vary across schools ($u_j$).

# Load packages
library(dplyr)          # filter(), select()
library(lme4)           # lmer()
library(learningtower)  # assumed source of load_student() and the school data

# Load data
student_2018 <- load_student("2018")
student_usa_2018 <- student_2018 |> filter(country == 'USA')

dt_school <- school |> filter(country == 'USA' & year == 2018)

# Merge student and school records and keep the variables used in the models
dt_usa <- merge(student_usa_2018, dt_school, by = 'school_id') |>
  select(school_id, math, desk, escs, stratio)

# Run the random intercept model and print its results
rim0 <- lmer(math ~ 1 + (1|school_id), data = dt_usa)
summary(rim0)

Figure 2 below shows the results of the RIM. The Random effects section reports the variance and standard deviation of the random intercepts and the residuals. We can use the variance estimates to calculate the ICC: 1524/(1524 + 6608) ≈ 0.19, meaning that about 19% of the total variance in math scores is attributable to differences between schools. In the Fixed effects section, the estimated grand mean ($\alpha_0$) is 477.14, which is close to the mean math score of all sampled students.

Figure 2. Results of RIM
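The ICC calculation above can also be done directly from a fitted lmer model. The sketch below first repeats the arithmetic with the post's variance estimates. Because the PISA extract is not reproduced here, it then fits a random intercept model to hypothetical simulated data (true $\sigma = 40$, $\tau = 80$, and grand mean 477 are all assumptions) and extracts the variance components with lme4's `VarCorr()`.

```r
library(lme4)

# Arithmetic from the variance estimates reported in Figure 2
1524 / (1524 + 6608)   # about 0.187, which rounds to 0.19

# The same calculation done programmatically, on hypothetical simulated data
set.seed(2024)
n_schools <- 150
n_per     <- 30
school_id <- factor(rep(seq_len(n_schools), each = n_per))
u_j  <- rnorm(n_schools, mean = 0, sd = 40)                 # assumed sigma = 40
math <- 477 + u_j[as.integer(school_id)] +
  rnorm(n_schools * n_per, mean = 0, sd = 80)               # assumed tau = 80
sim  <- data.frame(school_id, math)

fit0 <- lmer(math ~ 1 + (1 | school_id), data = sim)

# VarCorr() holds the estimated variance components; as.data.frame() flattens them
vc <- as.data.frame(VarCorr(fit0))
sigma2_hat <- vc$vcov[vc$grp == "school_id"]   # between-school variance
tau2_hat   <- vc$vcov[vc$grp == "Residual"]    # residual (within-school) variance
icc_hat    <- sigma2_hat / (sigma2_hat + tau2_hat)
icc_hat       # the simulated truth is 1600 / (1600 + 6400) = 0.2
fixef(fit0)   # estimated grand mean alpha_0 (simulated truth: 477)
```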

RIM with level-1 independent variables

To incorporate level-1 independent variables, we add them to the right-hand side of ~ in the formula:

rim1 <- lmer(math ~ 1 + desk + escs + (1|school_id), data=dt_usa)


After fitting this model (rim1), we can interpret the results shown in its summary output, summary(rim1).

In the Fixed effects section, the coefficient on desk represents the association between having a study desk and students' mathematical achievement, controlling for escs. Its estimated value is 8.77. The standard error and t value indicate whether this predictor is statistically significant (lme4 itself does not report p-values). Based on these results, possession of a study desk is a significant predictor of mathematical achievement: having a desk is associated with a score 8.77 points higher on average, after accounting for differences in economic, social, and cultural status.

Figure 3. Results of RIM with level-1 independent variables
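Because lme4's summary prints estimates, standard errors, and t values but no p-values, it can help to pull these numbers out programmatically and add a Wald confidence interval (an approximation based on the normal distribution) as one rough gauge of uncertainty. The sketch below uses hypothetical simulated data, not the PISA data: the variable names mirror the post's, but the effect sizes (e.g., 9 points for desk) are made up.

```r
library(lme4)

# Hypothetical simulated data loosely mimicking the post's variables;
# all values and effect sizes are made up for illustration
set.seed(99)
n_schools <- 120
n_per     <- 25
n         <- n_schools * n_per
school_id <- factor(rep(seq_len(n_schools), each = n_per))
desk <- rbinom(n, size = 1, prob = 0.85)   # 1 = student has a study desk
escs <- rnorm(n)                           # standardized ESCS-like index
u_j  <- rnorm(n_schools, mean = 0, sd = 35)
math <- 470 + 9 * desk + 30 * escs + u_j[as.integer(school_id)] +
  rnorm(n, mean = 0, sd = 75)
sim  <- data.frame(school_id, math, desk, escs)

fit1 <- lmer(math ~ 1 + desk + escs + (1 | school_id), data = sim)

# Estimate, standard error and t value for desk (the row summary() prints)
coef(summary(fit1))["desk", ]

# Wald 95% confidence intervals; the rows for variance parameters are NA
ci <- confint(fit1, method = "Wald")
ci["desk", ]
```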

RIM with level-1 and level-2 independent variables

To incorporate level-2 independent variables, such as the student-teacher ratio, we again add them to the right-hand side of ~. lme4 does not need to be told which level a variable belongs to; level-2 variables enter the formula in the same way as level-1 variables. For instance:

rim2 <- lmer(math ~ 1 + desk + escs + stratio + (1|school_id), data=dt_usa)


After fitting this model (rim2), we can interpret the results shown in its summary output, summary(rim2).

In the Fixed effects section, the coefficient on stratio represents the association between the student-teacher ratio and students' mathematical achievement. In this example, a one-unit increase in the student-teacher ratio is associated with an estimated 0.55-point increase in mathematical achievement, on average, after controlling for possession of a study desk and economic, social, and cultural status.

Figure 4. Results of RIM with level-1 and level-2 independent variables
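To see that a level-2 predictor needs no special syntax, the sketch below simulates hypothetical data in which stratio is constant within each school (all values and effect sizes are made up) and fits the same formula as rim2. Because stratio varies only across schools, the precision of its coefficient depends largely on the number of schools (150 here) rather than on the number of students.

```r
library(lme4)

# Hypothetical simulated data; values and effect sizes are made up
set.seed(314)
n_schools <- 150
n_per     <- 20
n         <- n_schools * n_per
school_id <- factor(rep(seq_len(n_schools), each = n_per))

# Level-2 predictor: one student-teacher ratio per school, repeated for its students
stratio_j <- round(runif(n_schools, min = 10, max = 25))
stratio   <- stratio_j[as.integer(school_id)]

# Level-1 predictors
desk <- rbinom(n, size = 1, prob = 0.85)
escs <- rnorm(n)

u_j  <- rnorm(n_schools, mean = 0, sd = 35)
math <- 460 + 9 * desk + 30 * escs + 0.5 * stratio +
  u_j[as.integer(school_id)] + rnorm(n, mean = 0, sd = 75)
sim <- data.frame(school_id, math, desk, escs, stratio)

# stratio enters the formula exactly like the level-1 predictors
fit2 <- lmer(math ~ 1 + desk + escs + stratio + (1 | school_id), data = sim)
coef(summary(fit2))
```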


For readers seeking further insights into HLM, two recommended books are:

  1. Multilevel Modeling Using R by Finch et al. (2019): A beginner-friendly guide offering practical instructions on conducting HLM in R and interpreting results.
  2. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling by Snijders and Bosker (2012): Provides a comprehensive introduction to HLM, covering advanced topics not addressed in the former book. Ideal for readers aiming to deepen their understanding of HLM.


References

  1. Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage.
  2. Finch, W. H., Bolin, J. E., & Kelley, K. (2019). Multilevel modeling using R. Boca Raton, FL: CRC Press.
  3. Siegler, R. S., Duncan, G. J., Davis-Kean, P. E., Duckworth, K., Claessens, A., Engel, M., ... & Chen, M. (2012). Early predictors of high school mathematics achievement. Psychological Science, 23(7), 691-697.