What Are Vowels Made Of? Graphing a Classic Dataset with R

February 13, 2024

Vowels are all around us. Depending on how you speak, there are between six and seven in the first sentence of this paragraph alone. When linguists talk about vowels, we are not referring to spelling (in English, a, e, i, o, u, and sometimes y). Rather, we are referring to the actual speech sound, or sound signal, as produced by your mouth and the rest of your vocal tract. Especially in English, spelling often has little to do with how a word is pronounced. Consider beet versus beat. Although these words are spelled with two different letter combinations (ee and ea), they are pronounced the same way and contain the same vowel.

In linguistics, vowels are defined as sounds produced with a wide opening in the vocal tract, producing a relatively loud, clear sound. Mainstream US English has around twelve unique vowels (though our writing system has only five symbols to denote them, or six if you count y). How can our brains tell these sounds apart? This blog post will help you answer this question by plotting vowel data from a classic American English dataset by Peterson and Barney (1952).

Think of a guitar. When we pluck a string, it vibrates at a particular frequency, which determines the note that it plays. All six strings make different notes when plucked, because each will prefer to vibrate at a different frequency, referred to as the string’s natural frequency. This natural frequency is determined by the string’s thickness and length: thicker strings play lower notes, and we can play higher notes by placing our fingers on a string to shorten it.

Our voices work in much the same way as a guitar. When we speak, the vocal cords inside of our voice box (or larynx) vibrate to produce sound. They function as guitar strings do: when we push air out of our lungs and through the vocal cords, the cords vibrate at their natural frequency, as determined by their length and thickness. In addition to this natural frequency (referred to by linguists as F0, for fundamental frequency, or H1, for first harmonic), our vocal cords also vibrate in more complex ways, which produce additional frequency components called harmonics.

However, the sounds our vocal cords produce don’t immediately escape into the air. They have to travel through our vocal tracts, which absorb some harmonic components of the sound while amplifying others. When the vocal tract amplifies (or increases the energy of) particular frequencies that align with its own natural frequencies, the vocal tract resonates. The resonance peaks around which harmonics are amplified are referred to as formants. We refer to the first (lowest frequency) formant as F1, the second as F2, and so on.
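To make the idea of harmonics and formants concrete, here is a minimal sketch in base R. Every number in it is an illustrative assumption: the 250 Hz fundamental, the two resonance peaks, and the bell-shaped gain curve (the helper gain() is hypothetical and far simpler than a real vocal tract). It is only meant to show that harmonics falling close to a resonance peak are boosted the most.

```r
# Toy source-filter sketch in base R; all values are illustrative assumptions
f0 <- 250                      # hypothetical fundamental frequency (Hz)
harmonics <- f0 * 1:20         # harmonics sit at whole-number multiples of F0

# Hypothetical vocal tract resonance peaks (Hz), loosely [i]-like
formants <- c(F1 = 300, F2 = 2300)

# Hypothetical gain curve: one bell-shaped bump, about 100 Hz wide, per resonance
gain <- function(freq) {
  distance <- outer(freq, formants, "-")   # rows: frequencies, columns: resonances
  rowSums(exp(-(distance / 100)^2))
}

boosted <- data.frame(harmonic = harmonics, gain = gain(harmonics))

# Show the harmonics that the toy filter amplifies most
head(boosted[order(-boosted$gain), ], 4)
```

In this toy setup, the harmonics at 250 Hz and 2250 Hz receive the most gain, because they lie closest to the assumed peaks at 300 Hz and 2300 Hz.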

Do F1 and F2 help us understand vowels? Let’s graph some data from Peterson and Barney (1952) to find out. In this classic study, Peterson and Barney examine which properties of a vowel’s sound signal allow us to distinguish one vowel from another. However, each person’s mouth and vocal tract are different, so how can we control for these natural variations in order to find the core, consistent elements that distinguish vowels for everyone? To do this, Peterson and Barney collected data from 76 speakers of American English. The data consist of several measurements from each speaker’s pronunciation of each vowel sound in English.

Loading the data

We start by installing three packages: phonTools, which contains many classic and useful phonetics datasets; dplyr, for easier dataframe handling; and ggplot2, for better plotting. We then load each package with library() to make it available to us. After this, we can load the Peterson and Barney (1952) dataset, which phonTools provides under the name pb52.

install.packages(c('phonTools', 'ggplot2', 'dplyr'))

# Load packages (library() attaches one package per call)

library(phonTools)

library(ggplot2)

library(dplyr)

# Load Peterson and Barney 1952 dataset

data(pb52)

Examining the data

We start by taking a cursory look at the pb52 dataset using head(). We can see that pb52 contains 9 columns; we will focus on ‘type’, ‘vowel’, ‘f1’, and ‘f2’ for this analysis. Our ultimate goal is to plot F1 and F2 as a function of which vowel is being pronounced. As F1 and F2 values differ on average between men, women, and children, we will focus on one group (children) for this analysis. However, before we can do this, we need to examine our data and clean it to make it more readable.
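A first look at the data might go something like the sketch below. It assumes phonTools is installed and that pb52 has not yet been modified; the name type_means is ours and chosen only for illustration. The last step gives a rough check of the claim that average formant values differ between speaker groups.

```r
library(phonTools)
data(pb52)

# First six rows, then the number of rows and columns
head(pb52)
dim(pb52)

# How many recordings come from each speaker group in the 'type' column
table(pb52$type)

# Average F1 and F2 per speaker group
type_means <- aggregate(cbind(f1, f2) ~ type, data = pb52, FUN = mean)
type_means
```

If the group averages differ noticeably, that supports restricting the plot to one group at a time rather than mixing all speakers together.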

By using unique(), we can also see all of the unique values in a particular column. There are 10 unique vowels in this dataset. Some may look strange due to the encoding used by phonTools (X-SAMPA, a keyboard-friendly version of the phonetic alphabet), but they correspond to the vowels in heed (i), hid (I), head (E), had ({), cot (A), caught (O), hood (U), boo (u), bug (V), and heard (3'). (Note that in this dataset, the vowels in cot and caught are different; in many people’s ways of speaking, including mine, these vowels are pronounced the same.) As the original vowel encoding can be confusing, we can use the dplyr functions mutate() and recode() to replace the original names with more reader-friendly ones.



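# Renaming vowels to be more human-friendly (the result is assigned back to pb52)

pb52 <- pb52 %>% mutate(vowel=recode(vowel, 'i'='heed','I'='hid','E'='head','{'='had','A'='cot','O'='caught','U'='hood','u'='boo','V'='bug','3\''='heard'))

# Check the first six rows and the renamed vowel labels

head(pb52)

unique(pb52$vowel)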

Figure 1. Above: the first six entries of the pb52 dataset, after renaming the entries in the ‘vowel’ column. Below: each unique vowel in the ‘vowel’ column of pb52.

Plotting the data

As we only want to look at children, we first filter our pb52 dataframe to only keep rows where the value in column ‘type’ is ‘c’ (for ‘child’). We assign this new, filtered dataframe to another variable (c_pb52) to avoid confusion with the original dataframe, and keep our original dataframe clean if we need to use it later.

After we filter our dataframe, we can graph it as a scatter plot with ggplot(). The plotting code has several parts:

  • c_pb52, the data used to create the scatterplot.

  • The aes() function controls the ‘aesthetics’ of the graph, which includes which variables to plot along the x (x = f2) and y (y = f1) axes. These variables must be typed to exactly match how the column names are written in the dataframe. 

  • aes() also includes the color argument. Here, color=vowel means that each unique vowel gets its own color (as we did not specify a particular color palette, a default one is used).

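  • geom_point() is then used to specify that we want a scatter plot.

  • stat_ellipse() draws an ellipse for each vowel that, by default, encloses roughly 95% of that vowel’s data points.

These functions that add particular elements to our graph are chained together with +. Each + must come at the end of a line rather than the start of the next one; otherwise R treats the earlier line as a complete command.

To match the traditional layout of a vowel chart, with high vowels at the top and front vowels on the left, we reverse the direction of the x- and y-axes using scale_x_reverse() and scale_y_reverse().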

# Filter df for children only

c_pb52 <- pb52 %>% filter(type=='c')

# Plot dataframe (each + ends a line so R knows the expression continues)

ggplot(c_pb52, aes(x=f2, y=f1, color=vowel)) +
    geom_point() +
    stat_ellipse() +
    scale_x_reverse() +
    scale_y_reverse()

Figure 2. Scatter plot depicting F1 and F2 values for ten English vowels.

Interpreting the graph

Looking at the scatter plot in Figure 2, we notice that each vowel seems to be positioned in a unique space on the graph. The vowel [i] in heed is positioned at the top left, while the vowel [ɑ] in cot sits toward the bottom right. Try saying all of the vowels shown in the graph. Do you notice your tongue moving in a particular direction? Does your tongue move up or down, forward or backward?

As it turns out, F1 and F2 are related to how high in the mouth your tongue is (F1), as well as how far back it is (F2). This has to do with how moving your tongue in these ways changes the shape of the vocal tract, which in turn changes the vocal tract’s resonance. If you have paid attention to your tongue position as you say the vowels in Figure 2, you may have noticed that the vowels in heed and boo place your tongue in a higher position than caught or cot. You may also have noticed that the vowel in heed is also more front in your mouth than boo or caught.

If we look at the chart, we will notice that vowels that are produced higher in the mouth have lower F1 values, while vowels produced further back have lower F2.
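One way to check this pattern numerically, rather than by eye, is to average F1 and F2 for each vowel. The sketch below assumes dplyr and phonTools are installed; it uses pb52 as it currently exists in your session (renamed or not) and loads a fresh copy only if none is found. The names vowel_means, mean_f1, and mean_f2 are ours, chosen for illustration.

```r
library(dplyr)

# Load pb52 from phonTools only if it is not already in the session
if (!exists("pb52")) data(pb52, package = "phonTools")

vowel_means <- pb52 %>%
  filter(type == "c") %>%                 # children only
  group_by(vowel) %>%
  summarise(mean_f1 = mean(f1),           # average first formant (Hz)
            mean_f2 = mean(f2),           # average second formant (Hz)
            n = n()) %>%                  # number of recordings per vowel
  arrange(mean_f1)                        # lowest F1 (highest tongue) first

vowel_means
```

Reading down the table, the vowels near the top should be the ones you produce with your tongue high in the mouth. Sorting by mean_f2 instead gives the corresponding front-to-back ordering.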


By combining the scatter plot in Figure 2 with knowledge of our own tongues, we arrive at this key insight: F1 is inversely related to tongue height, and F2 is inversely related to tongue backness! In other words, the higher the tongue, the lower the F1, and the further back the tongue, the lower the F2. This is one of the most important insights in phonetics, the scientific study of human speech sounds. Think about how you might play around with this graph. How could you change the colors, or style of the scatter plot points? Is there a better way to arrange this data? With R, adventure awaits!
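As one possible starting point for experimenting, the sketch below fades the individual points and prints each vowel’s label at its average position. It assumes ggplot2, dplyr, and phonTools are installed, and it uses pb52 as it exists in your session (loading a fresh copy only if none is found). The names kids, centers, and p, along with the particular styling choices, are ours and just one option among many.

```r
library(dplyr)
library(ggplot2)

# Load pb52 from phonTools only if it is not already in the session
if (!exists("pb52")) data(pb52, package = "phonTools")

kids <- filter(pb52, type == "c")

# Average position of each vowel, used to place the text labels
centers <- kids %>%
  group_by(vowel) %>%
  summarise(f1 = mean(f1), f2 = mean(f2))

p <- ggplot(kids, aes(x = f2, y = f1, color = vowel)) +
  geom_point(alpha = 0.3) +                                   # fade raw points
  stat_ellipse() +
  geom_text(data = centers, aes(label = vowel), size = 5,
            show.legend = FALSE) +                            # label each cluster
  scale_x_reverse() +
  scale_y_reverse() +
  labs(x = "F2 (Hz)", y = "F1 (Hz)", color = "Vowel") +
  theme_minimal()

p
```

Swapping theme_minimal() for another theme, or changing alpha and size, is a quick way to see how each layer contributes to the final picture.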


  1. Peterson, G. E., and Barney, H. L. (1952). Control methods used in a study of the vowels. Journal of the Acoustical Society of America, 24(2), 175-184.