Measuring Vowels Without Relying on Sex-Based Assumptions
This blog post walks through a Python-based process for taking your audio data, annotations, and speaker metadata and producing a tabular dataset of fine-grained acoustic vowel measures that can be visualized and used for statistical analysis. The main incentive behind this project is to show that it is possible to get accurate vowel measurements – perhaps more so than with typical methods – without asking speakers for their assigned sex at birth or assuming it about them as the researcher. I designed this primarily as a resource for students in Linguistics programs with an interest in phonetics (and at least an introductory data science background), but the core concepts can be adapted to a range of projects involving repeated measures data from multiple sources and/or direct work with audio files.
I will first cover how to use file path information to load in TextGrid annotations and get speaker metadata, and how to access audio files in order to extract a series of acoustic measures. The tutorial uses a toy dataset focused on English vowels, but much of the code can be generalized to any speech sounds of interest, especially where the researcher wants accurate measurements without presuming speaker sex.
You can follow along in the accompanying tutorial Jupyter Notebook on GitHub. The tutorial assumes coding experience at the level of D-Lab’s Python Fundamentals workshop.
Setup and Installation
For setup and installation instructions, please visit my previous blog post on Python Data Processing Basics for Acoustic Analysis and follow until the section titled “Single Speaker Test.” Note that you will need to download this GitHub repo instead of the one listed there. Then pick back up here.
In addition to the steps outlined in the previous post, you will need to install a few additional libraries:
- numpy allows us to work efficiently with data in array format
- pyarrow allows us to save large tabular datasets in a space-efficient, binary “feather” format (instead of csv)
- matplotlib is a basic but powerful plotting library
- seaborn gives us access to more tools for creating “beautified” plots
All of these can be installed in Terminal using pip by running each of the following lines:
pip install numpy
pip install pyarrow
pip install matplotlib
pip install seaborn
You may need an earlier version of parselmouth for the FormantPath calls below to work without error. If you are seeing unexpected errors in steps 7-10, try recreating your environment using the custom yaml file here (included in the repository materials you downloaded) by running the following in Terminal:
conda env create -f formants.yaml
From here on out, we will be working with three test speakers’ data, plus one test utterance, found here.
Single Speaker Test
To start, we’ll test out the process of retrieving and merging data for a single utterance, step-by-step, before combining the steps into larger code chunks. We’ll use one utterance rather than a whole recording because, as you will see, we will be generating many data points per vowel token, and we just want to do a quick check that our code is working as intended. Then, we’ll put the process into full practice by looping over all three test speakers.
Step 0: Load in libraries
First, we need to import the relevant tools from all the libraries we just installed. If the installation was successful, the code below should run with no output.
# be sure to follow setup and installation steps above first
import numpy as np
np.Inf = np.inf  # this gets rid of a pesky error in newer versions of numpy
import pandas as pd
from pathlib import Path
from phonlab.utils import dir2df
from audiolabel import read_label
import parselmouth as ps
from parselmouth.praat import call as pcall
from parselmouth import Sound
import pyarrow
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
Step 1: Save path to data directory, identify speaker directories
Next, we’ll need to retrieve the path to our toy data and store it as a variable. The data folder is stored within the same parent folder as the tutorial notebook, so ./data will suffice. Then, we can use this variable as an argument to dir2df, which generates a data frame with one column containing the name of each speaker-specific folder (relpath), and another column listing the files inside these folders, specifically those ending in .wav (fname).
# get the path to larger folder containing your data
datadir = Path('./data').absolute()

# create df with by-speaker subfolders containing wav and TextGrid data for one speaker
# fnpat specifies unique wav files so that spkrdf contains each speaker name only once
spkrdf = dir2df(datadir, fnpat=r'\.wav$')
spkrdf
spkrdf should look like the following for our toy data:
Processing TextGrid Files
Step 2: Extract phones and words tiers from TextGrid
Note that while these steps are quite similar to those outlined in my previous posts, there are some differences in the details, so we will walk through everything again here.
First, because we are only interested in vowels, we need to create a list of target sounds so that later we can use it to filter our TextGrid data and save ourselves time. We can do this by creating a list with all the possible vowel sounds, in this case using ARPAbet (Shoup 1980), a commonly-used phonetic alphabet that uses ASCII symbols, rather than the International Phonetic Alphabet. The number at the end indicates whether the vowel had primary stress (1), secondary stress (2), or was unstressed (0), and we also include bare annotations as a catch-all.
vowels = ['IY', 'IY0', 'IY1', 'IY2', 'IH', 'IH0', 'IH1', 'IH2',
          'EY', 'EY0', 'EY1', 'EY2', 'EH', 'EH0', 'EH1', 'EH2',
          'AH', 'AH0', 'AH1', 'AE', 'AE0', 'AE1', 'AE2',
          'ER', 'ER0', 'ER1', 'ER2', 'UW', 'UW0', 'UW1', 'UW2',
          'UH', 'UH0', 'UH1', 'UH2', 'OW', 'OW0', 'OW1', 'OW2',
          'AA', 'AA0', 'AA1', 'AA2', 'AO', 'AO0', 'AO1', 'AO2']
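If you would rather not type the list out by hand, a list comprehension can build it from the base vowel symbols and the stress digits. This is just an optional alternative, not part of the original workflow; note that it also produces 'AH2', which the hand-typed list above happens to leave out, so adjust it to match your own inventory.

# base ARPAbet vowel symbols used above
base_vowels = ['IY', 'IH', 'EY', 'EH', 'AH', 'AE', 'ER', 'UW', 'UH', 'OW', 'AA', 'AO']

# bare symbol plus stress digits: 0 (unstressed), 1 (primary), 2 (secondary)
vowels_generated = [v + s for v in base_vowels for s in ['', '0', '1', '2']]

# compare against the hand-typed list; only 'AH2' should differ
print(sorted(set(vowels_generated) - set(vowels)))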
Since spkrdf contains four speakers, but we want to test out our workflow on just one, we can use head() to select only the first row, for S00’s data. We’re setting this process up as a for-loop now so that it’s easier to adapt to multiple speakers later.
We can then establish a new variable, spkrfile, which uses the info we stored in datadir and spkrdf to get the specific path to S00’s TextGrid file. We can use spkrfile as an argument to read_label (a function from audiolabel) in order to store each tier in the TextGrid as its own data frame – in this case, the tiers labeled ‘phones’ and ‘words,’ which are stored in phdf and wrdf, respectively.
for row in spkrdf.head(1).itertuples():
    print(f"Processing speaker: {row.relpath}")
    spkrfile = Path(datadir, row.relpath, row.fname).with_suffix('.TextGrid')
    phdf, wrdf = read_label(spkrfile, ftype='praat', tiers=['phones', 'words'])

phdf
phdf should look like this, including some empty phones segments where we didn’t include labels in Praat:
Similarly, wrdf looks like the following; note that, for now, phdf and wrdf have different row numbers (as we have multiple phone segments per word):
Step 3: Subsetting the phones data frame
Since phones are the finer-grained variable, we’ll want to subset phdf for the segments of interest before merging it into a single data frame with wrdf. Because we ultimately want to take measurements from only the sounds of interest, we can first eliminate all of the empty segments corresponding to intervals we ignored during annotation. We use copy() to ensure the new subset is treated as a unique object, avoiding pesky error messages.
Since read_label conveniently stored the start (t1) and end (t2) times for each TextGrid segment inside phdf, we can simply subtract all of t1 from all of t2 to get the duration of each segment. Moreover, since these TextGrids were force aligned, and thus contain every phone of every word transcribed, we can use shift() on the phones column to populate two new columns, prev and nxt, with the previous and following sounds at each row.
Now we no longer need our non-target phones values, so we can use isin() to retain only the vowels we included in the vowels list, and from there, only retain the segments that are at least 0.05 sec. This latter step is because Praat (and therefore parselmouth) only samples the speech signal at about every 10 ms, and we want at least a few samples per measurement to ensure they are reliable.
# remove empty segments
phdf = phdf[phdf['phones'] != ''].copy()

# add phone duration tier
phdf['phone_dur'] = phdf['t2'] - phdf['t1']

# add col for previous phone
phdf['prev'] = phdf['phones'].shift()

# add col for following phone
phdf['nxt'] = phdf['phones'].shift(-1)

# keep only vowels, remove short tokens
phdf = phdf[phdf['phones'].isin(vowels) & (phdf['phone_dur'] >= 0.05)].copy()

# check updated df - should be no empty phone segments or segments <0.05s
phdf
Now, phdf should look a bit neater and is reduced in row number:
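As an optional sanity check (not part of the original post, but it only uses columns created above), you can confirm that nothing except the target vowels survived the filtering and that every remaining token clears the 0.05 s duration threshold:

# all remaining labels should appear in the vowels list
print(sorted(phdf['phones'].unique()))

# shortest remaining token should be at least 0.05 s
print(phdf['phone_dur'].min())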
Step 4: Merge phones and words dfs
Now that we’ve cleaned up phdf to include only the data points of interest, we’re ready to merge in wrdf so that each phone segment has its accompanying word label. merge_asof() allows us to specify a matching column between the two data frames on which to merge, in this case, t1. Each phone segment gets matched to the word segment with the nearest start time. We can also specify a list of columns for each data frame, indicating the columns we want to retain from each; this way we don’t end up with duplicate t2 columns, and we can eliminate any columns with extraneous information (like fname above).
The suffixes argument is a safeguard, so if something goes wrong and there are duplicate columns, they get a suffix added to their names indicating where they came from.
# merge matching on closest start times between phone and word annotations
tg = pd.merge_asof(
    phdf[['t1', 't2', 'phones', 'phone_dur', 'prev', 'nxt']],
    wrdf[['t1', 'words']],
    on='t1',
    suffixes=['_ph', '_wd']  # in case there are duplicates
)

# check merged df is same length and has only specified columns
tg
Our merged data frame is saved as tg, which should look like the following, with the same number of rows as phdf:
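If you would rather have that length check be explicit instead of eyeballed (a small optional addition), an assert will make the notebook fail loudly whenever the merge changes the row count:

# merge_asof should return exactly one row per vowel token in phdf
assert len(tg) == len(phdf), "Merged data frame has a different number of rows than phdf"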
Setting the FormantPath parameters
Step 5: Create parameter dictionaries
The parselmouth library works by importing “calls” from Praat in Python, so essentially anything we would see in a prompt window from Praat when asking it to do something needs to be addressed in our code. To avoid making our parselmouth calls in the main loop excessively long, we can set the parameters we want in advance by storing them in a dictionary that we reference later. The nice thing about using “To FormantPath…” is that it helps us choose the best speaker-by-speaker, moment-by-moment formant analysis, so we can use the same parameters for all our speakers.
This is one of the main advantages of this approach over more typical ones, where each speaker gets their own parameter values, particularly for the analysis “ceiling,” based on their actual or presumed sex. In brief, ceilings are important here because if a speaker’s vocal tract resonances are relatively low but the ceiling is set too high, for example, the analysis may mistakenly identify higher formants (resonances – read more here) as lower ones, leading to incorrect measurements. Normally, this issue is addressed by leaning into the tendency for men to have longer vocal tracts and thus lower resonances than women, and setting two ceilings accordingly, but this ignores in-group variation and assumes we have the correct information about speaker sex. Oftentimes we don’t ask for this information properly if at all, and our incorrect assumptions can be both harmful and lead to inaccurate data. It’s time we address this!
Here, in our fpathparams dictionary, we’re giving Praat a starting point with mid_formant_ceiling that is intermediate between the typical ceilings for male and female speakers (5250 Hz) and telling it to choose the best ceiling, for each analysis window, within a range including 5 values above and below the mid value (at steps 0.05 * 5250 = 262.5 Hz in size). Ultimately Praat will fit a polynomial “path” over the sound file based on the settings chosen at each interval. We’ll extract 5 formants total, per max_num_formants. Read more about FormantPath objects here.
# parameters for the FormantPath analysis
fpathparams = {
    'time_step(s)': 0.005,
    'max_num_formants': 5.0,
    'mid_formant_ceiling': 5250,
    'window_len': 0.02,
    'pre_emph_from(Hz)': 50,
    'LPC_model': 'Robust',
    'ceiling_step_size': 0.05,
    'num_steps_up_down': 5,
    'tolerance_1': 1e-6,
    'tolerance_2': 1e-6,
    'num_std_dev': 1.5,
    'max_num_iterations': 5,
    'tolerance': 0.000001,
    'get_source_as_multichan_sound': 'no'
}
In addition, to convert the FormantPath object into tabular format, we’ll set the parameters for Praat’s “Down to table (optimal interval)...” call in the downtotableparams dictionary. The coeff_by_track parameter tells Praat the shape of the function we are fitting with our path. Other parameters like inc_num_formants and inc_bw tell Praat what columns to include in the table, in this case the number of formants successfully extracted and the formant bandwidth. We also have a final additional dictionary, downtotabledtype, that specifies the data type of each column in our output table, and we store the name of our desired columns as a list to use later. The resulting Table object can then be written to CSV or another delimited text format of your choice.
# parameters for the Table object
downtotableparams = {
    'coeff_by_track': '3 3 3 3 3',
    'power': 1.25,
    'inc_frame_num': 'no',
    'inc_time': 'yes',
    'num_time_decimal': 6,
    'inc_intensity': 'yes',
    'num_intensity_decimal': 3,
    'inc_num_formants': 'yes',
    'num_freq_decimal': 3,
    'inc_bw': 'yes',
    'inc_optimal_ceil': 'yes',
    'inc_min_stress': 'yes'
}

# dtypes for the Table
downtotabledtype = {
    'time(s)': np.float32,
    'intensity': np.float32,
    'nformants': np.int16,
    'F1(Hz)': np.float32,
    'B1(Hz)': np.float32,
    'F2(Hz)': np.float32,
    'B2(Hz)': np.float32,
    'F3(Hz)': np.float32,
    'B3(Hz)': np.float32,
    'F4(Hz)': np.float32,
    'B4(Hz)': np.float32,
    'F5(Hz)': np.float32,
    'B5(Hz)': np.float32,
    'Ceiling(Hz)': np.float32,
    'Stress': np.float32
}

# list of column names
downtotablecols = list(downtotabledtype.keys())
Working with the Audio File
Next, we have to prep the audio file for our single test utterance to extract our desired acoustic measures and add them to our ever-growing data frame.
Step 6: Use WAV path to create sound object
Since we already stored the directory information about our WAV files inside datadir, we can reconstruct the path using the relpath and fname columns once again and save it as wav. For now, we just want the first audio file, which we can select with .iloc[0].
row = spkrdf.iloc[0]
row
Then, if we provide the path to our WAV file as a string to the parselmouth method Sound(), it will save the WAV file as a ‘sound object’ in our notebook. However, since our audio comes from interview recordings, we also have to extract the channel with the participant’s audio to make sure it’s processed without noise from the other channel, which we do by selecting the second channel here. This allows us to start manipulating the file as we would in Praat.
# get path to wav files
wav = datadir / row.relpath / row.fname

# use path name to create sound object
snd_stereo = Sound(str(wav))

# extract channel for participant audio
snd = snd_stereo.extract_channel(2)
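As a quick optional check (not in the original post), you can query the sound objects through the same pcall interface to confirm that the channel extraction behaved as expected; ‘Get number of channels’ and ‘Get total duration’ are standard Praat queries for Sound objects:

# stereo original should report 2 channels, the extracted participant track should report 1
print(pcall(snd_stereo, 'Get number of channels'))
print(pcall(snd, 'Get number of channels'))

# the extracted channel should keep the full duration (in seconds)
print(pcall(snd, 'Get total duration'))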
Step 7: Create the FormantPath
Now we can generate our FormantPath. We can refer back to the values of our fpathparams dictionary using the “splat” operator (the initial asterisk). It’s also good practice to save the resulting object in case you want to check the current speaker’s output later for debugging purposes, which can be done by saving it to a binary file with pcall.
fp = pcall(snd, 'To FormantPath...', *fpathparams.values())
pcall(fp, "Save as binary file...", "dollar_store.FormantPath")  # save for later
# ds_fp = pcall("Read from file...", "dollar_store.FormantPath")  # re-import later using this line
Step 8: Down to Table, convert to DataFrame
Next, we convert our path to a Table object, referring back to the downtotableparams dictionary. The second ‘Down to Matrix’ step converts the Table into an indexable format, and the third step converts this matrix into a pandas DataFrame fmtdf, which we can work with directly in Python.
for phone_row in tg.itertuples(index=False):
    opttable = pcall(fp, 'Down to Table (optimal interval)...',
                     phone_row.t1, phone_row.t2,
                     *downtotableparams.values())
    optmatrix = pcall(opttable, 'Down to Matrix')

    # Create DataFrame from the extracted formant matrix
    fmtdf = pd.DataFrame({
        c: pd.Series(optmatrix.values[:, i], dtype=downtotabledtype[c])
        for i, c in enumerate(downtotablecols)
    })

fmtdf.head()
The first few columns of the head of fmtdf should look like the following, with the column names we specified in downtotablecols:
Step 9: Add metadata
Finally, we want to add all the additional metadata we want each row to be tagged with. In this case, we’ll include vowel start (t1) and end times (t2), speaker ID (speaker), recording name (recording), vowel labels (phones), vowel duration (phone_dur), previous (prev) and following vowels (nxt), and the word the vowel came from (words).
fmtdf['t1'] = phone_row.t1
fmtdf['t2'] = phone_row.t2
fmtdf['speaker'] = row.relpath
fmtdf['recording'] = row.fname
fmtdf['phones'] = phone_row.phones
fmtdf['phone_dur'] = phone_row.t2 - phone_row.t1
fmtdf['prev'] = phone_row.prev
fmtdf['nxt'] = phone_row.nxt
fmtdf['words'] = phone_row.words
fmtdf.head()
Our final dataset for the test utterance now includes these columns:
Looping Through Speakers
In most use cases, whether we are interested in phonetic variation, social variation, or both, we will want to compare our measurements across multiple speakers. So long as your by-speaker data are stored as described above, we only need to add a few lines to our code to make this happen. Namely:
- Initialize a fmtdf_list to temporarily store our data for each speaker.
- Remove the head() method from our call to spkrdf so that we loop through the entirety of spkrdf.
- Add print() statements to let us know which speaker we're on and which step we're on for that speaker.
- Use concat() as a last step to append each df in fmtdf_list into a single final_fmtdf.
This code could take some time to execute.
fmtdf_list = []

for row in spkrdf.itertuples(index=False):
    print(f"Processing speaker: {row.relpath}")
    spkrfile = Path(datadir, row.relpath, row.fname)
    phdf, wrdf = read_label(spkrfile.with_suffix('.TextGrid'), ftype='praat', tiers=['phones', 'words'])

    phdf = phdf[phdf['phones'] != ''].copy()
    phdf['phone_dur'] = phdf['t2'] - phdf['t1']
    phdf['prev'] = phdf['phones'].shift()
    phdf['nxt'] = phdf['phones'].shift(-1)
    phdf = phdf[phdf['phones'].isin(vowels) & (phdf['phone_dur'] >= 0.05)].copy()

    tg = pd.merge_asof(
        phdf[['t1', 't2', 'phones', 'phone_dur', 'prev', 'nxt']],
        wrdf[['t1', 'words']],
        on='t1',
        suffixes=['_ph', '_wd']
    )

    wav = datadir / row.relpath / row.fname
    snd_stereo = Sound(str(wav))
    snd = snd_stereo.extract_channel(2)

    fp = pcall(snd, 'To FormantPath...', *fpathparams.values())
    pcall(fp, "Save as binary file...", "dollar_store.FormantPath")

    for phone_row in tg.itertuples(index=False):
        opttable = pcall(fp, 'Down to Table (optimal interval)...',
                         phone_row.t1, phone_row.t2,
                         *downtotableparams.values())
        optmatrix = pcall(opttable, 'Down to Matrix')
        fmtdf = pd.DataFrame({
            c: pd.Series(optmatrix.values[:, i], dtype=downtotabledtype[c])
            for i, c in enumerate(downtotablecols)
        })
        fmtdf['t1'] = phone_row.t1
        fmtdf['t2'] = phone_row.t2
        fmtdf['speaker'] = row.relpath
        fmtdf['recording'] = row.fname
        fmtdf['phones'] = phone_row.phones
        fmtdf['phone_dur'] = phone_row.t2 - phone_row.t1
        fmtdf['prev'] = phone_row.prev
        fmtdf['nxt'] = phone_row.nxt
        fmtdf['words'] = phone_row.words

        # Append each token’s data to the accumulator list
        fmtdf_list.append(fmtdf)

# Combine results from all speakers into final DataFrame
final_fmtdf = pd.concat(fmtdf_list, ignore_index=True)
print("Final formant data compiled.")
final_fmtdf
Step 10: Save your completed dataset
# Save w/ original filename plus tag for metadata and date for reference
final_fmtdf.to_feather('./formant-paths_04-01-25.ft')
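To pick the analysis back up in a later session without rerunning the formant extraction (a small optional addition), the feather file can be read straight back into pandas:

# reload the saved dataset in a new session
final_fmtdf = pd.read_feather('./formant-paths_04-01-25.ft')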
Visualize the Vowel Space, Check for Outliers
Once we have our data for a few speakers, it's good practice to visualize the vowel formant data to check that it has the expected shape overall. We can do this using matplotlib and seaborn.
Step 11: Subset for stressed vowels
To get a general sense of the vowel space in our visuals, we’ll want to keep only the stressed vowels; that way, our data are tidier and aren’t impacted by vowel reduction. ARPAbet marks stress with a final number, as you might recall, so we can use a boolean mask to filter for the vowels marked with ‘1,’ and then save all the vowel labels without stress numbers in a new column, phones_short.
final_fmtdf = final_fmtdf[final_fmtdf['phones'].astype(str).str[-1] == '1'].copy()
final_fmtdf['phones_short'] = final_fmtdf['phones'].str[:-1]
final_fmtdf['phones_short'].unique()
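As an optional check on the result (not in the original post), you can see how many measurement rows each stress-stripped vowel category contributes:

# number of measurement rows per vowel category after the stress filter
print(final_fmtdf['phones_short'].value_counts())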
Step 12: Group using by-token medians
Because “To Formant Path…” gives us far more measurements per token than we need (as you can see above, roughly 20 minutes of speech from each of three speakers yielded 138,201 measurements), we’ll group together measurements taken between the same t1 and t2 for the same recording, summarizing each token by its median. This keeps large outliers from biasing the summary, as might happen with the mean. We can do this by chaining the DataFrame methods groupby() and agg(). Let’s also get rid of the S00 test data.
final_fmtdf = final_fmtdf[final_fmtdf['speaker'] != 'S00']

medians = final_fmtdf.groupby(['speaker', 'recording', 't1', 'phones_short']).agg(
    {'F1(Hz)': 'median', 'F2(Hz)': 'median'}
).reset_index()
medians.head()
Step 13: Filter for outliers
There are many ways we can filter for outliers, but one common method is to use descriptive statistics, i.e., mean and standard deviation. For each speaker, let’s filter our vowels to retain only the medians that are within 1.75 standard deviations of the mean for each vowel. We’ll first have to get the mean and standard deviation for each relevant grouping, then merge them into our medians data frame before filtering.
stats = medians.groupby(['speaker', 'phones_short'])[['F1(Hz)', 'F2(Hz)']].agg(['mean', 'std'])

# flatten the MultiIndex columns
stats.columns = ['_'.join(col).strip() for col in stats.columns.values]

# merge the stats with the medians df
medians = medians.merge(stats, on=['speaker', 'phones_short'], how='left')

threshold = 1.75

filtered_meds = medians[
    (medians['F1(Hz)'] >= medians['F1(Hz)_mean'] - threshold * medians['F1(Hz)_std']) &
    (medians['F1(Hz)'] <= medians['F1(Hz)_mean'] + threshold * medians['F1(Hz)_std']) &
    (medians['F2(Hz)'] >= medians['F2(Hz)_mean'] - threshold * medians['F2(Hz)_std']) &
    (medians['F2(Hz)'] <= medians['F2(Hz)_mean'] + threshold * medians['F2(Hz)_std'])
]
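To see how much data the filter removed (an optional check, not in the original post), you can compare row counts before and after, by speaker:

# proportion of token medians retained per speaker after outlier filtering
retained = filtered_meds.groupby('speaker').size() / medians.groupby('speaker').size()
print(retained.round(3))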
Step 14: Plot filtered medians for each speaker
Now we can save a list of all our speakers and loop over them to plot their filtered by-token medians. Note that we are setting axis limits for all three, so adjustments may be needed to accommodate additional data, and that we must invert the axes to get the canonical plot with F1 increasing downward and F2 increasing leftward. In the final step, we use groupby once more to get the mean of the medians in order to plot a label within each vowel’s distribution.
unique_speakers = filtered_meds['speaker'].unique()

for speaker in unique_speakers:
    speaker_data = filtered_meds[filtered_meds['speaker'] == speaker]

    # Scatter plot
    plt.figure(figsize=(8, 6))
    scatter_plot = sns.scatterplot(
        x='F2(Hz)',
        y='F1(Hz)',
        hue='phones_short',
        style='speaker',
        data=speaker_data,
        s=50,
        alpha=0.7,
        palette='muted'
    )
    plt.xlabel('F2 (Hz)')
    plt.ylabel('F1 (Hz)')
    plt.xlim(100, 3000)
    plt.ylim(200, 1200)

    # invert axes for vowel plotting
    plt.gca().invert_yaxis()
    plt.gca().invert_xaxis()

    # Label means of the medians
    means = speaker_data.groupby('phones_short')[['F1(Hz)', 'F2(Hz)']].mean().reset_index()
    for _, row in means.iterrows():
        plt.text(
            row['F2(Hz)'], row['F1(Hz)'], row['phones_short'],
            fontsize=12, ha='right', va='bottom', fontweight='bold'
        )

    plt.legend()
    plt.title(f'Vowel space for {speaker}')
    plt.show()
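If you would also like to keep a copy of each figure on disk (an optional addition; the filename pattern here is just an illustration), you can save it from inside the loop:

# add this line inside the loop above, just before plt.show()
plt.savefig(f'vowel-space_{speaker}.png', dpi=300, bbox_inches='tight')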
Congratulations! Now you can make custom visuals of speakers' vowel spaces in a corpus, based on fine-grained formant measurements taken without relying on sex-based assumptions.
As you can see, there may be more outliers to look into for these speakers, so this tutorial should serve as a starting point that can be used to assess other steps in the pipeline, such as the forced alignment that produced the annotations, before moving on to statistical analysis.
References
- Boersma, P., & Weenink, D. (2021). Praat: doing phonetics by computer [Computer program]. Version 6.1.38, retrieved January 2, 2021, from http://www.praat.org/
- Jadoul, Y., Thompson, B., & de Boer, B. (2018). Introducing Parselmouth: A Python interface to Praat. Journal of Phonetics, 71, 1–15. https://doi.org/10.1016/j.wocn.2018.07.001
- McKinney, W. (2010). Data Structures for Statistical Computing in Python. In S. van der Walt & J. Millman (Eds.), Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a
- Python Software Foundation. (2024). pathlib — Object-oriented filesystem paths. In Python documentation (3.x version). Retrieved from https://docs.python.org/3/library/pathlib.html
- Shoup, J. E. (1980). Phonological aspects of speech recognition. In W. A. Lea (Ed.), Trends in speech recognition (pp. 125–138). Prentice Hall.
- Sprouse, R. (2024a). audiolabel: Python library for reading and writing label files for phonetic analysis (Praat, ESPS, Wavesurfer). GitHub. https://github.com/rsprouse/audiolabel
- Sprouse, R. (2024b). phonlab: UC Berkeley Phonlab utilities. GitHub. https://github.com/rsprouse/phonlab
- The pandas development team. (2020, February). pandas-dev/pandas: Pandas (latest version). Zenodo. https://doi.org/10.5281/zenodo.3509134
- Van Nuenen, T., Sachdeva, P., & Culich, A. (2024). D-Lab Python Fundamentals Workshop: D-Lab's 6-part, 12-hour introduction to Python. GitHub. https://github.com/dlab-berkeley/Python-Fundamentals