Disaggregating Race and Ethnicity Categories in Census Data

November 1, 2022

The collection of race and ethnicity data by the United States Census Bureau has a long, complex, and problematic history. The Census Bureau states that its racial categories generally reflect a social definition of race recognized in the United States and adhere to guidelines set by the U.S. Office of Management and Budget. In 1900, the Census recognized five racial categories: White, Black, Chinese, Japanese, and American Indian. Today, the Census collects more detailed information about a person’s race, ethnicity, and ancestry, but the race categories it publishes still do not fully reflect how people identify. For example, the Census continues to ask about race and Hispanic origin separately, on the grounds that race and ethnicity are distinct concepts, even though its survey products include ethnic categories such as “Chinese” and “Japanese” in the race question.

While there are ongoing efforts to improve these data, the aggregation of racial categories in Census survey products is not conducive to rigorous, detailed analyses of certain vulnerable populations. In this piece, I discuss why disaggregating broad racial categories in survey data can be useful for policy researchers seeking to better understand disparities across race and ethnicity, and I outline the steps needed to disaggregate the Asian American and Native Hawaiian/Pacific Islander (AANHPI) group using the American Community Survey (ACS).

The Case for Southeast Asians as a Distinct Category

Policy researchers often combine diverse ethnic groups into single categories, which can obscure the experiences of marginalized communities and, in turn, affect how government funds are allocated and how policy is shaped. Disaggregated data allow researchers to develop more nuanced narratives about diverse communities within groups that are otherwise treated as homogeneous. For example, analyses of aggregated racial data often indicate that AANHPI are among the most educated, wealthiest, and healthiest populations in the United States, a picture that can hide the disadvantages faced by specific subgroups.

Southeast Asians in particular are distinct within the broader AANHPI group in that they primarily came to the United States as refugees. Many of the educational obstacles that the current generation of Southeast Asian and Pacific Islander students face can be traced back to the turbulent circumstances under which their parents and grandparents arrived in the United States. Compared with immigrants from other AANHPI groups who arrived during the late-20th-century immigration wave, first-generation immigrants from Southeast Asian countries generally had less English proficiency and fewer skills that transferred to the American job market. These gaps can have intergenerational effects, producing substantially different socioeconomic conditions across subgroups within the broader AANHPI category.

Leveraging Census Data

We can use the race, Hispanic origin, and ancestry variables in the ACS to create a recoded race variable that better captures the distinct experiences of Asian subgroups. In the ACS, the primary race variable includes the following categories: White, Black or African American, American Indian or Alaska Native, Chinese, Japanese, Other Asian or Pacific Islander, Other Race, Two Major Races, and Three or More Major Races. Hispanic origin indicates whether an individual self-identifies as “Hispanic” or “Latino.” Ancestry refers to a person’s ethnic origin or heritage, or the country or region of birth of their parents or ancestors before arrival in the United States. The Stata code below combines these three variables to create a recoded race variable and can be used when analyzing ACS 1-year samples downloaded from IPUMS. The variables race, hispan, and ancestr1 are available through IPUMS and refer to a respondent’s self-reported race, Hispanic origin, and ancestry.
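The Stata code referenced here is not included in this text. As a hedged illustration of the same three-variable logic, the following Python sketch recodes a single respondent. It assumes IPUMS USA general RACE codes (1 = White, 2 = Black, 3 = American Indian or Alaska Native, 4 = Chinese, 5 = Japanese, 6 = Other Asian or Pacific Islander, 7 to 9 = other or multiple races), HISPAN codes 1 to 4 for Hispanic origin, and the ANCESTR1 codes shown for Burmese, Cambodian, Hmong, and Vietnamese ancestry; all of these values should be checked against the IPUMS codebook for the sample in use. The rule ordering (Hispanic origin first, ancestry consulted only for single-race Asian or Pacific Islander respondents) and the function name `recode_race` are assumptions of this sketch, not necessarily the author’s choices.

```python
# Hypothetical recode sketch. All code values below are ASSUMED and should be
# verified against the IPUMS USA codebook for the sample being analyzed.

# IPUMS general RACE codes (assumed)
RACE_WHITE = 1
RACE_BLACK = 2
RACE_AIAN = 3
RACE_ASIAN_PI = {4, 5, 6}  # Chinese, Japanese, Other Asian or Pacific Islander
# Codes 7 (Other race), 8 (Two major races), 9 (Three or more major races) fall through.

# IPUMS HISPAN codes treated as Hispanic/Latino (assumed):
# 1 Mexican, 2 Puerto Rican, 3 Cuban, 4 Other
HISPANIC_CODES = {1, 2, 3, 4}

# IPUMS ANCESTR1 codes for the Southeast Asian groups named in the text (assumed)
SOUTHEAST_ASIAN_ANCESTRY = {
    700: "Burmese",
    703: "Cambodian",
    768: "Hmong",
    785: "Vietnamese",
}


def recode_race(race: int, hispan: int, ancestr1: int) -> str:
    """Map IPUMS RACE, HISPAN, and ANCESTR1 values to one recoded race category."""
    # Hispanic origin is checked first, so Hispanic respondents of any race are Latinx.
    if hispan in HISPANIC_CODES:
        return "Latinx"
    # Single-race Asian or Pacific Islander respondents are split by first ancestry.
    if race in RACE_ASIAN_PI:
        if ancestr1 in SOUTHEAST_ASIAN_ANCESTRY:
            return "Southeast Asian"
        return "Asian/Native Hawaiian/Pacific Islander (excl. Southeast Asian)"
    if race == RACE_WHITE:
        return "White"
    if race == RACE_BLACK:
        return "Black"
    if race == RACE_AIAN:
        return "American Indian/Alaska Native"
    # Other race, two major races, or three or more major races
    return "Other/One or more races"
```

With an IPUMS extract loaded, the function would be applied row by row to produce the new column. Because multiracial respondents fall into the last category under this ordering, a respondent reporting two races and Vietnamese ancestry is not counted as Southeast Asian; whether that is the right choice depends on the analysis.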

The resulting recoded race variable includes the following categories: White, Black, Asian/Native Hawaiian/Pacific Islander (excluding Southeast Asian), Southeast Asian, American Indian/Alaska Native, Latinx, and Other/One or more races. The Southeast Asian category captures Hmong, Cambodian, Vietnamese, and Burmese survey respondents. This recoded variable can be used to show detailed information on poverty, income inequality, and health and educational disparities within vulnerable communities. By applying the code above to the 2019 ACS, we can calculate disaggregated poverty rates showing that a higher percentage of Southeast Asians (12.4%) than of other Asians (8.2%) live below the federal poverty line.
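The poverty comparison above can be sketched in the same hedged way. The Python function below computes a person-weighted share below the poverty line for each recoded group. It assumes the IPUMS POVERTY variable (family income as a percentage of the poverty threshold, with 0 meaning poverty status was not determined) and the person weight PERWT; both codings should be checked against the IPUMS codebook, and the function name `poverty_rates` is hypothetical. It does not reproduce the figures reported here, and it returns point estimates only, without the standard errors that the ACS replicate weights would provide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def poverty_rates(records: Iterable[Tuple[str, int, float]]) -> Dict[str, float]:
    """Return the weighted share of persons below the poverty line for each group.

    Each record is (group, poverty, perwt):
      group   -- recoded race category, e.g. "Southeast Asian"
      poverty -- IPUMS POVERTY value: income as a percent of the poverty
                 threshold (assumed coding; 0 = not determined)
      perwt   -- IPUMS person weight
    """
    weight_below: Dict[str, float] = defaultdict(float)
    weight_total: Dict[str, float] = defaultdict(float)
    for group, poverty, perwt in records:
        # Persons whose poverty status was not determined are left out of the denominator.
        if poverty == 0:
            continue
        weight_total[group] += perwt
        # Income below 100 percent of the threshold counts as below the poverty line.
        if poverty < 100:
            weight_below[group] += perwt
    return {group: weight_below[group] / total for group, total in weight_total.items()}
```

Each returned value is a weighted proportion; multiplying by 100 gives a percentage in the same form as the rates cited above.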

Reasons for Caution

Disaggregating racial data can be an important step toward addressing inequities within communities. However, the concept of race has historically been used to perpetuate violence, divide communities, and uphold a White supremacist status quo; it is a challenging issue to address. Critiques of data disaggregation center on privacy concerns, the possible misinterpretation of results, and the failure of current survey products to consider language, culture, and other aspects of a person’s ethnoracial identity. When working with racial categories in survey data, researchers must be careful to minimize the harm this work can cause.