Working with Patient Data

October 12, 2021

I’ve always been interested in biological information and human health, and in recent years I’ve developed a narrower interest in the privacy concerns surrounding patient data. When it comes to working with patient health data, I’ve realized a human-centered approach is vital. The question is, which human perspective do we empathize with? Multiple stakeholders handle patient data, including patients, medical professionals, data managers and systems professionals, the government, and private entities. Each stakeholder has its own set of interests, whether that be transparency for the patient, regulation and policy for the government, standards for data managers, or disease research for private entities. Fulfilling all of these groups’ interests is not always possible because of the protected nature of patient data. On the other hand, the patient’s interest in data privacy is not always honored either, because of data misuse, the selling of information, and accidental data leaks. The purpose of this post is to give a brief introduction to the complexity that comes with working with health data.

Patient data is information about an individual’s identity, such as name and age; their medical history, including treatments and current and past illnesses; and other details such as insurance coverage. Deciding how protected patient data should be requires thinking about the harm that could come to the patient if the information fell into the wrong hands. Privacy laws and additional barriers to entry exist to make accessing this information more difficult. However, these regulatory hoops can also make disease research difficult. With access to more patient data, researchers can draw more reliable averages and insights. Ideally, researchers have a representative set of data points whose findings generalize to the broader population. The public generally benefits when researchers have access to medical information, and patients benefit from being able to find communities of people with the same afflictions, since disease can be isolating. Below are a few examples of disease databases that can be used by researchers and patients:

Lung Cancer Repository – A resource for patients, clinicians, and researchers who are interested in using lung cancer information for medical research. The information is de-identified and requires registration for approval to access and provide patient information.

All of Us – A National Institutes of Health initiative to build a large-scale database of health information that can be applied broadly to disease studies. Multiple safeguards protect the patient data in this repository, including de-identification of data, storage on protected servers, and Certificates of Confidentiality from the U.S. government.

iCureCeliac – A patient registry for those who suffer from celiac disease. Multiple safeguards protect the patient data in this registry, including technical precautions, limiting access to patient contact information to authorized personnel only, and separating health information from patient identifiers.

Orphanet – A database for patients and researchers to inform themselves on rare diseases, including a disease encyclopedia, list of patient organizations, clinical trials and biobanks, and more.

Since becoming interested in this topic, I’ve wondered to what degree patient data is anonymized. Common terms used when talking about patient data are “de-identified” and “anonymized”. Both terms represent attempts to protect a patient’s identity. De-identification can be a complete process as well as a step toward anonymization. Below are attempts at defining the two terms:

De-identified: “De-identified patient data is health information from a medical record that has been stripped of all “direct identifiers”—that is, all information that can be used to identify the patient from whose medical record the health information was derived.” 

Anonymized: “Anonymization refers to the irreversible removal of the link between the individual and his or her medical record data to the degree that it would be virtually impossible to reestablish the link.” 
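To make the first definition more concrete, here is a minimal sketch of de-identification in Python. The field names and the list of direct identifiers are my own assumptions for illustration; a real record schema and a real identifier list would look different and would be set by policy.

```python
# Hypothetical set of direct identifier fields (an assumption for this sketch).
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email", "insurance_id"}


def strip_direct_identifiers(record):
    """Return a copy of the record without the direct identifier fields."""
    return {key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS}


# A made-up patient record, not real data.
patient = {
    "name": "Jane Doe",
    "phone": "555-0100",
    "age": 47,
    "zip_code": "60607",
    "diagnosis": "celiac disease",
}

deidentified = strip_direct_identifiers(patient)
print(deidentified)
# {'age': 47, 'zip_code': '60607', 'diagnosis': 'celiac disease'}
```

Notice that fields such as age and zip code survive this step. That is one way to see why de-identification can be only a step toward anonymization: the link between the person and the record has been weakened, not irreversibly removed.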

With both de-identification and anonymization, additional steps to protect the patient’s identity are typically taken. The degree of protection needed depends on how likely it is that the data could be linked back to the person it came from. Public data theoretically requires the highest degree of anonymization, while non-public data theoretically requires less because access is already limited by additional controls, such as required registration or restrictions on downloading the data.
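One rough way to think about the possibility of linking data back to a person is to count how many records are the only one with a given combination of remaining fields. The sketch below is my own illustration with made-up records; the choice of `age` and `zip_code` as the linkable fields is an assumption, and this is not a formal privacy measure.

```python
from collections import Counter

# Made-up records with direct identifiers already removed.
records = [
    {"age": 47, "zip_code": "60607", "diagnosis": "celiac disease"},
    {"age": 47, "zip_code": "60607", "diagnosis": "asthma"},
    {"age": 82, "zip_code": "60611", "diagnosis": "lung cancer"},
]

# Fields assumed (for this sketch) to be matchable against outside data.
LINKABLE_FIELDS = ("age", "zip_code")


def unique_record_count(rows, fields):
    """Count rows whose combination of the given fields appears only once."""
    combos = Counter(tuple(row[f] for f in fields) for row in rows)
    return sum(1 for row in rows
               if combos[tuple(row[f] for f in fields)] == 1)


print(unique_record_count(records, LINKABLE_FIELDS))  # 1
```

Here the 82-year-old is the only person with that age and zip code, so that record would be the easiest to connect back to an individual. The more records stand alone like this, the more protection a dataset would need before being made public.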

Given these definitions, I still wonder how anonymized health information can truly be. Genomic information, for example, is the blueprint for how we’re built. Even if a database curator removes any attached identifying information from a genome, the blueprint remains. Because this code is unique to each person, connecting it back to the individual could in theory still be possible. Anonymizing health data to improve health research is a complex goal. Making more data available requires anonymization that eases patient worries, along with buy-in at the governmental level. Reaching an ideal and complete level of anonymization could remove some of the need for consent, which would help researchers access more information and could lead to greater discovery. I believe a human-centric approach to working with patient data could look like laying out the risks of sharing this data and the specific ways anonymization and protective layers address those risks. A network of data scientists, medical researchers, and politicians who want both better access to and better protection of patient data will be needed to use the unlocked information responsibly.


What is patient information, and how is it protected? University of Illinois at Chicago, UIC Online Health Informatics. Published March 19, 2020.

Patient data and confidential patient information. NHS Digital.


The Use and Misuse of Electronic Patient Data. Brent, Nancy J. Journal of Infusion Nursing. Volume 28, Issue 4, p 251-257. Published July 2005.

What can I do after an improper disclosure of medical records? Findlaw. Published June 12, 2018.

Confidentiality breaches in clinical practice: what happens in hospitals? Beltran-Aroca CM, Girela-Lopez E, Collazo-Chao E, Montero-Pérez-Barquero M, Muñoz-Villanueva MC. BMC Med Ethics. 2016;17(1):52. Published Sep 2, 2016.

To Share or Not to Share: Ethical Acquisition and Use of Medical Data. Hollis KF. AMIA Jt Summits Transl Sci Proc. 2016;2016:420-427. Published July 20, 2016.

All of Us. National Institutes of Health.

Anonymising and sharing individual patient data. Emam et al. BMJ. Published March 20, 2015.

Using patient-identifiable data for observational research and audit. Rustam Al-Shahi, Charles Warlow. BMJ. Published October 28, 2000.

Using de-identified health information to improve care: What, how and why. Glenn Laffel. Practice Fusion. Published April 30, 2010.

Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review. Raphaël Chevrier, Vasiliki Foufi, Christophe Gaudet-Blavignac, Arnaud Robert, Christian Lovis. JMIR Publications. Published March 31, 2019.