LibGuides: Data Services: Handle Sensitive Data in an Appropriate Manner

What is Sensitive Data?

Not all data can or need to be open. In some instances, data must remain in a secure, protected location and must be coded in a manner that ensures they remain secure. In these instances, team members' access to the data must take certain forms, and the data can be shared after the study only if certain procedures are followed.

Sensitive data / proprietary data include:

Personal data which contain identifiers such as name, age, gender, physical traits, genetic information
Confidential data such as trade secrets, financial information, intellectual property rights
Biological data such as location data of endangered species

Before starting a data management plan (DMP), make sure you understand all institutional, government, and funder research policies or legal agreements related to research data and data security; important privacy legislation such as FIPPA; and guidance on managing Indigenous data (C.A.R.E. Principles).

Types of Identifying Information

Identifying information is classified as one of two types: direct and indirect.

Direct identifiers

These data point directly to an individual and are typically removed from data sets before sharing with the public. These may include:

name
initials
mailing address
phone number
email address
unique identifying numbers, like Social Security numbers or driver's license numbers
vehicle identifiers
medical device identifiers
web or IP addresses
biometric data
photographs of the person
audio recordings
names of relatives
dates specific to an individual, like date of birth, marriage, etc.
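
As an illustration of removing direct identifiers before sharing, the following is a minimal Python sketch. The field names (name, email_address, and so on) and the sample records are hypothetical and would need to be matched to the actual data set; it is not a complete de-identification procedure.

```python
# Hypothetical field names for direct identifiers; adjust to the real data set.
DIRECT_IDENTIFIERS = {
    "name",
    "initials",
    "mailing_address",
    "phone_number",
    "email_address",
    "date_of_birth",
}


def strip_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with direct-identifier fields removed."""
    return [
        {field: value for field, value in record.items() if field not in identifiers}
        for record in records
    ]


# Made-up example records.
participants = [
    {"name": "A. Example", "email_address": "a@example.org", "age_group": "30-39", "response": 4},
    {"name": "B. Example", "email_address": "b@example.org", "age_group": "40-49", "response": 2},
]

public_records = strip_direct_identifiers(participants)
print(public_records)
```

The original records are left unchanged, so the identifiable version can still be kept in secure storage while only the stripped copy is shared.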

Indirect identifiers

These may seem harmless on their own, but can point to an individual when combined with other data. It has been recommended (see BMJ article below) that datasets containing three or more indirect identifiers should be reviewed by an independent researcher or ethics committee to evaluate identification risk. Any indirect information not needed for the analysis should be removed. It may be reasonable to supply some of these types of data in aggregated form (like ranges of annual incomes instead of exact numbers). Indirect identifiers may include:

place of medical treatment or doctor's name
gender
rare disease or treatment
sensitive data like illicit drug use or other "risky behaviors"
place of birth
socioeconomic data, like workplace, occupation, annual income, education, etc.
general geographic indicators, like postal code of residence
household and family composition
ethnicity
birth year or age
verbatim responses or transcripts
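
The two suggestions above (flagging data sets with three or more indirect identifiers for review, and supplying values such as income as ranges) can be sketched in Python. The column names, the set of identifiers checked, and the 25,000 range width are assumptions made for illustration only; the flag is a prompt for human review, not a measure of identification risk.

```python
# Hypothetical column names treated as indirect identifiers.
INDIRECT_IDENTIFIERS = {
    "gender",
    "place_of_birth",
    "occupation",
    "annual_income",
    "postal_code",
    "ethnicity",
    "birth_year",
}
REVIEW_THRESHOLD = 3  # "three or more indirect identifiers"


def indirect_identifiers_present(columns):
    """List the indirect identifiers found among a data set's columns."""
    return sorted(set(columns) & INDIRECT_IDENTIFIERS)


def needs_independent_review(columns):
    """True when the data set holds three or more indirect identifiers."""
    return len(indirect_identifiers_present(columns)) >= REVIEW_THRESHOLD


def income_range(income, width=25_000):
    """Replace an exact income with the label of the range it falls in."""
    lower = int(income // width) * width
    return f"{lower}-{lower + width - 1}"


columns = ["participant_id", "gender", "postal_code", "annual_income", "response"]
print(indirect_identifiers_present(columns))
print(needs_independent_review(columns))
print(income_range(61_500))
```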

Handling Sensitive Data

Portage released a series of documents (October 2020) as part of a toolkit for researchers working with sensitive data in the Canadian research context. These documents give Canadian researchers important information on how to manage sensitive data.

In addition, a number of international guides may be useful, including:

The Data Curation Network (US-based) has also released a comprehensive Primer on Human Subjects Data Essentials
The Finnish Social Science Data Archive hosts a guide on methods for anonymizing and de-identifying human subjects data, including both qualitative and quantitative approaches
OpenAire Sensitive Data Guide (Europe)
Quick Guide to HIPAA (Stanford University)
Guidance on the HIPAA Privacy Rule, includes a definition of Protected Health Information (US Department of Health and Human Services)
Harvard's DataTags project is working on secure transfer and storage solutions for publishing sensitive data to Dataverse

Modifying Sensitive Data for Public Release

Sensitive data that contain potentially identifying information -- whether it be human subject data or other types of sensitive data -- will likely need to be modified prior to sharing these data with the public. It is important that these modifications are made in order to protect participant confidentiality, the location of endangered wildlife, or for other relevant reasons. However, these modifications may affect the data to the point where reproducibility or additional subsequent research by others is no longer possible. You might consider retaining multiple versions of the data: one that is suitable for public release, and one that is suitable for further research but that is available on a highly restricted basis.
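
One way to keep two versions side by side is sketched below in Python, using species sightings as the example. Rounding coordinates to one decimal place (roughly 11 km in latitude) and dropping an observer_name field are assumptions chosen for illustration, not methods prescribed by this guide; the right modification depends on the data and the applicable policies.

```python
def make_public_version(record, decimals=1):
    """Return a copy with coarsened coordinates and no observer name."""
    public = {key: value for key, value in record.items() if key != "observer_name"}
    public["latitude"] = round(record["latitude"], decimals)
    public["longitude"] = round(record["longitude"], decimals)
    return public


# Made-up sighting record with hypothetical field names.
sightings = [
    {
        "species": "example species",
        "latitude": 44.23178,
        "longitude": -76.48591,
        "observer_name": "A. Example",
    },
]

# Restricted version: full detail, shared only under controlled access.
restricted_version = sightings
# Public version: modified copy that is suitable for open release.
public_version = [make_public_version(record) for record in sightings]
print(public_version)
```

Because the public version is derived from the restricted one, it can be regenerated if the release rules change, while the detailed version remains available for reproducibility on a restricted basis.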
